All language subtitles for The.Human.Face.of.Big.Data.2014.1080p.WEBRip.x264-RARBG

af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (Soranî)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
(piano music)

- [Voiceover] In the near future, every object on earth will be generating data, including our homes, our cars, even our bodies.

- Do you see it? Yeah, right up there.

- [Voiceover] Almost everything we do today leaves a trail of digital exhaust, a perpetual stream of texts, location data, and other information that will live on well after each of us is long gone.

We are now being exposed to as much information in a single day as our 15th century ancestors were exposed to in their entire lifetime.

But we need to be very careful, because in this vast ocean of data there's a frighteningly complete picture of us: where we live, where we go, what we buy, what we say. It's all being recorded and stored forever.

This is the story of an extraordinary revolution that's sweeping almost invisibly through our lives, and about how our planet is beginning to develop a nervous system, with each of us acting as human sensors.

This is the human face of big data.
- All these devices and machines and everything we're building these days, whether it's phones or computers or cars or refrigerators, are throwing off data.

- Information is being extracted out of toll booths, out of parking spaces, out of Internet searches, out of Facebook, out of your phone, tablets, photographs, videos.

- Every single thing that you do leaves a digital trace.

- The exhaust or evidence of humans interacting with technology, and the side effects that has: it's literally just this massive amount of data.

- What we're doing is we're measuring things more than we ever have. It's that active measurement that produces data.

- If you were some omniscient god and you could look at the footprints of electric devices, you could kind of see the world. If the whole world is being recorded in real time, you could see everything that is going on in the world through the footprints.

I think it's a lot like written language, right? At some point they got to the point where you had to start writing stuff down.
You just got to the point where it wouldn't work unless we wrote it down, which is making the same point: well, it ain't gonna work unless we write all the data down and then look at it.

- And all that data coming in is big data.

- We estimate that by 2020, data volumes will be at about 40 zettabytes. Just to put it in perspective, if you were to add up every single grain of sand on the planet and multiply that by 75, that would be 40 zettabytes of information.

- All the data processing we did in the last two years is more than all the data processing we did in the last 3,000 years.

- And so the more information we get, the larger the problems will be that we solve.

- Every powerful tool has a dark side, every last one. Anything that's going to change the world, by definition, has to be able to change it for the worse as much as for the better. It doesn't work one way without the other.

- When it comes to big data, a lot of people are very nervous. Data can be used in any number of ways that you're either aware of or you're not.
The less aware of the use of that data you are, the less power you have in the coming society we're going to live in.

- Well, we're sort of just at the beginning of this big data thing. You don't know how it's going to change things, but you just know it is.

(dramatic music)

- The first real data set to change everything in the world was the astronomical data set, meticulously collected over tens of years by Copernicus, that ultimately revealed that even though the sun seemed to be moving over the sky every morning and every night, the sun is not moving; it is we who are moving, it is we who are spinning.

It happened again when we suddenly could see beneath the visible level, and the microscope in the 1650s and '60s opened up the invisible world, and we for the first time were seeing cells and bacteria and creatures that we couldn't imagine were there.
It then happened again when we revealed the atomic world, when we said wait a second, there's a level below the optical microscope where we could begin to see things at billionths of a meter, at a nanometer scale, where we imagined the atom and the nucleus and the electron, where we understood that light is electromagnetic frequencies.

But now there's actually a supervisible world coming into play. Ironically, big data is a microscope. We're now collecting exabytes and petabytes of data, and we're looking through that microscope, using incredibly powerful algorithms, to see what we would never see before.

- Before, what we did was we thought of things and then we wrote them down, and that became knowledge.

Big data's kind of the opposite. You have a pile of data that isn't really knowledge until you start looking at it and noticing, wait, maybe if you shift it this way and you shift it that way, this turns into this interesting piece of information.

- I think that the BDAD moment, you know, the before-data, after-data moment, is really Search.
(tapping)

That was the moment at which we got a tool that was used by hundreds of millions of people within a few years, where we could navigate an incredible amount of information.

We took all of human knowledge that was in text, right, and we put it on the web, and we thought to ourselves, "Well, we're done. Wow, that was hard." And now we realize that was the first minute of the first inning of the game, right, because that was just the knowledge we already had, and the knowledge that we continue to add to the web at a relatively slow pace, you know. But there is so much more information that we have not digitized, and so much more information that we're about to take advantage of.

(piano music)

- [Voiceover] In recent years, our technology has allowed us to store and process mass quantities of data.

Visualizing that data will allow us to see complex systems function, see patterns and meaning in ways that were previously impossible.

Almost everything is measurable and quantifiable.
- So when I look at data, what's exciting to me is kind of recontextualizing that data, taking it and putting it back into a form that we can perceive, understand, talk about, think about.

- [Voiceover] This is the data for airplane traffic over North America for a 24-hour period. When it's visualized, you see everything starts to fade to black as everyone goes to sleep, then on the West Coast, planes start moving across on red-eye flights to the east, and you see everyone waking up on the East Coast, followed by European flights in the upper right-hand corner.

I think it's one thing to say that there are 140,000 planes being monitored by the federal government at any one time, and it's another thing to see that system as it ebbs and flows in front of you.

These are text messages being sent in the city of Amsterdam on December 31st. You're seeing the daily flow of text messages from different parts of the city until we approach midnight, where everyone says--

- [Voiceover] Happy New Year!
- It takes people or programs or algorithms to connect it all together, to make sense of it, and that's what's important. Every single action that we do in this world is triggering off some amount of data, and most of that data is meaningless until someone adds some interpretation to it, someone adds a narrative around it.

- Often we sort of think of data as stranded numbers, but they're tethered to things, and if we follow those tethers in the right ways, then we can find the real-world objects and the real-world stories that were there. So a lot of the work is that kind of work. It's almost investigative work, trying to follow that trail from the data to what actually happened.

- Sometimes the power of large data sets isn't immediately obvious. Google Flu Trends is a great example of taking a look at a massive corpus of data and deriving somewhat tangential information that can actually be really valuable.
- [Voiceover] Until recently, the only way to detect a flu epidemic was by accumulating information submitted by doctors about patient visits, a process that took about two weeks to reach the CDC. So the researchers turned it around. They asked themselves if they could predict a flu outbreak in real time, simply using data from online searches. So they set out to do the near impossible: searching the searches, billions of them, spanning five years, to see if user queries could tell them something.

- When we do searches on Google, we all think of it as a one-way street, that we're going into Google and extracting information from Google, but one of the things we don't really think about very much is that we're actually contributing information back simply by doing the search.

- [Voiceover] And that's where the breakthrough occurred. In looking at all the data, they saw that not only did the number of flu-related searches correlate with the people who had the flu, but they also could identify the search terms that could let them accurately predict flu outbreaks up to two weeks before the CDC.
- The CDC system takes about a week or two for the numbers to sort of fully flow in. What Google could do is say: based on our model, we'll have it on the spot. We'll just run the algorithm based on how people are searching right now.

- And now we have, for the first time, this real-time feedback loop where we can see in real time what's going on and respond to it.

- Now there is a flip side to this, though. There was a big story this year, with a lot of media attention, about what an intense flu season this was. And so what did that do? That drove up search. That drove people who were more interested in what's going on with this flu, or might have made more people think, I must have it. And so they were off; they got it way wrong.

- So you know, one way to think about big data, and all of the computational tools that we wrap around that big data to let us discover patterns that are in the data, is when we point all that machinery at ourselves.
- [Voiceover] At MIT, Deb Roy and his colleagues wanted to see if they could understand how children acquire language.

- And we realized that no one really knew, for a simple reason: there was no data.

- [Voiceover] After he and his wife Rupal brought their newborn son home from the hospital, they did what every normal parent would do: mount a camera in the ceiling of each room in their home and record every moment of their lives for two years, a mere 200 gigabytes of data recorded every day.

- [Deb] We ended up transcribing somewhere between eight and nine million words of speech.

- [Voiceover] Ga ga ga.

- And as soon as we had that, we could go and identify the exact moment where my son first said a new word.

- [Deb] We started calling them births.

- We took this idea of a word birth and we started thinking: why don't we trace back in time and look at the gestation period for that word?

One example of this was water.
So we looked at every time my son heard the word water: what was happening, where in the house they were, how they were moving about, using that visual information to capture something about the context within which the words are used. We call them wordscapes.

Then we could ask the question: how does the wordscape associated with a word predict when my son will actually start using that word?

- [Voiceover] What they learned from watching Deb's son was that the texture of the wordscapes had predictive power. Where most of the previous research had indicated that language was learned through repetition, this analysis of the data showed that it wasn't actually repetition that generated learning, but context. Words with more distinct wordscapes, that is, words heard in many varied locations, would be learned first.

- Not only is that true, but the wordscapes are far more predictive of when a word will be learned than the frequency, the number of times it's actually heard.
It's like we're building a new kind of instrument, like we're building a microscope, and we're able to examine something that is around us, but it has a structure and patterns and beauty that are invisible without the right instruments, and all of this data is opening up our ability to perceive things around us.

(giggling)

- He's walking.

(beeping)

- A lot of people don't realize that when a baby is born premature, it can develop infection in the hospital, and it can kill them.

In our research, we started to just look at infection. By the time a baby is physically showing signs of having infection, they are very, very unwell.

So the very first time that I went into a neonatal intensive care unit, I was amazed by the sights, the sound, the smell, just the whole environment, but mainly, for me, the data.

What shocked me was the amount of data lost. They showed me the paper chart that the information's recorded onto: one number every hour for the baby's heart rate, the respiration, the blood oxygen.
Now in that time, the baby's heart has beaten more than 7,000 times, they've breathed more than 2,000 times, and the monitor showing the blood oxygen level has shown that more than three and a half thousand times. I said, "Well, where's all the data going that's in those machines?" And they said, "Oh, it scrolls out of the memory." So we have an enormous amount of data lost.

So we're trying to gather that information and use it over a longer time, in much more complex ways than before, and we try to write computing code to look at the trends in the monitors and the trends in the data, to see how that can tell us when a baby's becoming unwell.

- [Voiceover] So Dr. McGregor did what data scientists do: she looked for the invisible. She and her team analyzed the data from thousands of heart beats, and what they discovered were minute fluctuations that could predict the onset of life-threatening infections long before physical symptoms appeared.

- When the body first starts dealing with infection, there are these subtle changes, and that's why we have to watch every single heart beat.
And what we're finding is that when you're starting to become unwell, the heart's ability to react, to speed up and slow down, gets subdued.

The human body has always been exhibiting these certain things. The difference is we've started to gather more information about the body now, so that we can build this virtual person. The better we have the virtual representation, the better we can start to understand what will happen to them in the future.

Back in 1999 I was pregnant with my first child. She was born premature and she passed away. There was no other viable outcome for her. But there are so many others who have just been born early, and they just need that opportunity to grow and develop.

We want to let the computers monitor a baby as it breathes, as its heart beats, as it sleeps, so that these algorithms are watching for certain behaviors, and if something starts to go wrong for that baby, we have the ability to intervene.

If we can just save one life, then for me personally, it's already worthwhile.
- Everybody understands what it takes to digitize photography, a movie, a magazine, a newspaper, but they haven't yet grasped what it means to digitize the medical essence of a human being.

Everything about us that's medically relevant can now be captured. With sensors, we can digitize all of our metrics; with imaging, we can digitize our anatomy; and with the sequence of our DNA, we can digitize our biology.

- The data story in the genome is the fact that we have six billion data points sitting in our genomes that we've never had access to before.

When you sequence a person's genome, there are known differences in the human genome that can predict a risk for a disease, or that you're a carrier for a disease, or that you have a certain ancestry. There's a lot of information packed in the genome that we're starting to learn more and more about.

Getting your own personal information through your genome would not have been possible even 10 years ago, because of cost.
The technologies that have enabled this have dropped precipitously in cost, and now we're able to get a really good look at your genome for under $500.

- And when it becomes 100 bucks, or 10 bucks, we're going to have everyone's genome as data.

- The results came back on Tuesday; it was October 2nd, 1996. I was diagnosed that day with breast cancer. A year out of treatment, I found a lump on the other breast, in the exact same position, and I went in, and they told me that I had breast cancer again.

Sedona's known about me being tested for the BRCA gene. She's known my sister has tested, she knows my other sister tested and was negative for the gene mutation, and so she actually told me, "When I'm 18, I want to test, you know, and see if I have this gene mutation or not." I am gonna be completely distraught if I hand this gene down to my kid.

- Do you know what your chances are of having the mutation that your mom has?

- I'd say 50/50.

- You're exactly right.
406 00:19:48,841 --> 00:19:51,130 BRCA2 is a gene that we all have, 407 00:19:51,130 --> 00:19:52,978 it's called a tumor suppressor gene, 408 00:19:52,978 --> 00:19:55,128 but women, if you have a mutation in the gene 409 00:19:55,128 --> 00:19:57,766 it causes the gene not to function like it should. 410 00:19:57,766 --> 00:20:01,392 So the risk mainly of breast and ovarian cancer 411 00:20:01,392 --> 00:20:04,076 is a lot higher than in the general population. 412 00:20:04,076 --> 00:20:06,575 - An average woman would have a 12% risk 413 00:20:06,575 --> 00:20:08,214 of getting breast cancer in a lifetime 414 00:20:08,214 --> 00:20:09,551 and most women aren't going out there, 415 00:20:09,551 --> 00:20:11,387 getting preventive mastectomies, 416 00:20:11,387 --> 00:20:13,600 but when you're faced with an 87% risk 417 00:20:13,600 --> 00:20:16,308 of getting breast cancer in your lifetime, 418 00:20:16,308 --> 00:20:20,486 it kind of makes that a possible choice. 419 00:20:23,316 --> 00:20:26,070 - [Voiceover] You'll need to swish this mouthwash 420 00:20:26,070 --> 00:20:28,040 for 30 seconds. 421 00:20:28,708 --> 00:20:30,510 - We are definitely moving into a world 422 00:20:30,510 --> 00:20:33,334 where the patient or the person is at the center of things 423 00:20:33,334 --> 00:20:36,350 and hopefully also at the controls.
424 00:20:36,971 --> 00:20:38,691 People will have access to the data 425 00:20:38,691 --> 00:20:43,235 that is informative around the type of disease they have 426 00:20:43,235 --> 00:20:45,827 and that data then can point much more directly 427 00:20:45,827 --> 00:20:47,953 to proper treatments, 428 00:20:47,953 --> 00:20:49,627 but the data can also say that a treatment 429 00:20:49,627 --> 00:20:51,963 works for a person or it doesn't work for a person 430 00:20:51,963 --> 00:20:53,671 based on their genetic profile 431 00:20:53,671 --> 00:20:55,345 and we're gonna start moving more and more 432 00:20:55,345 --> 00:20:57,518 into this notion of personalized medicine 433 00:20:57,518 --> 00:20:59,063 as we learn more about the genome 434 00:20:59,063 --> 00:21:01,236 and the study of pharmacogenetics, 435 00:21:01,236 --> 00:21:05,037 which is how do our genes influence the drugs we take. 436 00:21:05,037 --> 00:21:07,408 Ultimately, instead of treating disease, 437 00:21:07,408 --> 00:21:09,116 is there data that could really help us 438 00:21:09,116 --> 00:21:12,673 move away from contracting these illnesses to begin with 439 00:21:12,673 --> 00:21:15,723 and go more toward a preventive model? 440 00:21:15,723 --> 00:21:20,228 (mellow music) 441 00:21:20,228 --> 00:21:25,202 - Now you can't talk about information separate from health. 442 00:21:25,202 --> 00:21:26,484 How you feel is information, 443 00:21:26,484 --> 00:21:28,200 how you respond to a drug is information, 444 00:21:28,200 --> 00:21:30,037 your genetic code is information. 445 00:21:30,037 --> 00:21:32,001 What's really happening is when we start collecting it, 446 00:21:32,001 --> 00:21:32,919 we're going to start seeing it 447 00:21:32,919 --> 00:21:34,964 and we're going to start interpreting it. 
448 00:21:35,807 --> 00:21:38,096 We're beginning the age of collecting information 449 00:21:38,096 --> 00:21:40,398 from sensors that are cheap and ubiquitous 450 00:21:40,398 --> 00:21:42,513 that we can process continuously 451 00:21:42,513 --> 00:21:45,159 and we can actually start knowing things. 452 00:21:45,159 --> 00:21:47,239 - If we monitored our health throughout the day, 453 00:21:47,239 --> 00:21:50,737 continuously every second, what would that really enable? 454 00:21:50,737 --> 00:21:53,422 - And there's now a lot of really great technology 455 00:21:53,422 --> 00:21:57,129 coming out around this sense of tracking and monitoring 456 00:21:57,129 --> 00:22:00,058 and we have all kinds of sensor companies and devices. 457 00:22:00,058 --> 00:22:01,859 - We're actually collecting a lot of physiological 458 00:22:01,859 --> 00:22:04,265 information, you know, heart rate, breathing, 459 00:22:04,265 --> 00:22:07,324 in real-time, you know, every minute, every second. 460 00:22:08,992 --> 00:22:11,491 - [Linda] People wanting to measure their daily activities 461 00:22:11,491 --> 00:22:13,571 and being able to track your own sleep, 462 00:22:13,571 --> 00:22:16,581 being able to watch and monitor your own food intake, 463 00:22:16,581 --> 00:22:18,717 being able to track your own movement. 464 00:22:18,717 --> 00:22:20,169 - It's almost like looking down at our lives 465 00:22:20,169 --> 00:22:21,518 from 30,000 feet. 466 00:22:21,518 --> 00:22:23,272 There's a company right now in Boston 467 00:22:23,272 --> 00:22:25,527 that can actually predict that you're going to get depressed 468 00:22:25,527 --> 00:22:27,562 two days before you get depressed 469 00:22:27,562 --> 00:22:29,027 and the gentleman who created it said 470 00:22:29,027 --> 00:22:31,000 if you actually watch any one of us, 471 00:22:31,000 --> 00:22:34,208 most people have a very discernible pattern of behavior.
472 00:22:34,208 --> 00:22:37,253 And for the first week, our software basically determines 473 00:22:37,253 --> 00:22:39,008 what your normal pattern is 474 00:22:39,008 --> 00:22:40,554 and then two days before you're showing 475 00:22:40,554 --> 00:22:42,727 any outward signs of depression, 476 00:22:42,727 --> 00:22:44,610 the amount of tweets and emails that you're sending 477 00:22:44,610 --> 00:22:47,154 goes down, your radius of travel starts shrinking, 478 00:22:47,154 --> 00:22:49,153 the amount of time that you spend at home goes up. 479 00:22:49,153 --> 00:22:52,151 - You can look to see if how you exercise 480 00:22:52,151 --> 00:22:54,081 changes your social behavior, 481 00:22:54,081 --> 00:22:56,173 if what you eat changes how you sleep 482 00:22:56,173 --> 00:23:00,008 and how that impacts your medical claims. 483 00:23:00,008 --> 00:23:01,972 - All kinds of data and information 484 00:23:01,972 --> 00:23:05,063 are sitting inside what you do every day. 485 00:23:05,063 --> 00:23:06,528 - Now, with all these devices, 486 00:23:06,528 --> 00:23:10,270 we have real-time information, real-time understanding. 487 00:23:10,270 --> 00:23:11,327 - Now that might sound interesting, 488 00:23:11,327 --> 00:23:13,617 might help you shed a few pounds, 489 00:23:13,617 --> 00:23:15,255 realize you're eating too many potato chips 490 00:23:15,255 --> 00:23:16,755 and sitting around too much perhaps 491 00:23:16,755 --> 00:23:19,009 and that's useful to you individually, 492 00:23:19,009 --> 00:23:23,146 but if hundreds of millions of people do that, 493 00:23:23,146 --> 00:23:26,145 you have a big cloud of data 494 00:23:26,145 --> 00:23:29,236 about people's behavior that can be crawled through 495 00:23:29,236 --> 00:23:31,956 by pattern recognition algorithms.
496 00:23:33,204 --> 00:23:35,622 And doctors and health policy officials 497 00:23:35,622 --> 00:23:38,213 can start to see patterns that change the way, 498 00:23:38,213 --> 00:23:40,677 collectively as a society, we understand 499 00:23:40,677 --> 00:23:44,129 not just our health, but every single area 500 00:23:44,129 --> 00:23:46,723 where data can be applied 501 00:23:46,723 --> 00:23:49,323 because we start to understand how we might, 502 00:23:49,323 --> 00:23:53,107 collectively as a culture, change our behavior. 503 00:23:56,657 --> 00:23:58,586 - And if you look at the future of this, 504 00:23:58,586 --> 00:24:02,642 we're gonna be embedded in a sea of information services 505 00:24:02,642 --> 00:24:07,360 that are connected to massive databases in the cloud. 506 00:24:07,360 --> 00:24:11,111 (rhythmic electronic music) 507 00:24:11,111 --> 00:24:12,611 - If you take a look at everything that you touch 508 00:24:12,611 --> 00:24:15,039 in everyday life, the majority of these things 509 00:24:15,039 --> 00:24:18,246 were invented many, many, many, many, many years ago 510 00:24:18,246 --> 00:24:20,385 and they're ripe for reinvention 511 00:24:20,385 --> 00:24:22,594 and when they get reinvented, 512 00:24:22,594 --> 00:24:23,848 they're gonna be connected, 513 00:24:23,848 --> 00:24:26,184 they're gonna be connected in some way 514 00:24:26,184 --> 00:24:30,031 that data that comes off of these devices that you touch 515 00:24:30,031 --> 00:24:32,855 is gonna be collected and stored in a central location 516 00:24:32,855 --> 00:24:36,457 and people are gonna run big data algorithms on this data 517 00:24:36,457 --> 00:24:37,911 and then you're gonna get the feedback 518 00:24:37,911 --> 00:24:41,043 of the collective whole rather than the individual. 
519 00:24:42,931 --> 00:24:44,383 - So it's taking people who are already out there, 520 00:24:44,383 --> 00:24:45,895 who already have these devices, 521 00:24:45,895 --> 00:24:48,358 and turning all these people into contributors 522 00:24:48,358 --> 00:24:51,281 of information back to the system. 523 00:24:52,949 --> 00:24:56,487 You become one of the nodes on the network. 524 00:24:57,586 --> 00:24:59,539 I think the Internet, as wondrous as it's been 525 00:24:59,539 --> 00:25:01,666 over the last 20 years, was like a layer 526 00:25:01,666 --> 00:25:03,920 that needed to be in place for all these sensors 527 00:25:03,920 --> 00:25:06,764 and devices to be able to communicate with each other. 528 00:25:06,764 --> 00:25:09,181 - You know, we're building this global brain 529 00:25:09,181 --> 00:25:12,854 that has these new functions and we're accessing them 530 00:25:12,854 --> 00:25:14,899 primarily now through our mobile devices, 531 00:25:14,899 --> 00:25:17,409 or obviously also on our desktops, 532 00:25:17,409 --> 00:25:19,117 but increasingly mobile. 533 00:25:19,117 --> 00:25:22,535 - I think this data revolution has a strange impact really 534 00:25:22,535 --> 00:25:26,009 of people feeling like there's somebody listening to them 535 00:25:26,009 --> 00:25:29,728 and that could mean listening in the sense of Big Brother, 536 00:25:29,728 --> 00:25:31,355 someone's listening in, 537 00:25:31,355 --> 00:25:34,492 or it could be someone's really hearing me. 538 00:25:34,492 --> 00:25:37,189 This device in my hand knows who I am, 539 00:25:37,189 --> 00:25:40,746 it can somewhat anticipate what I want 540 00:25:40,746 --> 00:25:44,162 or where I'm going and react to that. 541 00:25:45,748 --> 00:25:48,886 The implications of that are huge 542 00:25:48,886 --> 00:25:50,350 for the decisions that we make 543 00:25:50,350 --> 00:25:52,902 and for the systems that we're part of. 
544 00:25:55,486 --> 00:25:57,742 I think about living in a city 545 00:25:57,742 --> 00:26:00,252 and what your experience of living in that city 546 00:26:00,252 --> 00:26:02,413 would be in 10 or 15 years. 547 00:26:02,413 --> 00:26:03,782 You've got places like Chicago 548 00:26:03,782 --> 00:26:05,258 where they're being hugely innovative 549 00:26:05,258 --> 00:26:07,431 and they're taking massive data sets, 550 00:26:07,431 --> 00:26:09,128 combining them in interesting ways, 551 00:26:09,128 --> 00:26:10,895 running interesting algorithms on them 552 00:26:10,895 --> 00:26:13,521 and figuring out ways that they can intervene 553 00:26:13,521 --> 00:26:15,695 in this system to sort of see patterns 554 00:26:15,695 --> 00:26:18,362 and be able to react to those patterns. 555 00:26:19,030 --> 00:26:23,295 When you take in data, it affects you as an individual 556 00:26:23,295 --> 00:26:24,783 and then you affect the system 557 00:26:24,783 --> 00:26:26,421 and that affects the data again 558 00:26:26,421 --> 00:26:29,722 and this round trip that you start to see yourself part of 559 00:26:29,722 --> 00:26:33,309 makes me understand that I'm an actor in a larger system.
560 00:26:33,309 --> 00:26:35,564 For instance, if you know by looking at the data, 561 00:26:35,564 --> 00:26:37,703 and you have to put different data sets together 562 00:26:37,703 --> 00:26:40,480 to be able to see this, that some of the street lights, 563 00:26:40,480 --> 00:26:43,165 you know, when they go out, they cause higher crime 564 00:26:43,165 --> 00:26:44,966 in that particular block, 565 00:26:44,966 --> 00:26:46,419 (siren blares) 566 00:26:46,419 --> 00:26:49,173 you start to see ways that if you can query that data 567 00:26:49,173 --> 00:26:51,765 in intelligent ways, that you can prioritize 568 00:26:51,765 --> 00:26:54,275 the limited resources that you have in a city 569 00:26:54,275 --> 00:26:56,774 to take care of the things that have, you know, 570 00:26:56,774 --> 00:26:59,903 follow-on effects and follow-on costs. 571 00:27:00,408 --> 00:27:02,202 - In the end, you know, you're going to hope that 572 00:27:02,202 --> 00:27:05,421 this is just our reaction as a species 573 00:27:05,421 --> 00:27:07,386 to this scale problem, right, 574 00:27:07,386 --> 00:27:08,932 how do you get another, you know, 575 00:27:08,932 --> 00:27:11,473 two billion people on the planet? 576 00:27:11,473 --> 00:27:13,600 You can't do it unless you start instrumenting 577 00:27:13,600 --> 00:27:16,111 every little thing and dialing it in just right. 578 00:27:16,111 --> 00:27:18,284 - And you know, right now you wait for the bus 579 00:27:18,284 --> 00:27:20,945 because the bus is coming on a particular schedule 580 00:27:20,945 --> 00:27:23,118 and it's great, we're now at the point where 581 00:27:23,118 --> 00:27:26,128 your phone will tell you when the bus is really coming, 582 00:27:26,128 --> 00:27:28,720 not just when the bus is scheduled to come. 583 00:27:29,631 --> 00:27:31,724 You know, take that a little bit forward. 584 00:27:31,724 --> 00:27:33,060 What about when there's more use 585 00:27:33,060 --> 00:27:34,896 on one line than the other?
586 00:27:34,896 --> 00:27:36,650 Well, instead of sticking with the schedule, 587 00:27:36,650 --> 00:27:40,033 does the system start to understand 588 00:27:40,033 --> 00:27:44,576 that maybe this route doesn't need 10 buses today 589 00:27:44,576 --> 00:27:46,751 and automatically shift those resources 590 00:27:46,751 --> 00:27:50,210 over to the lines where the buses are full? 591 00:27:50,210 --> 00:27:53,557 - Boston just created a new smartphone app 592 00:27:53,557 --> 00:27:57,229 which uses the accelerometer in your phone. 593 00:27:57,229 --> 00:27:59,646 So if you're driving through the streets of South Boston 594 00:27:59,646 --> 00:28:03,122 and all of a sudden there's a big dip in the street, 595 00:28:03,122 --> 00:28:05,830 the phone realizes it. 596 00:28:05,830 --> 00:28:07,584 So anybody in the city of Boston 597 00:28:07,584 --> 00:28:09,339 that has this up and running 598 00:28:09,339 --> 00:28:12,012 is feeding real-time data on the quality of the roads 599 00:28:12,012 --> 00:28:13,639 to the city of Boston. 600 00:28:13,639 --> 00:28:15,394 - Then you start to feel that your city 601 00:28:15,394 --> 00:28:17,311 is sort of a responsive organism 602 00:28:17,311 --> 00:28:21,398 just like your body puts your blood where it needs it. 603 00:28:22,657 --> 00:28:26,040 Think about ways that we could live in cities 604 00:28:26,040 --> 00:28:29,050 when they're that responsive to our needs 605 00:28:29,050 --> 00:28:31,304 and think about the implications of that for the planet 606 00:28:31,304 --> 00:28:33,640 because really cities are also really 607 00:28:33,640 --> 00:28:36,973 how we're going to survive the 21st century. 608 00:28:36,973 --> 00:28:39,645 You can live in a city with a far smaller footprint 609 00:28:39,645 --> 00:28:42,015 than anywhere else in the world 610 00:28:42,015 --> 00:28:45,619 and I think data and sort of the responsive systems 611 00:28:45,619 --> 00:28:48,414 will play an enormous role in that.
612 00:28:51,093 --> 00:28:53,010 - I think one of the most exciting things about data 613 00:28:53,010 --> 00:28:56,765 is that, you know, it's giving us extra senses, 614 00:28:56,765 --> 00:28:58,391 it's expanding upon, you know, 615 00:28:58,391 --> 00:29:01,529 our ability to perceive the world 616 00:29:01,529 --> 00:29:03,993 and it actually ends up giving us the opportunity 617 00:29:03,993 --> 00:29:06,201 to make things tangible again 618 00:29:06,201 --> 00:29:08,084 and to actually get a perspective on ourselves, 619 00:29:08,084 --> 00:29:11,506 both as individuals and also as society. 620 00:29:13,510 --> 00:29:16,847 - And there's always that moment in data visualization 621 00:29:16,847 --> 00:29:18,438 when you're looking at, you know, 622 00:29:18,438 --> 00:29:20,193 tons and tons and tons of data. 623 00:29:20,193 --> 00:29:22,610 The point is not to look at the tons and tons 624 00:29:22,610 --> 00:29:25,283 and tons of data, but what are the stories 625 00:29:25,283 --> 00:29:27,451 that emerge out of it. 626 00:29:28,956 --> 00:29:31,002 - If you said look, give me the home street address 627 00:29:31,002 --> 00:29:35,429 of everyone who entered New York State prison last year 628 00:29:35,429 --> 00:29:37,300 and the home street address of everyone 629 00:29:37,300 --> 00:29:39,438 who left New York State prison last year 630 00:29:39,438 --> 00:29:42,146 and we said look, let's get the numbers, put it on a map 631 00:29:42,146 --> 00:29:44,154 and actually show it to people. 632 00:29:44,154 --> 00:29:47,251 And when we first produced our Brooklyn map, 633 00:29:47,251 --> 00:29:49,082 which was the first one we did, 634 00:29:49,082 --> 00:29:51,673 they hit the floor, not because nobody knew this. 635 00:29:51,673 --> 00:29:53,126 You know, everyone knew anecdotally 636 00:29:53,126 --> 00:29:57,472 how concentrated the effect of incarceration was, 637 00:29:57,472 --> 00:30:00,308 but no one had actually seen it based on actual data. 
638 00:30:00,308 --> 00:30:04,318 We started to show these remarkably intensive 639 00:30:04,318 --> 00:30:06,986 concentrations of people going in and out of prison, 640 00:30:06,986 --> 00:30:09,125 highly disproportionately located 641 00:30:09,125 --> 00:30:12,443 in very small areas around the city. 642 00:30:16,214 --> 00:30:19,003 - [Voiceover] And what we found is that the home addresses 643 00:30:19,003 --> 00:30:22,268 of incarcerated people correlate very highly 644 00:30:22,268 --> 00:30:25,819 with poverty and with people of color. 645 00:30:28,940 --> 00:30:31,415 - You have a justice system, which by all accounts 646 00:30:31,415 --> 00:30:32,822 is supposed to be essentially based on 647 00:30:32,822 --> 00:30:37,179 a case-by-case, individual decision of justice. 648 00:30:37,179 --> 00:30:39,015 Well, when you looked at the map over time, 649 00:30:39,015 --> 00:30:43,315 what you really were seeing was this mass population 650 00:30:43,315 --> 00:30:48,162 movement out and mass population resettlement back, 651 00:30:48,162 --> 00:30:50,564 this cyclical movement of people. 652 00:30:51,276 --> 00:30:52,952 - So once we had mapped the data, 653 00:30:52,952 --> 00:30:55,334 we quantified it in terms of how much it cost 654 00:30:55,334 --> 00:30:58,132 to house those same people in prison. 655 00:30:58,132 --> 00:30:59,050 - And that's where we started to think 656 00:30:59,050 --> 00:31:01,560 about million dollar blocks. 657 00:31:01,560 --> 00:31:06,395 We found over 35 individual city blocks in Brooklyn alone 658 00:31:06,395 --> 00:31:08,654 for which the state was spending 659 00:31:08,654 --> 00:31:11,072 more than a million dollars every year 660 00:31:11,072 --> 00:31:14,042 to remove and return people to prison. 661 00:31:16,663 --> 00:31:18,882 We needed to reframe that conversation 662 00:31:18,882 --> 00:31:21,682 and what immediately emerged out of this was 663 00:31:21,682 --> 00:31:23,937 this idea of justice reinvestment.
664 00:31:23,937 --> 00:31:25,819 We weren't building anything in those places 665 00:31:25,819 --> 00:31:27,889 for those dollars. 666 00:31:27,889 --> 00:31:30,329 How can we demand sort of more equity 667 00:31:30,329 --> 00:31:31,991 for that investment 668 00:31:31,991 --> 00:31:33,873 to extract those neighborhoods 669 00:31:33,873 --> 00:31:37,801 from what decades of criminalization has done? 670 00:31:37,801 --> 00:31:40,788 And that shift had to come from the data 671 00:31:40,788 --> 00:31:43,961 and a new way of thinking about information. 672 00:31:46,314 --> 00:31:48,900 These maps did that. 673 00:31:52,450 --> 00:31:54,612 - The amount of data that now is being collected 674 00:31:54,612 --> 00:31:59,086 about those areas that are stuck in cycles of poverty, 675 00:31:59,086 --> 00:32:02,549 cycles of famine, cycles of war, 676 00:32:02,549 --> 00:32:05,978 gives people or governments and NGOs 677 00:32:05,978 --> 00:32:09,349 an opportunity to do good. 678 00:32:09,349 --> 00:32:12,602 Understanding on the ground, information on the ground, 679 00:32:12,602 --> 00:32:15,124 data on the ground can change the way 680 00:32:15,124 --> 00:32:18,158 people apply resources 681 00:32:18,158 --> 00:32:21,255 which are intended to try to help. 682 00:32:22,596 --> 00:32:24,015 - We really fundamentally believe 683 00:32:24,015 --> 00:32:25,897 that data has intrinsic value 684 00:32:25,897 --> 00:32:27,559 and we also fundamentally believe 685 00:32:27,559 --> 00:32:30,488 that the individuals who create that data 686 00:32:30,488 --> 00:32:33,742 should be able to benefit from that data. 687 00:32:34,863 --> 00:32:36,669 But we're working with one of the big mobile phone 688 00:32:36,669 --> 00:32:39,714 operators in Kenya, we're looking at the dynamics 689 00:32:39,714 --> 00:32:42,550 of these mobile phone subscribers. 690 00:32:42,550 --> 00:32:44,886 Millions of phones in Kenya. 
691 00:32:46,101 --> 00:32:47,355 We're looking at how the population 692 00:32:47,355 --> 00:32:49,813 was moving over the country. 693 00:32:50,772 --> 00:32:53,201 And we're overlaying that movement data 694 00:32:53,201 --> 00:32:56,397 with data about parasite prevalence 695 00:32:56,397 --> 00:32:59,622 from household surveys and data from hospitals. 696 00:33:02,545 --> 00:33:05,346 We can start identifying these malaria hot spots, 697 00:33:05,346 --> 00:33:09,016 regions within Kenya that desperately needed 698 00:33:09,016 --> 00:33:11,311 the eradication dollars. 699 00:33:13,769 --> 00:33:15,860 It's fascinating to start extracting models 700 00:33:15,860 --> 00:33:17,418 and plotting graphs of the behavior 701 00:33:17,418 --> 00:33:19,660 of tens of millions of people in Kenya, 702 00:33:19,660 --> 00:33:22,508 but it's meaningful when you can make those insights count, 703 00:33:22,508 --> 00:33:25,007 when you can take the insights that you've gleaned 704 00:33:25,007 --> 00:33:26,807 and put them into practice 705 00:33:26,807 --> 00:33:29,771 and measure what the impact was 706 00:33:29,771 --> 00:33:32,108 and hopefully make the lives of the people 707 00:33:32,108 --> 00:33:34,186 who are generating this data better. 708 00:33:34,186 --> 00:33:37,163 (children yelling) 709 00:33:37,163 --> 00:33:41,544 (siren blaring) 710 00:33:41,544 --> 00:33:45,344 - That afternoon when the earthquake struck in January, 711 00:33:45,344 --> 00:33:48,645 I was watching CNN and saw the breaking news 712 00:33:48,645 --> 00:33:52,317 and I had taken my wife to Port-au-Prince at the time 713 00:33:52,317 --> 00:33:54,154 and for the better part of 12 hours 714 00:33:54,154 --> 00:33:56,362 had no idea whether any one of my friends 715 00:33:56,362 --> 00:33:58,450 was alive or dead.
716 00:33:58,450 --> 00:34:01,495 - [Voiceover] Meier was a Tufts University PhD student 717 00:34:01,495 --> 00:34:04,040 and directed crisis mapping for Ushahidi, 718 00:34:04,040 --> 00:34:06,260 a nonprofit that collects, visualizes, 719 00:34:06,260 --> 00:34:08,375 and then maps crisis data. 720 00:34:08,375 --> 00:34:10,095 - And so I went on social media 721 00:34:10,095 --> 00:34:12,315 and I found dozens and dozens of Haitians 722 00:34:12,315 --> 00:34:15,522 tweeting live about the damage 723 00:34:15,522 --> 00:34:17,486 and a lot of the time they were sharing 724 00:34:17,486 --> 00:34:19,276 where this damage was happening. 725 00:34:19,276 --> 00:34:22,286 So they would say the church on the corner of X and Y 726 00:34:22,286 --> 00:34:25,261 has been destroyed or is collapsed 727 00:34:25,261 --> 00:34:27,423 and they would refer to street names and so on. 728 00:34:27,423 --> 00:34:29,933 So it's about really becoming a digital detective 729 00:34:29,933 --> 00:34:33,637 and then trying to understand where on the map this was. 730 00:34:33,637 --> 00:34:35,148 - [Voiceover] So he called everyone he knew 731 00:34:35,148 --> 00:34:37,937 and put together a mostly volunteer team in Boston 732 00:34:37,937 --> 00:34:40,575 to prioritize the most life-and-death tweets 733 00:34:40,575 --> 00:34:42,876 and map them for rescue workers.
734 00:34:42,876 --> 00:34:45,967 - For the first time, it wasn't the government 735 00:34:45,967 --> 00:34:47,838 emergency management organization 736 00:34:47,838 --> 00:34:50,011 that had the best data of what was happening, 737 00:34:50,011 --> 00:34:53,301 but it was legions of volunteers that came together 738 00:34:53,301 --> 00:34:55,346 and crowdmapped the location 739 00:34:55,346 --> 00:34:57,101 of buildings that had collapsed, 740 00:34:57,101 --> 00:34:58,948 people that were trapped in rubble, 741 00:34:58,948 --> 00:35:00,738 locations where water was needed, 742 00:35:00,738 --> 00:35:03,867 where physicians were needed and the like. 743 00:35:04,500 --> 00:35:06,673 - I think we've seen, not only in Haiti 744 00:35:06,673 --> 00:35:08,556 but almost every disaster since Haiti, 745 00:35:08,556 --> 00:35:13,065 just an explosion of social media content. 746 00:35:13,065 --> 00:35:14,727 - [Voiceover] Disaster mapping groups like Meier's 747 00:35:14,727 --> 00:35:16,610 realized that there was so much at stake 748 00:35:16,610 --> 00:35:19,015 and so much raw data coming from social media 749 00:35:19,015 --> 00:35:20,619 during natural disasters 750 00:35:20,619 --> 00:35:22,328 that they needed to come up with new algorithms 751 00:35:22,328 --> 00:35:24,536 to sort through the flood of information. 752 00:35:24,536 --> 00:35:28,383 - We are drawing on artificial intelligence, 753 00:35:28,383 --> 00:35:31,090 machine learning, working with data scientists 754 00:35:31,090 --> 00:35:34,182 to develop semi-automated ways 755 00:35:34,182 --> 00:35:38,063 to extract relevant, informative and actionable information 756 00:35:38,063 --> 00:35:40,156 from social media during disasters. 757 00:35:40,156 --> 00:35:41,329 So one of our projects is called 758 00:35:41,329 --> 00:35:44,455 Artificial Intelligence for Disaster Response.
759 00:35:46,332 --> 00:35:47,958 During Hurricane Sandy, 760 00:35:47,958 --> 00:35:51,971 we collected five million tweets during the first few days. 761 00:35:52,593 --> 00:35:55,858 With the Sandy data, we've been able to show empirically 762 00:35:55,858 --> 00:35:58,485 that we can automatically identify whether or not 763 00:35:58,485 --> 00:36:02,622 a tweet has been written by an eyewitness. 764 00:36:02,622 --> 00:36:04,074 So somebody who is writing something 765 00:36:04,074 --> 00:36:06,620 saying the bridge is down, 766 00:36:06,620 --> 00:36:08,840 we can say with a degree of accuracy 767 00:36:08,840 --> 00:36:11,304 of about 80% and higher whether that tweet 768 00:36:11,304 --> 00:36:13,012 has actually been posted by an eyewitness, 769 00:36:13,012 --> 00:36:16,214 which is really important for disaster response. 770 00:36:18,230 --> 00:36:20,729 I think that goes to the heart of why 771 00:36:20,729 --> 00:36:23,367 something like social media and Twitter is so important. 772 00:36:23,367 --> 00:36:26,540 Having these millions of eyes and ears on the ground. 773 00:36:26,540 --> 00:36:28,341 It's about empowering the crowd, 774 00:36:28,341 --> 00:36:30,002 it's about empowering those who are affected 775 00:36:30,002 --> 00:36:32,134 and those who want to help. 776 00:36:32,134 --> 00:36:34,040 These are real lives that we're capturing. 777 00:36:34,040 --> 00:36:36,481 This is not abstract information. 778 00:36:36,481 --> 00:36:39,398 These are real people who are affected by disasters 779 00:36:39,398 --> 00:36:41,815 who are trying to either help or seek help. 780 00:36:41,815 --> 00:36:44,321 It doesn't get more real than this. 781 00:36:48,788 --> 00:36:51,170 - Today, technology allows, 782 00:36:51,170 --> 00:36:53,624 in a lot of our communication tools, 783 00:36:53,624 --> 00:36:56,052 allows an idea to be spread instantly 784 00:36:56,052 --> 00:37:00,023 and with the original source of truth.
785 00:37:00,023 --> 00:37:02,894 I can have an idea and I can decide that 786 00:37:02,894 --> 00:37:04,324 I want to bring this around the world 787 00:37:04,324 --> 00:37:07,999 and I can do it almost instantaneously. 788 00:37:09,795 --> 00:37:11,666 - Tunisia's a great example. 789 00:37:11,666 --> 00:37:15,175 There were little uprisings happening all over Tunisia 790 00:37:15,175 --> 00:37:17,431 and each one was brutally squashed 791 00:37:17,431 --> 00:37:19,650 and there was no media attention 792 00:37:19,650 --> 00:37:24,276 so no one knew that any other little village had an issue. 793 00:37:24,276 --> 00:37:27,157 But what happened was in one village 794 00:37:27,157 --> 00:37:30,214 there was a man who self-immolated in protest 795 00:37:30,214 --> 00:37:33,165 and the images were put online 796 00:37:33,165 --> 00:37:37,977 by a dissident group onto Facebook 797 00:37:37,977 --> 00:37:39,883 and then Al Jazeera picked it up 798 00:37:39,883 --> 00:37:42,696 and broadcast the image across their region 799 00:37:42,696 --> 00:37:44,857 and then all of Tunisia realized 800 00:37:44,857 --> 00:37:47,287 wait a second, we're about to have an uprising 801 00:37:47,287 --> 00:37:48,449 and it just went. 802 00:37:48,449 --> 00:37:53,454 (yelling) 803 00:37:55,095 --> 00:37:58,513 So Tunisia was really activists on the ground, 804 00:37:58,513 --> 00:38:02,395 social media and mainstream media working together, 805 00:38:02,395 --> 00:38:05,486 spreading across Tunisia this idea that 806 00:38:05,486 --> 00:38:07,031 you're not the only ones 807 00:38:07,031 --> 00:38:10,745 and it gave everyone the courage to do the uprising. 808 00:38:12,412 --> 00:38:14,713 Technology has fundamentally changed 809 00:38:14,713 --> 00:38:17,167 the way people interact with government. 810 00:38:17,167 --> 00:38:19,432 That's another layer of the stack 811 00:38:19,432 --> 00:38:21,012 that's sort of being opened up.
812 00:38:21,012 --> 00:38:23,104 I think that's one of the key challenges: big data 813 00:38:23,104 --> 00:38:26,067 has so much opportunity both for good 814 00:38:26,067 --> 00:38:28,287 and also for really screwing up our system. 815 00:38:28,287 --> 00:38:30,414 - You can't talk about data without talking about people 816 00:38:30,414 --> 00:38:31,959 because people create the data 817 00:38:31,959 --> 00:38:33,923 and people utilize the data. 818 00:38:33,923 --> 00:38:38,171 (whirring) 819 00:38:44,360 --> 00:38:47,265 - So a handful of years ago there's a guy named Andrew Pole 820 00:38:47,265 --> 00:38:50,077 who is a statistician who gets hired by Target. 821 00:38:50,077 --> 00:38:51,496 He's sitting at his desk and some guys 822 00:38:51,496 --> 00:38:53,076 from the marketing department come by and they say, 823 00:38:53,076 --> 00:38:55,505 "Look, if we wanted to figure out 824 00:38:55,505 --> 00:38:58,003 "which of our customers are pregnant, 825 00:38:58,003 --> 00:39:00,049 "could you tell us that?" 826 00:39:00,049 --> 00:39:01,722 So what Andrew Pole started doing is he said 827 00:39:01,722 --> 00:39:05,569 the women who had signed up for the baby registry, 828 00:39:05,569 --> 00:39:07,477 let's track what they're buying 829 00:39:07,477 --> 00:39:09,602 and see if there's any patterns. 830 00:39:09,602 --> 00:39:11,531 I mean, obviously if someone starts buying a crib 831 00:39:11,531 --> 00:39:13,332 or a stroller, you know they're pregnant. 832 00:39:13,332 --> 00:39:15,622 But by using all of this data they had collected, 833 00:39:15,622 --> 00:39:18,388 they were able to start seeing these patterns 834 00:39:18,388 --> 00:39:21,331 that you couldn't actually guess at.
835 00:39:22,394 --> 00:39:25,526 When women were in their second trimester, 836 00:39:25,526 --> 00:39:28,524 they suddenly stopped buying scented lotion 837 00:39:28,524 --> 00:39:30,697 and started buying unscented lotion 838 00:39:30,697 --> 00:39:32,777 and about at the end of their second trimester, 839 00:39:32,777 --> 00:39:35,078 the beginning of their third trimester, they would start 840 00:39:35,078 --> 00:39:38,887 buying a lot of cotton balls and wash cloths. 841 00:39:38,887 --> 00:39:42,850 - And then they could start to subtly send you coupons 842 00:39:42,850 --> 00:39:45,901 for things that might be related to your pregnancy. 843 00:39:46,720 --> 00:39:48,114 - They decided to do a little test case. 844 00:39:48,114 --> 00:39:50,648 So they send out some of these ads to a local community 845 00:39:50,648 --> 00:39:52,787 and a couple weeks later this father comes in 846 00:39:52,787 --> 00:39:55,704 to one of the stores and he's furious 847 00:39:55,704 --> 00:39:58,923 and he's got a flyer in his hand that was sent to his house 848 00:39:58,923 --> 00:40:02,049 and he finds the manager and he says to the manager, 849 00:40:02,049 --> 00:40:03,932 he says, "Look, I'm so upset. 850 00:40:03,932 --> 00:40:07,313 "You know, my daughter is 18 years old. 851 00:40:07,313 --> 00:40:10,277 "I don't know what you're doing sending her this trash. 852 00:40:10,277 --> 00:40:12,497 "You sent her these coupons for diapers 853 00:40:12,497 --> 00:40:15,077 "and for cribs and for nursing equipment. 854 00:40:15,077 --> 00:40:16,623 "She's 18 years old 855 00:40:16,623 --> 00:40:18,877 "and it's like you're encouraging her to get pregnant." 856 00:40:18,877 --> 00:40:21,004 Now the manager, who has no idea what's going on 857 00:40:21,004 --> 00:40:23,839 with the pregnancy prediction machine 858 00:40:23,839 --> 00:40:25,222 that Andrew Pole built, 859 00:40:25,222 --> 00:40:26,896 says "Look, I'm so sorry. 
860 00:40:26,896 --> 00:40:30,231 "I apologize, it's not going to happen again." 861 00:40:30,231 --> 00:40:32,568 And a couple days later the guy feels so bad about this 862 00:40:32,568 --> 00:40:35,159 that he calls the father at home and he says to the father, 863 00:40:35,159 --> 00:40:36,879 "I just wanted to apologize again. 864 00:40:36,879 --> 00:40:38,622 "I'm so sorry this happened." 865 00:40:38,622 --> 00:40:40,167 And the father kind of pauses for a moment. 866 00:40:40,167 --> 00:40:42,597 He says, "Well, I want you to know 867 00:40:42,597 --> 00:40:44,305 "I had a conversation with my daughter 868 00:40:44,305 --> 00:40:47,106 "and there's been some activities in my household 869 00:40:47,106 --> 00:40:49,023 "that I haven't been aware of 870 00:40:49,023 --> 00:40:50,778 "and she's due in August. 871 00:40:50,778 --> 00:40:53,777 "So I owe you an apology." 872 00:40:53,777 --> 00:40:55,368 And when I asked Andrew Pole about this, 873 00:40:55,368 --> 00:40:56,996 before he stopped talking to me, 874 00:40:56,996 --> 00:41:00,122 before Target told him that he couldn't talk to me anymore, 875 00:41:00,122 --> 00:41:03,469 he said, "Oh look, like you gotta understand, 876 00:41:03,469 --> 00:41:05,305 "like this science is just at the beginning, 877 00:41:05,305 --> 00:41:07,257 "like we're still playing with what we can figure out 878 00:41:07,257 --> 00:41:08,598 "about your life." 879 00:41:08,598 --> 00:41:13,603 (mellow electronic music) 880 00:41:18,126 --> 00:41:19,962 - Everybody who's on Facebook is involved 881 00:41:19,962 --> 00:41:22,298 in a transaction in which they're donating their data 882 00:41:22,298 --> 00:41:24,262 to Facebook, who then sells their data 883 00:41:24,262 --> 00:41:26,040 and in return they get this service 884 00:41:26,040 --> 00:41:27,388 which allows them to post pictures 885 00:41:27,388 --> 00:41:28,353 and connect to their friends 886 00:41:28,353 --> 00:41:30,608 and so on and so on and so on and so on. 
887 00:41:30,608 --> 00:41:32,153 That's the transaction, 888 00:41:32,153 --> 00:41:34,442 but nobody knows that's the transaction. 889 00:41:34,442 --> 00:41:36,395 Most people, I think, don't understand that. 890 00:41:36,395 --> 00:41:39,626 They just literally think they're getting Facebook for free 891 00:41:39,626 --> 00:41:41,125 and it's not a free thing, 892 00:41:41,125 --> 00:41:46,130 we're paying for it by allowing them access to our data. 893 00:41:48,377 --> 00:41:51,143 - There are a lot of people on Facebook who don't know, 894 00:41:51,143 --> 00:41:54,653 for example, how much information is really out there 895 00:41:54,653 --> 00:41:57,453 about themselves and probably and apparently don't care 896 00:41:57,453 --> 00:42:00,286 as long as they can put up pictures of their cats. 897 00:42:00,286 --> 00:42:04,005 I think most people, when they think about privacy, 898 00:42:04,005 --> 00:42:06,338 they don't seem to connect 899 00:42:06,338 --> 00:42:09,647 their willingness to share their personal information 900 00:42:09,647 --> 00:42:12,553 with the world, either through social media 901 00:42:12,553 --> 00:42:14,900 or through shopping online or anything else, 902 00:42:14,900 --> 00:42:18,614 they don't seem to equate that with surveillance. 903 00:42:21,083 --> 00:42:24,593 - Every time I receive a text message, 904 00:42:24,593 --> 00:42:26,544 every time I make a phone call, 905 00:42:26,544 --> 00:42:28,473 my location is being recorded. 906 00:42:28,473 --> 00:42:32,518 That data about me is being pushed off to a server 907 00:42:32,518 --> 00:42:35,319 that is owned by my mobile operator. 908 00:42:35,319 --> 00:42:36,865 If I call that mobile phone operator and say 909 00:42:36,865 --> 00:42:39,619 "Hey, I'd like to have my data, please. 910 00:42:39,619 --> 00:42:40,735 "At the minimum, share it with me. 911 00:42:40,735 --> 00:42:45,337 "I'd like to see my locations over time." 
912 00:42:45,337 --> 00:42:47,798 They won't give it to me. 913 00:42:47,798 --> 00:42:50,854 - The increased ability of these devices that we have 914 00:42:50,854 --> 00:42:53,387 to become recording and sensing objects, 915 00:42:53,387 --> 00:42:55,435 so data collection devices essentially, 916 00:42:55,435 --> 00:42:59,658 in public space, that changes a lot of things. 917 00:43:00,186 --> 00:43:02,359 - Even if the phone company took away 918 00:43:02,359 --> 00:43:04,207 all of your personal identifying information, 919 00:43:04,207 --> 00:43:06,625 it would know within about 30 centimeters 920 00:43:06,625 --> 00:43:08,135 where you woke up every morning 921 00:43:08,135 --> 00:43:09,553 and where you went to work every day 922 00:43:09,553 --> 00:43:10,762 and the path that you took 923 00:43:10,762 --> 00:43:12,145 and who you were walking with 924 00:43:12,145 --> 00:43:14,016 and so even if they didn't know who you are, 925 00:43:14,016 --> 00:43:16,021 they know who you are. 926 00:43:16,724 --> 00:43:20,280 What I'm really worried about is the cost to democracy. 927 00:43:20,280 --> 00:43:23,744 Now, today, it's nearly impossible to be truly anonymous 928 00:43:23,744 --> 00:43:27,742 and so the ability for everything to be connected to you 929 00:43:27,742 --> 00:43:29,426 and for everything you do in the real world 930 00:43:29,426 --> 00:43:30,879 to be connected to you, everything you're doing 931 00:43:30,879 --> 00:43:33,308 in cyberspace, and then the ability for 932 00:43:33,308 --> 00:43:35,399 whoever it is to take that, put it together, 933 00:43:35,399 --> 00:43:37,236 and turn it into a story. 
934 00:43:37,236 --> 00:43:40,689 My fear really is that once there's so much data out there 935 00:43:40,689 --> 00:43:42,489 and once governments and companies 936 00:43:42,489 --> 00:43:45,836 start to be able to use that data to profile people, 937 00:43:45,836 --> 00:43:48,871 to filter them out, everybody is going to start to worry 938 00:43:48,871 --> 00:43:51,886 about their activities. 939 00:43:52,390 --> 00:43:56,854 - We're at a very, very important point 940 00:43:56,854 --> 00:44:01,536 where I think our society has come to realize this fact 941 00:44:01,536 --> 00:44:06,505 and just begun in earnest to debate the implications of it. 942 00:44:07,254 --> 00:44:11,055 - You have, I think, an attitude in the NSA 943 00:44:11,055 --> 00:44:14,436 that they have a right to every bit of information 944 00:44:14,436 --> 00:44:16,305 they can collect. 945 00:44:16,305 --> 00:44:20,523 We have constructed a world where 946 00:44:20,523 --> 00:44:22,943 the government is collecting secretly 947 00:44:22,943 --> 00:44:25,870 all of the data it can on each individual citizen, 948 00:44:25,870 --> 00:44:29,783 whether that individual citizen has done anything or not. 949 00:44:29,783 --> 00:44:32,932 They have been collecting massive amounts of data 950 00:44:32,932 --> 00:44:36,012 through cell phone providers, Internet providers, 951 00:44:36,012 --> 00:44:38,801 that is then sifted through secretly 952 00:44:38,801 --> 00:44:42,577 by people over whom no democratic institution 953 00:44:42,577 --> 00:44:44,440 has effective control. 954 00:44:45,665 --> 00:44:47,955 There's a feeling that if you're not 955 00:44:47,955 --> 00:44:49,033 communing with terrorists, 956 00:44:49,033 --> 00:44:51,334 what do you care if the government gathers your information. 
957 00:44:51,334 --> 00:44:53,345 This is probably the most pernicious, 958 00:44:53,345 --> 00:44:56,053 anti Bill of Rights line of thought that there is 959 00:44:56,053 --> 00:44:57,970 because these are rights we hold in common. 960 00:44:57,970 --> 00:44:59,481 Every violation of somebody else's rights 961 00:44:59,481 --> 00:45:01,612 is a violation of yours. 962 00:45:02,408 --> 00:45:04,116 - What's going to happen, I think, is that we now 963 00:45:04,116 --> 00:45:06,580 have so much information out there about ourselves 964 00:45:06,580 --> 00:45:08,544 and the ability for people to abuse it, 965 00:45:08,544 --> 00:45:09,962 people are going to get hurt, 966 00:45:09,962 --> 00:45:11,124 people are going to lose their jobs, 967 00:45:11,124 --> 00:45:12,751 people are going to get divorced, 968 00:45:12,751 --> 00:45:14,598 people are going to get killed 969 00:45:14,598 --> 00:45:16,307 and it's going to become really painful 970 00:45:16,307 --> 00:45:17,544 and everyone's going to realize 971 00:45:17,544 --> 00:45:19,439 we have to do something about this 972 00:45:19,439 --> 00:45:21,055 and then we're going to start to change. 973 00:45:21,055 --> 00:45:23,483 Now the question is how bad is it. 974 00:45:23,483 --> 00:45:26,447 - [Voiceover] You can't have a secret operation 975 00:45:26,447 --> 00:45:29,957 validated by a secret court based on secret evidence 976 00:45:29,957 --> 00:45:31,165 in a democratic republic. 977 00:45:31,165 --> 00:45:33,920 So the system closes and no information gets out 978 00:45:33,920 --> 00:45:37,684 except it gets leaked or it gets dumped on the world 979 00:45:37,684 --> 00:45:39,521 by outside actors, whether that's WikiLeaks, 980 00:45:39,521 --> 00:45:40,812 or whether that's Bradley Manning, 981 00:45:40,812 --> 00:45:42,275 or whether that's Edward Snowden. 982 00:45:42,275 --> 00:45:43,856 That's the way that people find out 983 00:45:43,856 --> 00:45:46,029 what their government is up to. 
984 00:45:46,029 --> 00:45:47,412 We're living in a future where we've lost 985 00:45:47,412 --> 00:45:48,574 our right to privacy. 986 00:45:48,574 --> 00:45:49,992 We've given it away for convenience sake 987 00:45:49,992 --> 00:45:51,840 in our economic and social lives 988 00:45:51,840 --> 00:45:55,415 and we've lost it for fear's sake vis-a-vis our government. 989 00:45:58,270 --> 00:46:01,068 - Any time you're looking at an ability to segment, 990 00:46:01,068 --> 00:46:04,734 analyze, you've got to think about both sides. 991 00:46:05,240 --> 00:46:06,995 But there's so much good here, 992 00:46:06,995 --> 00:46:10,167 there's so much chance to improve the quality of life 993 00:46:10,167 --> 00:46:12,213 that to basically close the box and say, 994 00:46:12,213 --> 00:46:13,084 "You know what, we're not going to look 995 00:46:13,084 --> 00:46:15,304 "at all this information, we're not going to collect it," 996 00:46:15,304 --> 00:46:16,756 that's not practical. 997 00:46:16,756 --> 00:46:20,098 What we're going to have to do is think as a community. 998 00:46:20,557 --> 00:46:22,940 - We have cultures that have never been in dialogue 999 00:46:22,940 --> 00:46:26,031 with more than a hundred or 200 or 400 people 1000 00:46:26,031 --> 00:46:29,366 now connected to three billion. 1001 00:46:29,366 --> 00:46:34,371 (mellow music) 1002 00:46:35,918 --> 00:46:38,347 The phone is the on-ramp to the information network. 1003 00:46:38,347 --> 00:46:40,218 Once you're on the information network, 1004 00:46:40,218 --> 00:46:42,689 you're in, everybody's in. 
1005 00:46:42,689 --> 00:46:44,479 - Billions and billions of people 1006 00:46:44,479 --> 00:46:46,909 who have been excluded from the discussion, 1007 00:46:46,909 --> 00:46:48,949 who couldn't afford to step into the world 1008 00:46:48,949 --> 00:46:50,158 of being connected, 1009 00:46:50,158 --> 00:46:51,738 step into the world of information, 1010 00:46:51,738 --> 00:46:54,992 step into the world of being able to learn things 1011 00:46:54,992 --> 00:46:58,380 they could never learn are suddenly on the network. 1012 00:47:00,303 --> 00:47:01,186 - [Voiceover] The world of the Internet, 1013 00:47:01,186 --> 00:47:02,430 from an innovation perspective, 1014 00:47:02,430 --> 00:47:05,149 is push innovation out of large institutions 1015 00:47:05,149 --> 00:47:07,700 to people on the edges. 1016 00:47:09,821 --> 00:47:13,377 - [Voiceover] I suspect as we equip these next billion 1017 00:47:13,377 --> 00:47:17,793 consumers with these devices that connect them 1018 00:47:17,793 --> 00:47:21,141 with the rest of the world and with the Internet, 1019 00:47:21,141 --> 00:47:24,855 we'll have a lot to learn about how they use them. 1020 00:47:26,730 --> 00:47:28,276 - All of these people in these countries 1021 00:47:28,276 --> 00:47:29,869 are now connecting with each other, 1022 00:47:29,869 --> 00:47:33,750 sharing data about prices of crops, prices of parts. 1023 00:47:33,750 --> 00:47:35,621 The Africans are talking to the Chinese 1024 00:47:35,621 --> 00:47:37,504 who are talking to the Indians 1025 00:47:37,504 --> 00:47:41,335 and the world is connected in its nooks and crannies. 
1026 00:47:43,931 --> 00:47:46,777 - The person that is in Rwanda 1027 00:47:46,777 --> 00:47:50,659 that has their first phone that now has access 1028 00:47:50,659 --> 00:47:53,077 to an education system 1029 00:47:53,077 --> 00:47:55,831 that they never could have dreamed of before 1030 00:47:55,831 --> 00:47:58,632 can start finding solutions 1031 00:47:58,632 --> 00:48:02,467 for his or her little town, 1032 00:48:02,467 --> 00:48:04,762 his or her village. 1033 00:48:05,942 --> 00:48:09,277 - Once we have that ability to connect people 1034 00:48:09,277 --> 00:48:10,486 and they are able to be connected, 1035 00:48:10,486 --> 00:48:12,531 there's gonna be some genius, you know, 1036 00:48:12,531 --> 00:48:14,751 in some remote location who would never 1037 00:48:14,751 --> 00:48:16,332 have been discovered, who would never have had 1038 00:48:16,332 --> 00:48:19,423 the capability to get to the education, 1039 00:48:19,423 --> 00:48:23,974 to get to the resources that he or she needs and... 1040 00:48:26,059 --> 00:48:30,313 that young woman is going to change the world 1041 00:48:30,313 --> 00:48:33,073 rather than just changing her village. 1042 00:48:33,694 --> 00:48:38,668 - The idea that that genius will be able to find 1043 00:48:38,668 --> 00:48:41,423 his or her way into the greater culture 1044 00:48:41,423 --> 00:48:44,630 through the tiny, little two-by-two window 1045 00:48:44,630 --> 00:48:48,010 of a feature phone is very exciting. 1046 00:48:48,010 --> 00:48:51,221 - A billion people in India, a billion people in China, 1047 00:48:51,221 --> 00:48:52,534 you're talking, you know, 1048 00:48:52,534 --> 00:48:54,451 500 million to a billion in Africa. 1049 00:48:54,451 --> 00:48:56,997 Suddenly the world has a lot more minds 1050 00:48:56,997 --> 00:49:01,575 connected in the simplest, least expensive possible way 1051 00:49:01,575 --> 00:49:03,511 to make the world better. 
1052 00:49:04,342 --> 00:49:05,969 - So you look at the agricultural revolution 1053 00:49:05,969 --> 00:49:07,514 and the Industrial Revolution. 1054 00:49:07,514 --> 00:49:09,676 Is the Internet and then the data revolution 1055 00:49:09,676 --> 00:49:11,605 associated with it of that scale? 1056 00:49:11,605 --> 00:49:13,197 It's certainly possible. 1057 00:49:13,197 --> 00:49:14,684 - I don't think there's any question that 1058 00:49:14,684 --> 00:49:16,289 we're at a moment in human history 1059 00:49:16,289 --> 00:49:19,043 that we will look back on in 50 or a hundred years 1060 00:49:19,043 --> 00:49:24,048 and say right around 2000 or so it all changed. 1061 00:49:25,970 --> 00:49:27,724 And I do think we will date 1062 00:49:27,724 --> 00:49:32,323 before the explosion of data and after. 1063 00:49:32,323 --> 00:49:33,903 I don't think it's an issue of climate change 1064 00:49:33,903 --> 00:49:36,414 or health or jobs, I think it's all issues. 1065 00:49:36,414 --> 00:49:39,737 Everything has information at its core, everything. 1066 00:49:39,737 --> 00:49:43,259 So if information matters, then reorganizing 1067 00:49:43,259 --> 00:49:45,386 the entire information network of the planet 1068 00:49:45,386 --> 00:49:47,721 is like wiring up the brain of a two-year-old child. 1069 00:49:47,721 --> 00:49:49,186 Suddenly that child can talk 1070 00:49:49,186 --> 00:49:51,568 and think and act and behave, right. 1071 00:49:51,568 --> 00:49:55,032 The world is wiring up a cerebral cortex, if you will, 1072 00:49:55,032 --> 00:49:57,228 of billions of connected elements 1073 00:49:57,228 --> 00:49:59,494 that are going to exchange billions of ideas, 1074 00:49:59,494 --> 00:50:01,087 billions of points of knowledge, 1075 00:50:01,087 --> 00:50:04,047 and billions of ways of working together. 
1076 00:50:04,047 --> 00:50:07,591 - Together, there becomes an enormous wave of change 1077 00:50:07,591 --> 00:50:09,892 and that wave of change is going to take us 1078 00:50:09,892 --> 00:50:13,861 in directions that we can't begin to imagine. 1079 00:50:14,355 --> 00:50:18,690 - The ability to turn that data into actionable insight 1080 00:50:18,690 --> 00:50:20,538 is what computers are very good at, 1081 00:50:20,538 --> 00:50:23,048 the ability to take action is what we're really good at 1082 00:50:23,048 --> 00:50:26,244 and I think it's really important to separate those two 1083 00:50:26,244 --> 00:50:28,928 because people conflate them and get scared 1084 00:50:28,928 --> 00:50:31,056 and think the computers are taking over. 1085 00:50:31,056 --> 00:50:33,565 The computers are this extraordinary tool 1086 00:50:33,565 --> 00:50:37,610 that we have at our disposal to accelerate our ability 1087 00:50:37,610 --> 00:50:38,946 to solve the problems that, frankly, 1088 00:50:38,946 --> 00:50:40,457 we've gotten ourselves into. 1089 00:50:40,457 --> 00:50:42,247 - I am fundamentally optimistic, 1090 00:50:42,247 --> 00:50:46,257 but I'm not blindly, foolishly optimistic. 1091 00:50:46,257 --> 00:50:48,964 You got to remember, the financial crisis was brought to us 1092 00:50:48,964 --> 00:50:52,020 by big data people as well because 1093 00:50:52,020 --> 00:50:53,856 they weren't actually thinking very hard 1094 00:50:53,856 --> 00:50:55,853 about how do they create value for the world. 1095 00:50:55,853 --> 00:50:57,074 They were just thinking about 1096 00:50:57,074 --> 00:51:00,110 how do they create value for themselves. 1097 00:51:00,661 --> 00:51:02,660 You know, we have a fair amount of literature, 1098 00:51:02,660 --> 00:51:04,624 a fair amount of understanding that if you take 1099 00:51:04,624 --> 00:51:07,538 more out of the ecosystem than you put back in, 1100 00:51:07,538 --> 00:51:09,595 the whole thing breaks down. 
1101 00:51:09,595 --> 00:51:13,733 That's why I think we have to actually earn our future. 1102 00:51:13,733 --> 00:51:16,104 We can't just sort of pat ourselves on the back 1103 00:51:16,104 --> 00:51:18,323 and think it's just going to fall into our laps. 1104 00:51:18,323 --> 00:51:22,194 We have to care about what kind of future we're making 1105 00:51:22,194 --> 00:51:23,959 and we have to invest in that future 1106 00:51:23,959 --> 00:51:26,168 and we have to make the right choices. 1107 00:51:26,168 --> 00:51:30,057 - It is, to me, paramount that a culture understands, 1108 00:51:30,057 --> 00:51:31,975 our culture understands 1109 00:51:31,975 --> 00:51:36,980 that we must take this data thing as ours, 1110 00:51:38,102 --> 00:51:40,075 that we are the platform for it, 1111 00:51:40,075 --> 00:51:42,400 humans, individuals are the platform for it, 1112 00:51:42,400 --> 00:51:45,083 that it is not something done to us, 1113 00:51:45,083 --> 00:51:49,012 but rather it is ours to do with something as we wish. 1114 00:51:52,644 --> 00:51:54,842 When I was young, we landed on the moon 1115 00:51:54,842 --> 00:51:58,931 and so the future to me meant going further than that. 1116 00:51:58,931 --> 00:52:00,697 We looked outward. 1117 00:52:00,697 --> 00:52:04,032 Today, I think there's a new energy around 1118 00:52:04,032 --> 00:52:06,497 the future and it has much more to do 1119 00:52:06,497 --> 00:52:08,960 with looking at where we are now 1120 00:52:08,960 --> 00:52:11,714 and the globe we stand on 1121 00:52:11,714 --> 00:52:14,977 and solving for that. 1122 00:52:14,977 --> 00:52:17,859 The tools that are in our hands now 1123 00:52:17,859 --> 00:52:20,483 are going to allow us to do that. 1124 00:52:20,483 --> 00:52:23,574 Now it's like no wait a minute, this is our place 1125 00:52:23,574 --> 00:52:27,792 and we're going to figure out how to make it blossom. 
1126 00:52:27,792 --> 00:52:32,797 (dramatic music) 1127 00:53:15,806 --> 00:53:20,811 (mid tempo orchestral music)
