English subtitles (transcript) for NOVA S49E13 "Computers v Crime" 1080p WEB h264-BAE, track 3

SARAH BRAYNE: We live in this era where we leave digital traces throughout the course of our everyday lives.

ANDY CLARNO: What is this data, how is it collected, how is it being used?

NARRATOR: One way it's being used is to make predictions about who might commit a crime...

Hey, give me all your money, man!

NARRATOR: ...and who should get bail.

JUDGE: On count one, you're charged with felony intimidation...

ANDREW FERGUSON: The idea is that if you look at past crimes, you might be able to predict the future.

WILLIAM ISAAC: We want safer communities, we want societies that are less incarcerated.

NARRATOR: But is that what we're getting? Are the predictions reliable?

CATHY O'NEIL: I think algorithms can, in many cases, be better than people. But, of course, algorithms don't have consciousness. The algorithm only knows what it's been fed.

RUHA BENJAMIN: Because it's technology, we don't question them as much as we might a racist judge or a racist officer. They're behind this veneer of neutrality.

ISAAC: We need to know who's accountable when systems harm the communities that they're designed to serve.

NARRATOR: Can we trust the justice of predictive algorithms? And should we? "Computers Vs. Crime," right now, on "NOVA."

(computers booting up)

NARRATOR: We live in a world of big data, where computers look for patterns in vast collections of information in order to predict the future. And we depend on their accuracy. Is it a good morning for jogging? Will this become cancer? What movie should I choose? The best way to beat traffic? Your computer can tell you. Similar computer programs, called predictive algorithms, are mining big data to make predictions about crime and punishment-- reinventing how our criminal legal system works.
NARRATOR: Policing agencies have used these computer algorithms in an effort to predict where the next crime will occur and even who the perpetrator will be.

ASSISTANT DISTRICT ATTORNEY: Here, the state is recommending...

NARRATOR: Judges use them to determine who should get bail and who shouldn't.

JUDGE: If you fail to appear next time, you get no bond.

NARRATOR: It may sound like the police of the future in the movie "Minority Report."

"I'm placing you under arrest for the future murder of Sarah Marks."

NARRATOR: But fiction it's not. How do these predictions actually work? Can computer algorithms make our criminal legal system more equitable? Are these algorithms truly fair and free of human bias?

ANDREW PAPACHRISTOS: I grew up in Chicago in the 1980s and early 1990s. My dad was an immigrant from Greece, we worked in my family's restaurant, called KaMar's.

NARRATOR: Andrew Papachristos was a 16-year-old kid in the North Side of Chicago in the 1990s.

PAPACHRISTOS: I spent a lot of my formative years busing tables, serving people hamburgers and gyros. It kind of was a whole family affair.

NARRATOR: Young Papachristos was aware the streets could be dangerous, but never imagined the violence would touch him or his family.

REPORTER: Two more gang-related murders Monday night.

PAPACHRISTOS: And of course, you know, the '80s and '90s in Chicago was some of the historically most violent periods in Chicago. Street corner drug markets, street organizations. And then like a lot of other businesses on our, on our block and in our neighborhood, local gangs tried to extort my family and the business. And my dad had been running KaMar's for 30 years and kind of just said no.

(sirens blaring)

NARRATOR: Then, one night, the family restaurant burned to the ground. Police suspected arson.

PAPACHRISTOS: It was quite a shock to our family, 'cause everybody in the neighborhood worked in the restaurant at one point in their life. And my parents lost 30 years of their lives.
PAPACHRISTOS: That was really one of the events that made me want to understand violence. Like, how could this happen?

NARRATOR: About a decade later, Papachristos was a graduate student searching for answers.

PAPACHRISTOS: In graduate school, I was working on a violence prevention program that brought together community members, including street outreach workers. And we were sitting at a table, and one of these outreach workers asked me, the university student, "Who's next? Who's going to get shot next?" And where that led was me sitting down with stacks of shooting and, and homicide files with a red pen and a legal pad, by hand creating these network images of, this person shot this person, and this person was involved with this group and this event, and creating a web of these relationships. And then I learned that there's this whole science about networks. I didn't have to invent anything.

NARRATOR: Social network analysis was already influencing popular culture. "Six Degrees of Separation" was a play on Broadway. And then, there was Six Degrees of Kevin Bacon.

PAPACHRISTOS: The idea was, you would play this game, and whoever got the shortest distance to Kevin Bacon would win. So Robert De Niro was in a movie with so-and-so, who was in a movie with Kevin Bacon. It was creating, essentially, a series of ties among movies and actors. And in fact, there's a mathematics behind that principle. It's actually old mathematical graph theory, right? That goes back to 1900s mathematics. And lots of scientists started seeing that there were mathematical principles, and computational resources-- computers, data-- were at a point that you could test those things. So it was in a very exciting time. We looked at arrest records, at police stops, and we looked at victimization records. Who was the victim of a homicide or a non-fatal shooting?
PAPACHRISTOS: The statistical model starts by creating the social networks of, say, everybody who may have been arrested in a, in a particular neighborhood. So Person A and Person B were in a robbery together, they have a tie, and then Person B and Person C were, were stopped by the police in another instance. And it creates networks of thousands of people. Understanding that events are connected, places are connected. That there are old things, like disputes between crews, which actually drive behavior for generations. What we saw was striking. (snaps) And you could see it immediately, and you could see it a mile away. Which was, gunshot victims clumped together. You, you very rarely see one victim. You see two, three, four. Sometimes they string across time and space. And then the model predicts, what's the probability that this is going to lead to a shooting on the same pathway in the future?

(gun firing, people shouting)

REPORTER: Another young man lies dead.

NARRATOR: In Boston, Papachristos found that 85% of all gunshot injuries occurred within a single social network. Individuals in this network were less than five handshakes away from the victim of a gun homicide or non-fatal shooting. The closer a person was connected to a gunshot victim, he found, the greater the probability that that person would be shot. Around 2011, when Papachristos was presenting his groundbreaking work on social networks and gang violence, the Chicago Police Department wanted to know more.

PAPACHRISTOS: We were at a conference. The then-superintendent of the police department, he was asking me a bunch of questions. He had clearly read the paper.

NARRATOR: The Chicago Police Department was working on its own predictive policing program to fight crime. They were convinced that Papachristos's model could make their new policing model even more effective.
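The "handshakes" the narrator mentions are shortest-path distances in a social network. A minimal sketch of that computation, using an invented network rather than any of the Boston or Chicago data:

```python
from collections import deque

# Hypothetical co-offense ties: each pair is an undirected edge in the network.
ties = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
gunshot_victims = {"F"}

graph = {}
for u, v in ties:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def handshakes_to_victim(person):
    """Breadth-first search: how many ties separate a person
    from the nearest gunshot victim in the network."""
    seen = {person}
    queue = deque([(person, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in gunshot_victims:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # not connected to any victim

print(handshakes_to_victim("A"))  # 5 "handshakes" away in this toy network
```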
LOGAN KOEPKE: Predictive policing involves looking to historical crime data to predict future events, either where police believe crime may occur or who might be involved in certain crimes. So it's the use of historical data to forecast a future event.

NARRATOR: At the core of these programs is software, which, like all computer programs, is built around an algorithm. So, think of an algorithm like a recipe. You have inputs, which are your ingredients, you have the algorithm, which is the steps. And then there's the output, which is hopefully the delicious cake you're making.

GROUP: Happy birthday!

So one way to think about algorithms is to think about the hiring process. In fact, recruiters have been studied for a hundred years. And it turns out many human recruiters have a standard algorithm when they're looking at a résumé. So they start with your name, and then they look to see where you went to school, and then finally, they look at what your last job was. If they don't see the pattern they're looking for... (bell dings) ...that's all the time you get. And in a sense, that's exactly what artificial intelligence is doing, as well, in a very basic level. It's recognizing sets of patterns and using that to decide what the next step in its decision process would be.

NARRATOR: What is commonly referred to as artificial intelligence, or A.I., is a process called machine learning, where a computer algorithm will adjust on its own, without human instructions, in response to the patterns it finds in the data. These powerful processes can analyze more data than any person can, and find patterns never recognized before. The principles for machine learning were invented in the 1950s, but began proliferating only after about 2010.
What we consider machine learning today came about because hard drives became very cheap. So it was really easy to get a lot of data on everyone in every aspect of life. And the question is, what can we do with all of that data? Those new uses are things like predictive policing, they are things like deciding whether or not a person's going to get a job or not, or be invited for a job interview.

NARRATOR: So how does such a powerful tool like machine learning work? Take the case of a hiring algorithm. First, a computer needs to understand the objective. Here, the objective is identifying the best candidate for the job. The algorithm looks at résumés of former job candidates and searches for keywords in résumés of successful hires. The résumés are what's called training data. The algorithm assigns values to each keyword. Words that appear more frequently in the résumés of successful candidates are given more value. The system learns from past résumés the patterns of qualities that are associated with successful hires. Then it makes its predictions by identifying these same patterns from the résumés of potential candidates.
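A minimal sketch of that keyword-weighting idea. The résumés, keywords, and scoring rule below are invented for illustration; real hiring systems are far more elaborate:

```python
from collections import Counter

# Training data: (keywords found on a past résumé, was the person a successful hire?)
past_resumes = [
    ({"python", "statistics", "sql"}, True),
    ({"python", "excel"}, True),
    ({"retail", "excel"}, False),
    ({"statistics", "sql"}, True),
    ({"retail"}, False),
]

# "Training": weight each keyword by how often it shows up in successful hires.
weights = Counter()
for keywords, hired in past_resumes:
    for word in keywords:
        weights[word] += 1 if hired else -1

def score(resume_keywords):
    """Prediction: sum the learned weights of the keywords on a new résumé."""
    return sum(weights[word] for word in resume_keywords)

print(weights)                             # the learned keyword values
print(score({"python", "sql", "retail"}))  # score for a new candidate
```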
NARRATOR: In a similar way, the Chicago police wanted to find patterns in crime reports and arrest records to predict who would be connected to violence in the future. They thought Papachristos's model could help.

Obviously we wanted to, and tried, and framed and wrote all the caveats and made our recommendations to say, "This research should be in this public health space." But once the math is out there, once the statistics are out there, people can also take it and do what they want with it.

NARRATOR: While Papachristos saw the model as a tool to identify future victims of gun violence, CPD saw the chance to identify not only future victims, but future criminals.

First it took me, you know, by, by surprise, and then it got me worried. What is it gonna do? Who is it gonna harm?

NARRATOR: What the police wanted to predict was who was at risk for being involved in future violence.

Gimme all your money, man.

NARRATOR: Training on hundreds of thousands of arrest records, the computer algorithm looks for patterns or factors associated with violent crime to calculate the risk that an individual will be connected to future violence. Using social network analysis, arrest records of associates are also included in that calculation. The program was called the Strategic Subject List, or SSL. It would be one of the most controversial in Chicago policing history.

ANDY CLARNO: The idea behind the Strategic Subjects List, or the SSL, was to try to identify the people who would be most likely to become involved as what they called a "party to violence," either as a shooter or a victim.

NARRATOR: Chicago police would use Papachristos's research to evaluate what was called an individual's "co-arrest network."

CLARNO: And the way that the Chicago Police Department calculated an individual's network was through kind of two degrees of removal. Anybody that I'd been arrested with and anybody that they would, had been arrested with counted as people who were within my network. So my risk score would be based on my individual history of arrest and victimization, as well as the histories of arrest and victimization of people within that two-degree network of mine.

It was colloquially known as the "heat list." If you were hot, you were on it. And they gave you literally a risk score. At one time, it was zero to 500-plus. If you're 500-plus, you are a high-risk person. And if you made this heat list, you might find a detective knocking on your front door.
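The "two degrees of removal" described here can be sketched as a small graph computation. The ties, histories, and point values below are made up to show the mechanics; they are not the actual SSL formula:

```python
# Hypothetical co-arrest ties and individual histories.
co_arrests = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p5", "p1")]
history = {  # person: (number of arrests, number of times a shooting victim)
    "p1": (2, 0), "p2": (5, 1), "p3": (1, 0), "p4": (0, 2), "p5": (0, 0),
}

graph = {}
for a, b in co_arrests:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def two_degree_network(person):
    """Everyone arrested with `person`, plus everyone arrested with them."""
    first = graph.get(person, set())
    second = set()
    for other in first:
        second |= graph.get(other, set())
    return (first | second) - {person}

def risk_score(person):
    """Toy score: your own arrests and victimizations count double,
    and everyone in your two-degree network adds theirs once."""
    arrests, victimizations = history[person]
    score = 2 * (arrests + victimizations)
    for other in two_degree_network(person):
        a, v = history[other]
        score += a + v
    return score

for p in sorted(history):
    print(p, risk_score(p))
```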
NARRATOR: Trying to predict future criminal activity is not a new idea. Scotland Yard in London began using this approach by mapping crime events in the 1930s. But in the 1990s, it was New York City Police Commissioner William Bratton who took crime mapping to another level.

BRATTON: I run the New York City Police Department. My competition is the criminal element.

NARRATOR: Bratton convinced policing agencies across the country that data-driven policing was the key to successful policing strategies.

Part of this is to prevent crime in the first place.

NARRATOR: Bratton was inspired by the work of his own New York City Transit Police.

As you see all those, uh, dots on the map, that's our opponents.

NARRATOR: It was called Charts of the Future, and credited with cutting subway felonies by 27% and robberies by a third. Bratton saw potential. He ordered all New York City precincts to systematically map crime, collect data, find patterns, report back. The new approach was called CompStat.

BRAYNE: CompStat, I think, in a way, is kind of a precursor of predictive policing, in the sense that many of the same principles there-- you know, using data tracking, year-to-dates, identifying places where law enforcement interventions could be effective, et cetera-- really laid the groundwork for predictive policing.

NARRATOR: By the early 2000s, as computational power increased, criminologists were convinced this new data trove could be used in machine learning to create models that predict when and where crime would happen in the future.

REPORTER: L.A. police now say the gunmen opened fire with a semi-automatic weapon.

NARRATOR: In 2008, now as chief of the Los Angeles Police Department, Bratton joined with academics at U.C.L.A. to help launch a predictive policing system called PredPol, powered by a machine learning algorithm.

ISAAC: PredPol started as a spin-off of a set of, like, government contracts that were related to military work. They were developing a form of an algorithm that was used to predict I.E.D.s.
(device explodes)

ISAAC: And it was a technique that was used to also detect aftershocks and seismographic activity. (dogs barking and whining, objects clattering) And after those contracts ended, the company decided they wanted to apply this in the domain of, of policing domestically in the United States.

(radio beeping)

NARRATOR: The PredPol model relies on three types of historical data: type of crime, crime location, and time of crime, going back two to five years. The algorithm is looking for patterns to identify locations where crime is most likely to occur. As new crime incidents are reported, they get folded into the calculation. The predictions are displayed on a map as 500 x 500 foot areas that officers are then directed to patrol.

ISAAC: And then from there, the algorithm says, "Okay, based on what we know about the kind of very recent history, where is likely that we'll see crime in the next day or the next hour?"
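The setup the narrator describes (type, location, and time of past crimes, binned into 500 by 500 foot cells, with new incidents folded in) can be sketched roughly as a decaying hotspot score. This is a generic illustration with assumed numbers, not PredPol's published model:

```python
from collections import defaultdict

CELL_FEET = 500          # grid cell size described in the film
HALF_LIFE_DAYS = 30.0    # assumed decay half-life, for illustration only

# Hypothetical incidents: (x_feet, y_feet, days_ago, crime_type)
incidents = [
    (120, 480, 1, "burglary"),
    (450, 300, 3, "burglary"),
    (460, 310, 40, "theft"),
    (2100, 900, 2, "robbery"),
]

def cell(x, y):
    """Map coordinates onto a 500 x 500 foot grid cell."""
    return (int(x // CELL_FEET), int(y // CELL_FEET))

def hotspot_scores(incidents):
    """Each incident adds weight to its cell; older incidents count for less."""
    scores = defaultdict(float)
    for x, y, days_ago, _ in incidents:
        weight = 0.5 ** (days_ago / HALF_LIFE_DAYS)  # exponential time decay
        scores[cell(x, y)] += weight
    return scores

# Rank the cells, as if directing patrols to the top-scoring boxes.
for grid_cell, score in sorted(hotspot_scores(incidents).items(), key=lambda kv: -kv[1]):
    print(grid_cell, round(score, 3))
```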
BRAYNE: One of the key reasons that police start using these tools is the efficient and even, to a certain extent, like in their logic, more fair, um, and, and justifiable allocation of their police resources.

NARRATOR: By 2013, in addition to PredPol, predictive policing systems developed by companies like HunchLab, IBM, and Palantir were in use across the country. (radios running in background) And computer algorithms were also being adopted in courtrooms.

BAILIFF: 21CF3810, State of Wisconsin versus Chantille...

KATHERINE FORREST: These tools are used in pretrial determinations, they're used in sentencing determinations, and they're used in housing determinations. They're also used, importantly, in the plea bargaining phase. They're used really throughout the entire process to try to do what judges have been doing, which is the very, very difficult task of trying to understand and predict what will a human being do tomorrow, or the next day, or next month, or three years from now.

ASSISTANT DISTRICT ATTORNEY: Bail forfeited. He failed to appear 12/13/21. Didn't even make it to preliminary hearing.

The software tools are an attempt to try to predict it better than humans can.

MICHELLE HAVAS: On count one, you're charged with felony intimidation of a victim.

SWEENEY: So, in the United States, you're innocent until you've been proven guilty, but you've been arrested. Now that you've been arrested, a judge has to decide whether or not you get out on bail, or how high or low that bail should be.

You're charged with driving on a suspended license. I've set that bond at $1,000. No insurance, I've set that bond at $1,000.

ALISON SHAMES: One of the problems is, judges often are relying on money bond or financial conditions of release.

JUDGE: So I'm going to lower his fine to make it a bit more reasonable. So instead of $250,000 cash, surety is $100,000.

SHAMES: It allows people who have access to money to be released. If you are poor, you are often being detained pretrial. Approximately 70% of the people in jail are there on pretrial. These are people who are presumed innocent, but are detained during the pretrial stage of their case.

NARRATOR: Many jurisdictions use pretrial assessment algorithms with a goal to reduce jail populations and decrease the impact of judicial bias.

SHAMES: The use of a tool like this takes historical data and assesses, based on research, associates factors that are predictive of the two outcomes that the judge is concerned with. That's community safety and whether that person will appear back in court during the pretrial period.

NARRATOR: Many of these algorithms are based on a concept called a regression model.
NARRATOR: The earliest, called linear regression, dates back to 19th-century mathematics.

O'NEIL: At the end of the day, machine learning algorithms do exactly what linear regression does, which is predict-- based on the initial conditions, the situation they're seeing-- predict what will happen in the future, whether that's, like, in the next one minute or the next four years.

NARRATOR: Throughout the United States, over 60 jurisdictions use predictive algorithms as part of the legal process. One of the most widely used is COMPAS. The COMPAS algorithm weighs factors, including a defendant's answers to a questionnaire, to provide a risk assessment score. These scores are used every day by judges to guide decisions about pretrial detention, bail, and even sentencing. But the reliability of the COMPAS algorithm has been questioned. In 2016, ProPublica published an investigative report on the COMPAS risk assessment tool.

BENJAMIN: Investigators wanted to see if the scores were accurate in predicting whether these individuals would commit a future crime. And they found two things that were interesting. One was that the score was remarkably unreliable in predicting who would commit a, a crime in the future over this two-year period. But then the other thing that ProPublica investigators found was that Black people were much more likely to be deemed high risk and white people low risk.

NARRATOR: This was true even in cases when the Black person was arrested for a minor offense and the white person in question was arrested for a more serious crime.

BENJAMIN: This ProPublica study was one of the first to begin to burst the bubble of technology as somehow objective and neutral.

NARRATOR: The article created a national controversy. But at Dartmouth, a student convinced her professor they should both be more than stunned.

HANY FARID: As it turns out, one of my students, Julia Dressel, reads the same article and said, "This is terrible. We should do something about it."
(chuckles) This is the difference between an awesome idealistic student and a jaded, uh, professor. And I thought, "I think you're right." And as we were sort of struggling to understand the underlying roots of the bias in the algorithms, we asked ourselves a really simple question: are the algorithms today, are they doing better than humans? Because presumably, that's why you have these algorithms, is that they eliminate some of the bias and the prejudices, either implicit or explicit, in the human judgment.

NARRATOR: To analyze COMPAS's risk assessment accuracy, they used the crowdsourcing platform Mechanical Turk. Their online study included 400 participants who evaluated 1,000 defendants.

FARID: We asked participants to read a very short paragraph about an actual defendant. How old they were, whether they were male or female, what their prior juvenile conviction record was, and their prior adult conviction record. And, importantly, we didn't tell people their race. And then we ask a very simple question, "Do you think this person will commit a crime in the next two years?" Yes, no. And again, these are non-experts. These are people being paid a couple of bucks online to answer a survey. No criminal justice experience, don't know anything about the defendants. They were as accurate as the commercial software being used in the courts today, one particular piece of software. That was really surprising. We would've expected a little bit of improvement. After all, the algorithm has access to huge amounts of training data.

NARRATOR: And something else puzzled the researchers. The MTurk workers' answers to questions about who would commit crimes in the future and who wouldn't showed a surprising pattern of racial bias, even though race wasn't indicated in any of the profiles.
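Patterns like the one just described are usually quantified by comparing error rates across groups: how often each group is wrongly labeled high risk (false positives) and wrongly labeled low risk (false negatives). A minimal sketch of that bookkeeping, on made-up records rather than the ProPublica or Dartmouth data:

```python
# Each record: (group, predicted_high_risk, actually_reoffended) -- made-up data.
records = [
    ("black", True, False), ("black", True, True), ("black", False, False),
    ("black", True, False), ("white", False, True), ("white", False, False),
    ("white", True, True), ("white", False, True),
]

def error_rates(group):
    """False positive rate and false negative rate for one group."""
    rows = [r for r in records if r[0] == group]
    false_pos = sum(1 for _, pred, actual in rows if pred and not actual)
    false_neg = sum(1 for _, pred, actual in rows if not pred and actual)
    negatives = sum(1 for _, _, actual in rows if not actual)
    positives = sum(1 for _, _, actual in rows if actual)
    return false_pos / negatives, false_neg / positives

for group in ("black", "white"):
    fpr, fnr = error_rates(group)
    print(group, "false positive rate:", round(fpr, 2), "false negative rate:", round(fnr, 2))
```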
They were more likely to say a person of color will be high risk when they weren't, and they were more likely to say that a white person would not be high risk when in fact they were. And this made no sense to us at all. You don't know the race of the person. How is it possible that you're biased against them? (radios running in background) In this country, if you are a person of color, you are significantly more likely, historically, to be arrested, to be charged, and to be convicted of a crime. So in fact, prior convictions is a proxy for your race. Not a perfect proxy, but it is correlated, because of the historical inequities in the criminal justice system and policing in this country.

(siren blaring)

MAN: It's my car, bro, come on, what are y'all doing? Like, this, this is racial profiling.

NARRATOR: Research indicates a Black person is five times more likely to be stopped without cause than a white person. Black people are at least twice as likely as white people to be arrested for drug offenses, even though Black and white people use drugs at the same rate. Black people are also about 12 times more likely to be wrongly convicted of drug crimes.

FORREST: Historically, Black men have been arrested at higher levels than other populations. Therefore, the tool predicts that a Black man, for instance, will be arrested at a rate and recidivate at a rate that is higher than a white individual.

FARID: And so what was happening is, you know, the big data, the big machine learning folks are saying, "Look, we're not giving it race-- it can't be racist." But that is spectacularly naive, because we know that other things correlate with race. In this case, number of prior convictions. And so when you train an algorithm on historical data, well, guess what. It's going to reproduce history-- of course it will.
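The proxy effect Farid describes can be seen in a tiny simulation: a rule that never sees race still reproduces a disparity if one of its inputs (here, recorded prior arrests) is shaped by unequal enforcement. Every number below is invented purely to illustrate the mechanism:

```python
import random

random.seed(0)

def simulate_person(group):
    """Both groups offend at the same underlying rate, but group A is policed
    more heavily, so its members accumulate more recorded arrests."""
    offenses = random.randint(0, 3)              # same distribution for everyone
    arrest_prob = 0.8 if group == "A" else 0.3   # unequal enforcement (assumed)
    priors = sum(1 for _ in range(offenses) if random.random() < arrest_prob)
    return group, priors

people = ([simulate_person("A") for _ in range(5000)] +
          [simulate_person("B") for _ in range(5000)])

def high_risk(priors):
    """A 'race-blind' rule that looks only at recorded priors."""
    return priors >= 2

for group in ("A", "B"):
    members = [priors for g, priors in people if g == group]
    flagged = sum(1 for priors in members if high_risk(priors))
    print(group, "share flagged high risk:", round(flagged / len(members), 2))
```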
NARRATOR: Compounding the problem is the fact that predictive algorithms can't be put on the witness stand and interrogated about their decision-making processes.

FORREST: Many defendants have had difficulty getting access to the underlying information that tells them, what was the data set that was used to assess me? What were the inputs that were used? How were those inputs weighted? So you've got what can be, these days, increasingly, a black box. A lack of transparency.

NARRATOR: Some black box algorithms get their name from a lack of transparency about the code and data inputs they use, which can be deemed proprietary. But that's not the only kind of black box.

A black box is any system which is so complicated that you can see what goes in and you can see what comes out, but it's impossible to understand what's going on inside it. All of those steps in the algorithm are hidden inside phenomenally complex math and processes.

FARID: And I would argue that when you are using algorithms in mission-critical applications, like criminal justice system, we should not be deploying black box algorithms.

NARRATOR: PredPol, like many predictive platforms, claimed a proven record for crime reduction. In 2015, PredPol published its algorithm in a peer-reviewed journal. William Isaac and Kristian Lum, research scientists who investigate predictive policing platforms, analyzed the algorithm.

ISAAC: We just kind of saw the algorithm as going back to the same one or two blocks every single time. And that's kind of strange, because if you had a truly predictive policing system, you wouldn't necessarily see it going to the same locations over and over again.

NARRATOR: For their experiment, Isaac and Lum used a different data set, public health data, to map illicit drug use in Oakland.
ISAAC: So, a good chunk of the city was kind of evenly distributed in terms of where potential illicit drug use might be. But the police predictions were clustering around areas where police had, you know, historically found incidents of illicit drug use. Specifically, we saw significant numbers of neighborhoods that were predominantly non-white and lower-income being deliberate targets of the predictions.

NARRATOR: Even though illicit drug use was a citywide problem, the algorithm focused its predictions on low-income neighborhoods and communities of color.

ISAAC: The reason why is actually really important. It's very hard to divorce these predictions from those histories and legacies of over-policing. As a result of that, they manifest themselves in the data.

NARRATOR: In an area where there is more police presence, more crime is uncovered. The crime data indicates to the algorithm that the heavily policed neighborhood is where future crime will be found, even though there may be other neighborhoods where crimes are being committed at the same or higher rate.

ISAAC: Every new prediction that you generate is going to be increasingly dependent on the behavior of the algorithm in the past. So, you know, if you go ten days, 20 days, 30 days into the future, right, after using an algorithm, all of those predictions have changed the behavior of the police department and are now being folded back into the next day's prediction.

NARRATOR: The result can be a feedback loop that reinforces historical policing practices.
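That loop can be sketched in a few lines: send patrols where yesterday's recorded crime was highest, record new crime only where the patrols are, and fold the new records back in. The numbers are invented; the point is only how a small historical skew perpetuates itself:

```python
# Two neighborhoods with the SAME true crime rate but unequal historical records.
true_rate = {"north": 10, "south": 10}   # actual incidents per day (identical)
recorded = {"north": 12, "south": 8}     # past recorded crime, skewed by old patrols

for day in range(30):
    # The "prediction": patrol the area with the most recorded crime so far.
    target = max(recorded, key=recorded.get)
    # Only the patrolled area has officers present to observe and record crime,
    # and those new records feed the next day's prediction.
    recorded[target] += 0.5 * true_rate[target]

print(recorded)  # the initially over-policed area runs away with the predictions
```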
SWEENEY: All of these different types of machine learning algorithms are all trying to help us figure out, are there some patterns in this data? It's up to us to then figure out, are those legitimate patterns, do they, are they useful patterns? Because the computer has no idea. It didn't make a logical association. It just made it, made a correlation.

MING: My favorite definition of artificial intelligence is, it's any autonomous system that can make decisions under uncertainty. You can't make decisions under uncertainty without bias. In fact, it's impossible to escape from having bias. It's a mathematical reality about any intelligent system, even us.

(siren blaring in distance)

NARRATOR: And even if the goal is to get rid of prejudice, bias in the historical data can undermine that objective. Amazon discovered this when they began a search for top talent with a hiring algorithm whose training data depended on hiring successes from the past.

MING: Amazon, somewhat famously within the A.I. industry, they tried to build a hiring algorithm. They had a massive data set. They had all the right answers, because they knew literally who got hired and who got that promotion in their first year.

(typing)

NARRATOR: The company created multiple models to review past candidates' résumés and identify some 50,000 key terms.

MING: What Amazon actually wanted to achieve was to diversify their hiring. Amazon, just like every other tech company, and a lot of other companies, as well, has enormous bias built into its hiring history. It was always biased, strongly biased, in favor of men, in favor, generally, of white or sometimes Asian men. Well, they went and built a hiring algorithm. And sure enough, this thing was the most sexist recruiter you could imagine. If you said the word "women's" in your résumé, then it wouldn't hire you. If you went to a women's college, it didn't want to hire you. So they take out all the gender markers, and all of the women's colleges-- all the things that explicitly says, "This is a man," and, "This is a woman," or even the ones that, obviously, implicitly say it. So they did that. And then they trained up their new deep neural network to decide who Amazon would hire. And it did something amazing.
MING: It did something no human could do. It figured out who was a woman and it wouldn't hire them. It was able to look through all of the correlations that existed in that massive data set and figure out which ones most strongly correlated with someone getting a promotion. And the single biggest correlate of getting a promotion was being a man. And it figured those patterns out and didn't hire women.

NARRATOR: Amazon abandoned its hiring algorithm in 2017.

Remember the way machine learning works, right? It's like a student who doesn't really understand the material in the class. They got a bunch of questions, they got a bunch of answers. And now they're trying to pattern match for a new question and say, "Oh, wait. Let me find an answer that looks pretty much like the questions and answers I saw before." The algorithm only worked because someone has said, "Oh, this person whose data you have, they were a good employee. This other person was a bad employee," or, "This person performed well," or, "This person did not perform well."

O'NEIL: Because algorithms don't just look for patterns, they look for patterns of success, however it's defined. But the definition of success is really critically important to what that end up, ends up being. And a lot of, a lot of opinion is embedded in, what, what does success look like?

NARRATOR: In the case of algorithms, human choices play a critical role.

O'NEIL: The data itself was curated. Someone decided what data to collect. Somebody decided what data was not relevant, right? And they don't exclude it necessarily intentionally-- they could be blind spots.

NARRATOR: The need to identify such oversights becomes more urgent as technology takes on more decision making. Consider facial recognition technology, used by law enforcement in cities around the world for surveillance.
787 00:37:22,933 --> 00:37:24,633 In Detroit, 2018, 788 00:37:24,633 --> 00:37:28,466 law enforcement looked to facial recognition technology 789 00:37:28,466 --> 00:37:29,866 when $3,800 worth of watches 790 00:37:29,866 --> 00:37:34,433 were stolen from an upscale boutique. 791 00:37:35,766 --> 00:37:38,066 Police ran a still frame from the shop's surveillance video 792 00:37:38,066 --> 00:37:42,833 through their facial recognition system to find a match. 793 00:37:42,833 --> 00:37:46,233 How do I turn a face into numbers 794 00:37:46,233 --> 00:37:47,866 that equations can act with? 795 00:37:47,866 --> 00:37:50,666 You turn the individual pixels in the picture of that face 796 00:37:50,666 --> 00:37:53,933 into values. 797 00:37:56,133 --> 00:37:59,466 What it's really looking for are complex patterns 798 00:37:59,466 --> 00:38:01,733 across those pixels. 799 00:38:01,733 --> 00:38:04,700 The sequence of taking a pattern of numbers 800 00:38:04,700 --> 00:38:06,666 and transforming it 801 00:38:06,666 --> 00:38:08,366 into little edges and angles, 802 00:38:08,366 --> 00:38:14,100 then transforming that into eyes and cheekbones and mustaches. 803 00:38:15,633 --> 00:38:16,766 NARRATOR: To find that match, 804 00:38:16,766 --> 00:38:21,900 the system can be trained on billions of photographs. 805 00:38:23,533 --> 00:38:26,100 Facial recognition uses a class of machine learning 806 00:38:26,100 --> 00:38:28,433 called deep learning. 807 00:38:28,433 --> 00:38:30,666 The models built by deep learning techniques 808 00:38:30,666 --> 00:38:35,833 are called neural networks. 809 00:38:35,833 --> 00:38:37,033 VENKATASUBRAMANIAN: A neural network 810 00:38:37,033 --> 00:38:39,266 is, you know, stylized as, you know, trying to model 811 00:38:39,266 --> 00:38:41,800 how neural pathways work in the brain. 812 00:38:43,566 --> 00:38:44,566 You can think of a neural network 813 00:38:44,566 --> 00:38:50,033 as a collection of neurons. 814 00:38:50,033 --> 00:38:51,633 So you put some values into a neuron, 815 00:38:51,633 --> 00:38:55,900 and if they're, sufficiently, they add up to some number, 816 00:38:55,900 --> 00:38:57,233 or they cross some threshold, 817 00:38:57,233 --> 00:39:01,200 this one will fire and send off a new number to the next neuron. 818 00:39:01,200 --> 00:39:03,666 NARRATOR: At a certain threshold, 819 00:39:03,666 --> 00:39:05,966 the neuron will fire to the next neuron. 820 00:39:05,966 --> 00:39:10,666 If it's below the threshold, the neuron doesn't fire. 821 00:39:10,666 --> 00:39:12,766 This process repeats and repeats 822 00:39:12,766 --> 00:39:14,900 across hundreds, possibly thousands of layers, 823 00:39:14,900 --> 00:39:18,066 making connections like the neurons in our brain. 824 00:39:18,066 --> 00:39:19,833 ♪ ♪ 825 00:39:19,833 --> 00:39:24,133 The output is a predictive match. 826 00:39:27,266 --> 00:39:28,433 Based on a facial recognition match, 827 00:39:28,433 --> 00:39:32,466 in January 2020, the police arrested Robert Williams 828 00:39:32,466 --> 00:39:35,000 for the theft of the watches. 829 00:39:37,033 --> 00:39:38,833 The next day, he was released. 830 00:39:38,833 --> 00:39:42,733 Not only did Williams have an alibi, 831 00:39:42,733 --> 00:39:46,233 but it wasn't his face. 832 00:39:46,233 --> 00:39:49,066 MING: To be very blunt about it, these algorithms are probably 833 00:39:49,066 --> 00:39:51,666 dramatically over-trained on white faces. 
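
A minimal sketch of the neuron-and-threshold picture described above, assuming nothing about any real facial recognition product: pixel values go in, each neuron adds up its weighted inputs and fires only if the sum crosses its threshold, and the firing pattern is passed to the next layer. A trained network would have learned its weights from millions of example photos; the weights, thresholds, and layer sizes below are random placeholders chosen only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(inputs, weights, thresholds):
    # Each neuron sums its weighted inputs and "fires" (outputs 1.0)
    # only if that sum crosses the neuron's threshold.
    sums = weights @ inputs
    return (sums > thresholds).astype(float)

# A tiny 8x8 grayscale image flattened to 64 pixel values in [0, 1].
pixels = rng.random(64)

# Random placeholder weights and thresholds for three layers
# (a trained network would have learned these from example photos).
w1, t1 = rng.normal(size=(16, 64)), rng.normal(size=16)
w2, t2 = rng.normal(size=(4, 16)), rng.normal(size=4)
w3, t3 = rng.normal(size=(1, 4)), rng.normal(size=1)

h1 = layer(pixels, w1, t1)   # first layer of firing patterns
h2 = layer(h1, w2, t2)       # second layer, built from the first
out = layer(h2, w3, t3)      # final neuron: fires or not

print("layer-1 firing pattern:", h1)
print("layer-2 firing pattern:", h2)
print("output neuron fired:", bool(out[0]))
```

Whether this toy output neuron fires means nothing about any actual face; what matters is the structure, in which any imbalance in the training photos is baked into the learned weights and carried through every layer.
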
834 00:39:51,666 --> 00:39:57,033 ♪ ♪ 835 00:39:57,033 --> 00:39:59,300 So, of course, algorithms that start out bad 836 00:39:59,300 --> 00:40:01,133 can be improved, in general. 837 00:40:01,133 --> 00:40:04,400 The Gender Shades project found that 838 00:40:04,400 --> 00:40:07,400 certain facial recognition technology, 839 00:40:07,400 --> 00:40:09,166 when they actually tested it on Black women, 840 00:40:09,166 --> 00:40:15,233 it was 65% accurate, whereas for white men, it was 99% accurate. 841 00:40:16,600 --> 00:40:19,733 How did they improve it? Because they did. 842 00:40:19,733 --> 00:40:21,566 They built an algorithm 843 00:40:21,566 --> 00:40:23,633 that was trained on more diverse data. 844 00:40:23,633 --> 00:40:26,433 So I don't think it's completely a lost cause 845 00:40:26,433 --> 00:40:30,233 to improve algorithms to be better. 846 00:40:31,700 --> 00:40:36,300 MAN (in ad voiceover): I used to think my job was all about arrests. 847 00:40:36,300 --> 00:40:38,033 LESLIE KENNEDY: There was a commercial a few years ago 848 00:40:38,033 --> 00:40:40,633 that showed a police officer going to a gas station 849 00:40:40,633 --> 00:40:42,700 and then waiting for the criminal to show up. 850 00:40:42,700 --> 00:40:44,233 MAN: We analyze crime data, 851 00:40:44,233 --> 00:40:46,300 spot patterns, 852 00:40:46,300 --> 00:40:49,166 and figure out where to send patrols. 853 00:40:49,166 --> 00:40:51,433 They said, "Well, our algorithm will tell you exactly 854 00:40:51,433 --> 00:40:53,466 where the crime, the next crime is going to take place." 855 00:40:53,466 --> 00:40:56,266 Well, that's just silly, uh, it, that's not how it works. 856 00:40:56,266 --> 00:40:59,566 MAN: By stopping it before it happens. 857 00:40:59,566 --> 00:41:00,433 (sighs) 858 00:41:00,433 --> 00:41:03,100 MAN: Let's build a smarter planet. 859 00:41:08,233 --> 00:41:10,566 ♪ ♪ 860 00:41:10,566 --> 00:41:12,966 JOEL CAPLAN: Understanding what it is about these places 861 00:41:12,966 --> 00:41:15,666 that enable crime problems 862 00:41:15,666 --> 00:41:18,400 to emerge and/or persist. 863 00:41:18,400 --> 00:41:21,433 NARRATOR: At Rutgers University, 864 00:41:21,433 --> 00:41:22,666 the researchers who invented 865 00:41:22,666 --> 00:41:26,366 the crime mapping platform called Risk Terrain Modeling, 866 00:41:26,366 --> 00:41:28,366 or RTM, 867 00:41:28,366 --> 00:41:30,866 bristle at the term "predictive policing." 868 00:41:30,866 --> 00:41:36,100 CAPLAN (voiceover): We don't want to predict, we want to prevent. 869 00:41:37,333 --> 00:41:40,000 I worked as a police officer a long time ago, 870 00:41:40,000 --> 00:41:41,266 in the early 2000s. 871 00:41:41,266 --> 00:41:46,866 Police collected data for as long as police have existed. 872 00:41:46,866 --> 00:41:50,100 Now there's a greater recognition 873 00:41:50,100 --> 00:41:52,600 that data can have value. 874 00:41:52,600 --> 00:41:55,300 But it's not just about the data. 875 00:41:55,300 --> 00:41:56,766 It's about how you analyze it, how you use those results. 876 00:41:56,766 --> 00:42:00,400 There's only two data sets that risk terrain modeling uses.
877 00:42:00,400 --> 00:42:02,666 These data sets are local, 878 00:42:02,666 --> 00:42:07,866 current information about crime incidents within a given area 879 00:42:07,866 --> 00:42:11,033 and information about environmental features 880 00:42:11,033 --> 00:42:12,566 that exist in that landscape, 881 00:42:12,566 --> 00:42:13,733 such as bars, fast food restaurants, 882 00:42:13,733 --> 00:42:17,900 convenience stores, schools, parks, alleyways. 883 00:42:19,400 --> 00:42:20,866 KENNEDY: The algorithm is basically 884 00:42:20,866 --> 00:42:24,100 the relationship between these environmental features 885 00:42:24,100 --> 00:42:27,066 and the, the outcome data, which in this case is crime. 886 00:42:27,066 --> 00:42:28,966 The algorithm provides you with a map 887 00:42:28,966 --> 00:42:32,000 of the distribution of the risk values. 888 00:42:33,266 --> 00:42:35,600 ALEJANDRO GIMÉNEZ-SANTANA: This is the highest-risk area, 889 00:42:35,600 --> 00:42:37,000 on this commercial corridor on Bloomfield Avenue. 890 00:42:37,000 --> 00:42:41,733 NARRATOR: But the algorithm isn't intended for use just by police. 891 00:42:41,733 --> 00:42:44,866 Criminologist Alejandro Giménez-Santana 892 00:42:44,866 --> 00:42:47,233 leads the Newark Public Safety Collaborative, 893 00:42:47,233 --> 00:42:51,033 a collection of 40 community organizations. 894 00:42:51,033 --> 00:42:53,266 They use RTM as a diagnostic tool 895 00:42:53,266 --> 00:42:58,266 to understand not just where crime may happen next, 896 00:42:58,266 --> 00:43:00,833 but why. 897 00:43:00,833 --> 00:43:03,466 GIMÉNEZ-SANTANA: Through RTM, we identify this commercial corridor 898 00:43:03,466 --> 00:43:05,000 on Bloomfield Avenue, which is where we are right now, 899 00:43:05,000 --> 00:43:08,733 as a risky area for auto theft due to car idling. 900 00:43:08,733 --> 00:43:09,866 So why is this space 901 00:43:09,866 --> 00:43:12,633 particularly problematic when it comes to auto theft? 902 00:43:14,166 --> 00:43:16,033 One is because we're in a commercial corridor, 903 00:43:16,033 --> 00:43:17,533 where there's high density of people 904 00:43:17,533 --> 00:43:20,900 who go to the beauty salon or to go to a restaurant. 905 00:43:20,900 --> 00:43:22,533 Uber delivery and Uber Eats, 906 00:43:22,533 --> 00:43:24,766 delivery people who come to grab orders that also, 907 00:43:24,766 --> 00:43:26,833 and leave their cars running 908 00:43:26,833 --> 00:43:28,433 create the conditions for this crime 909 00:43:28,433 --> 00:43:31,233 to be concentrated in this particular area. 910 00:43:31,233 --> 00:43:33,166 What the data showed us was, 911 00:43:33,166 --> 00:43:35,966 there was a tremendous rise in auto vehicle thefts. 912 00:43:35,966 --> 00:43:39,233 But we convinced the police department 913 00:43:39,233 --> 00:43:41,500 to take a more social service approach. 914 00:43:41,500 --> 00:43:43,966 NARRATOR: Community organizers convinced police 915 00:43:43,966 --> 00:43:46,833 not to ticket idling cars, 916 00:43:46,833 --> 00:43:48,366 and let organizers create 917 00:43:48,366 --> 00:43:51,666 an effective public awareness poster campaign instead. 918 00:43:51,666 --> 00:43:54,166 And we put it out to the Newark students 919 00:43:54,166 --> 00:43:57,166 to submit in this flyer campaign, 920 00:43:57,166 --> 00:44:00,200 and have their artwork on the actual flyer. 921 00:44:00,200 --> 00:44:02,866 GIMÉNEZ-SANTANA: As you can see, this is the commercial corridor 922 00:44:02,866 --> 00:44:04,533 on Bloomfield Avenue.
923 00:44:04,533 --> 00:44:05,766 The site score shows a six, 924 00:44:05,766 --> 00:44:07,333 which means that we are at the highest risk of auto theft 925 00:44:07,333 --> 00:44:09,533 in this particular location. 926 00:44:09,533 --> 00:44:11,100 And as I move closer to the end 927 00:44:11,100 --> 00:44:14,800 of the commercial corridor, the site risk score is coming down. 928 00:44:14,800 --> 00:44:17,066 NARRATOR: This is the first time in Newark 929 00:44:17,066 --> 00:44:19,500 that police data for crime occurrences 930 00:44:19,500 --> 00:44:23,300 have been shared widely with community members. 931 00:44:23,300 --> 00:44:26,233 ELVIS PEREZ: The kind of data we share is incident-related data-- 932 00:44:26,233 --> 00:44:29,333 sort of time, location, that sort of information. 933 00:44:29,333 --> 00:44:31,466 We don't discuss any private arrest information. 934 00:44:31,466 --> 00:44:35,533 We're trying to avoid a crime. 935 00:44:37,900 --> 00:44:39,633 NARRATOR: In 2019, 936 00:44:39,633 --> 00:44:42,600 Caplan and Kennedy formed a start-up at Rutgers 937 00:44:42,600 --> 00:44:45,533 to meet the rising demand for their technology. 938 00:44:45,533 --> 00:44:49,133 Despite the many possible applications for RTM, 939 00:44:49,133 --> 00:44:51,866 from tracking public health issues 940 00:44:51,866 --> 00:44:53,566 to understanding vehicle crashes, 941 00:44:53,566 --> 00:44:59,733 law enforcement continues to be its principal application. 942 00:44:59,733 --> 00:45:01,433 Like any other technology, 943 00:45:01,433 --> 00:45:04,333 risk terrain modeling can be used for the public good 944 00:45:04,333 --> 00:45:06,433 when people use it wisely. 945 00:45:08,833 --> 00:45:14,866 ♪ ♪ 946 00:45:14,866 --> 00:45:17,833 We as academics and scientists, we actually need to be critical, 947 00:45:17,833 --> 00:45:20,333 because it could be the best model in the world, 948 00:45:20,333 --> 00:45:21,733 it could be very good predictions, 949 00:45:21,733 --> 00:45:22,900 but how you use those predictions 950 00:45:22,900 --> 00:45:24,766 matters, in some ways, even more. 951 00:45:24,766 --> 00:45:26,100 REPORTER: The police department 952 00:45:26,100 --> 00:45:28,800 had revised the SSL numerous times... 953 00:45:28,800 --> 00:45:30,100 NARRATOR: In 2019, 954 00:45:30,100 --> 00:45:34,666 Chicago's inspector general contracted the RAND Corporation 955 00:45:34,666 --> 00:45:38,033 to evaluate the Strategic Subject List, 956 00:45:38,033 --> 00:45:39,566 the predictive policing platform 957 00:45:39,566 --> 00:45:45,733 that incorporated Papachristos's research on social networks. 958 00:45:45,733 --> 00:45:47,400 PAPACHRISTOS: I never wanted to go down this path 959 00:45:47,400 --> 00:45:51,100 of who was the person that was the potential suspect. 960 00:45:51,100 --> 00:45:53,066 And that problem is not necessarily 961 00:45:53,066 --> 00:45:55,033 with the statistical model, it's the fact that someone 962 00:45:55,033 --> 00:45:57,166 took victim and made him an offender. 963 00:45:57,166 --> 00:46:00,133 You've criminalized someone who is at risk, 964 00:46:00,133 --> 00:46:01,666 that you should be prioritizing saving their life. 965 00:46:01,666 --> 00:46:07,266 NARRATOR: It turned out that some 400,000 people were included on the SSL. 966 00:46:07,266 --> 00:46:13,600 Of those, 77% were Black or Hispanic. 967 00:46:15,133 --> 00:46:18,100 The inspector general's audit revealed 968 00:46:18,100 --> 00:46:20,766 that SSL scores were unreliable. 
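
Returning to the risk terrain idea described above: the step of combining crime incident locations with nearby environmental features into a site risk score on a grid can be sketched in a few lines. RTM's actual method is more involved, testing which features are statistically meaningful and over what distances, so every feature, weight, and coordinate below is hypothetical.

```python
from collections import Counter

# Hypothetical inputs: past auto-theft incidents and environmental
# features, each placed in an (x, y) cell of a coarse city grid.
incidents = [(2, 3), (2, 3), (3, 3), (2, 4), (7, 1)]
features = {
    "restaurant":        [(2, 3), (3, 3), (2, 4)],
    "convenience_store": [(2, 3), (7, 1)],
    "school":            [(5, 5)],
}
# Hypothetical relative-risk weights per feature type; RTM estimates
# which features matter from the data, these are simply made up.
weights = {"restaurant": 2.0, "convenience_store": 1.5, "school": 0.5}

incident_counts = Counter(incidents)

def site_score(cell):
    # Combine nearby environmental features and past incidents
    # into a single relative risk score for one grid cell.
    score = 0.0
    for kind, cells in features.items():
        near = sum(1 for (x, y) in cells
                   if abs(x - cell[0]) <= 1 and abs(y - cell[1]) <= 1)
        score += weights[kind] * near
    return score + incident_counts[cell]

for cell in [(2, 3), (3, 3), (7, 1), (5, 5)]:
    print(cell, "site risk score:", site_score(cell))
```

In this toy grid the cells along the made-up commercial corridor score highest, which is the shape of the map the Newark collaborative describes; the scores point to where conditions favor a crime, not to who will commit it.
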
969 00:46:20,766 --> 00:46:23,633 The Rand Corporation found the program had no impact 970 00:46:23,633 --> 00:46:28,233 on homicide or victimization rates. 971 00:46:28,233 --> 00:46:31,366 (protesters chanting) 972 00:46:31,366 --> 00:46:34,300 NARRATOR: The program was shut down. 973 00:46:37,166 --> 00:46:38,266 But data collection continues 974 00:46:38,266 --> 00:46:41,666 to be essential to law enforcement. 975 00:46:41,666 --> 00:46:45,300 ♪ ♪ 976 00:46:45,300 --> 00:46:47,933 O'NEIL: There are things about us that we might not even 977 00:46:47,933 --> 00:46:51,300 be aware of that are sort of being collected 978 00:46:51,300 --> 00:46:52,933 by the data brokers 979 00:46:52,933 --> 00:46:55,066 and will be held against us for the rest of our lives-- 980 00:46:55,066 --> 00:46:58,600 held against people forever, digitally. 981 00:46:59,733 --> 00:47:03,133 NARRATOR: Data is produced and collected. 982 00:47:03,133 --> 00:47:05,500 But is it accurate? 983 00:47:05,500 --> 00:47:08,500 And can the data be properly vetted? 984 00:47:08,500 --> 00:47:09,633 PAPACHRISTOS: And that was one of the critiques 985 00:47:09,633 --> 00:47:12,866 of not just the Strategic Subjects List, 986 00:47:12,866 --> 00:47:14,166 but the gang database in Chicago. 987 00:47:14,166 --> 00:47:18,066 Any data source that treats data as a stagnant, forever condition 988 00:47:18,066 --> 00:47:20,366 is a problem. 989 00:47:23,400 --> 00:47:25,433 WOMAN: The gang database has been around for four years. 990 00:47:25,433 --> 00:47:28,266 It'll be five in January. 991 00:47:28,266 --> 00:47:30,900 We want to get rid of surveillance 992 00:47:30,900 --> 00:47:33,000 in Black and brown communities. 993 00:47:33,000 --> 00:47:34,833 BENJAMIN: In places like Chicago, 994 00:47:34,833 --> 00:47:37,366 in places like L.A., where I grew up, 995 00:47:37,366 --> 00:47:40,666 there are gang databases with tens of thousands 996 00:47:40,666 --> 00:47:43,766 of people listed, their names listed in these databases. 997 00:47:43,766 --> 00:47:45,900 Just by simply having a certain name 998 00:47:45,900 --> 00:47:47,900 and coming from a certain ZIP code 999 00:47:47,900 --> 00:47:50,500 could land you in these databases. 1000 00:47:50,500 --> 00:47:53,033 Do you all feel safe in Chicago? 1001 00:47:53,033 --> 00:47:54,333 DARRELL DACRES: The cops pulled up out of nowhere. 1002 00:47:54,333 --> 00:47:59,233 Didn't ask any questions, just immediately start beating on us. 1003 00:47:59,233 --> 00:48:00,466 And basically were saying, like, 1004 00:48:00,466 --> 00:48:02,833 what are, what are we doing over here, you know, like, 1005 00:48:02,833 --> 00:48:04,500 in this, in this gangbang area? 1006 00:48:04,500 --> 00:48:07,033 I was already labeled as a gangbanger 1007 00:48:07,033 --> 00:48:08,866 from that area because of where I lived. 1008 00:48:08,866 --> 00:48:10,733 I, I just happened to live there. 1009 00:48:12,733 --> 00:48:13,900 NARRATOR: The Chicago gang database 1010 00:48:13,900 --> 00:48:17,700 is shared with hundreds of law enforcement agencies. 1011 00:48:17,700 --> 00:48:19,533 Even if someone is wrongly included, 1012 00:48:19,533 --> 00:48:24,733 there is no mechanism to have their name removed. 1013 00:48:24,733 --> 00:48:27,166 If you try to apply for an apartment, 1014 00:48:27,166 --> 00:48:29,633 or if you try to apply for a job or a college, 1015 00:48:29,633 --> 00:48:34,833 or even in a, um, a house, it will show 1016 00:48:34,833 --> 00:48:37,800 that you are in this record of a gang database. 
1017 00:48:37,800 --> 00:48:39,600 I was arrested for peacefully protesting. 1018 00:48:39,600 --> 00:48:42,100 And they told me that, "Well, 1019 00:48:42,100 --> 00:48:43,966 you're in the gang database." 1020 00:48:43,966 --> 00:48:46,333 But I was never in no gang. 1021 00:48:46,333 --> 00:48:47,466 MAN: Because you have a gang designation, 1022 00:48:47,466 --> 00:48:49,500 you're a security threat group, 1023 00:48:49,500 --> 00:48:51,200 right? 1024 00:48:51,200 --> 00:48:52,500 NARRATOR: Researchers and activists 1025 00:48:52,500 --> 00:48:54,900 have been instrumental in dismantling 1026 00:48:54,900 --> 00:48:57,366 some of these systems. 1027 00:48:57,366 --> 00:48:58,466 And so we continue to push back. 1028 00:48:58,466 --> 00:48:59,333 I mean, the fight is not going to finish 1029 00:48:59,333 --> 00:49:01,200 until we get rid of the database. 1030 00:49:01,200 --> 00:49:02,600 ♪ ♪ 1031 00:49:02,600 --> 00:49:05,233 FERGUSON: I think what we're seeing now 1032 00:49:05,233 --> 00:49:07,766 is not a move away from data. 1033 00:49:07,766 --> 00:49:11,300 It's just a move away from this term "predictive policing." 1034 00:49:11,300 --> 00:49:14,233 But we're seeing big companies, 1035 00:49:14,233 --> 00:49:15,666 big tech, enter the policing space. 1036 00:49:15,666 --> 00:49:19,566 We're seeing the reality that almost all policing now 1037 00:49:19,566 --> 00:49:21,900 is data-driven. 1038 00:49:21,900 --> 00:49:23,633 You're seeing these same police departments 1039 00:49:23,633 --> 00:49:25,266 invest heavily in the technology, 1040 00:49:25,266 --> 00:49:28,400 including other forms of surveillance technology, 1041 00:49:28,400 --> 00:49:30,600 including other forms of databases 1042 00:49:30,600 --> 00:49:32,433 to sort of manage policing. 1043 00:49:32,433 --> 00:49:33,833 (chanting): We want you out! 1044 00:49:33,833 --> 00:49:37,366 NARRATOR: More citizens are calling for regulations 1045 00:49:37,366 --> 00:49:38,566 to audit algorithms 1046 00:49:38,566 --> 00:49:42,200 and guarantee they're accomplishing what they promise 1047 00:49:42,200 --> 00:49:43,433 without harm. 1048 00:49:43,433 --> 00:49:46,900 BRAYNE: Ironically, there is very little data 1049 00:49:46,900 --> 00:49:49,300 on police use of big data. 1050 00:49:49,300 --> 00:49:52,333 And there is no systematic data 1051 00:49:52,333 --> 00:49:54,766 at a national level on how these tools are used. 1052 00:49:54,766 --> 00:49:57,800 The deployment of these tools 1053 00:49:57,800 --> 00:50:00,100 so far outpaces 1054 00:50:00,100 --> 00:50:02,633 legal and regulatory responses to them. 1055 00:50:02,633 --> 00:50:03,600 What you have happening 1056 00:50:03,600 --> 00:50:06,633 is essentially this regulatory Wild West. 1057 00:50:08,000 --> 00:50:09,333 O'NEIL: And we're, like, "Well, it's an algorithm, 1058 00:50:09,333 --> 00:50:11,366 let's, let's just throw it into production." 1059 00:50:11,366 --> 00:50:13,066 Without testing it to whether 1060 00:50:13,066 --> 00:50:18,966 it "works" sufficiently, um, at all. 1061 00:50:20,333 --> 00:50:22,833 NARRATOR: Multiple requests for comment 1062 00:50:22,833 --> 00:50:23,966 from police agencies and law enforcement officials 1063 00:50:23,966 --> 00:50:27,766 in several cities, including Chicago and New York, 1064 00:50:27,766 --> 00:50:31,933 were either declined or went unanswered. 
1065 00:50:31,933 --> 00:50:36,966 ♪ ♪ 1066 00:50:36,966 --> 00:50:40,533 Artificial intelligence must serve people, 1067 00:50:40,533 --> 00:50:42,666 and therefore artificial intelligence 1068 00:50:42,666 --> 00:50:44,333 must always comply with people's rights. 1069 00:50:44,333 --> 00:50:49,533 NARRATOR: The European Union is preparing to implement legislation 1070 00:50:49,533 --> 00:50:51,433 to regulate artificial intelligence. 1071 00:50:51,433 --> 00:50:56,833 In 2021, bills to regulate data science algorithms 1072 00:50:56,833 --> 00:50:59,666 were introduced in 17 states, 1073 00:50:59,666 --> 00:51:03,500 and enacted in Alabama, Colorado, 1074 00:51:03,500 --> 00:51:05,466 Illinois, and Mississippi. 1075 00:51:05,466 --> 00:51:07,666 SWEENEY: If you look carefully on electrical devices, 1076 00:51:07,666 --> 00:51:10,766 you'll see "U.L.," for Underwriters Laboratory. 1077 00:51:10,766 --> 00:51:11,933 That's a process that came about 1078 00:51:11,933 --> 00:51:13,533 so that things, when you plugged them in, 1079 00:51:13,533 --> 00:51:14,900 didn't blow up in your hand. 1080 00:51:14,900 --> 00:51:16,366 That's the same kind of idea 1081 00:51:16,366 --> 00:51:18,966 that we need in these algorithms. 1082 00:51:21,300 --> 00:51:24,266 O'NEIL: We can adjust it to make it better than the past, 1083 00:51:24,266 --> 00:51:25,900 and we can do it carefully, 1084 00:51:25,900 --> 00:51:28,100 and we can do it with, with precision 1085 00:51:28,100 --> 00:51:30,833 in an ongoing conversation about what it means to us 1086 00:51:30,833 --> 00:51:34,066 that it is, it's biased in the right way. 1087 00:51:34,066 --> 00:51:35,400 I don't think you remove bias, 1088 00:51:35,400 --> 00:51:37,900 but you get to a bias that you can live with, 1089 00:51:37,900 --> 00:51:40,633 that you, you think is moral. 1090 00:51:40,633 --> 00:51:43,000 To be clear, like, I, I think we can do better, 1091 00:51:43,000 --> 00:51:44,566 but often doing better 1092 00:51:44,566 --> 00:51:47,566 would look like we don't use this at all. 1093 00:51:47,566 --> 00:51:48,766 (radio running) 1094 00:51:48,766 --> 00:51:51,066 FARID: There's nothing fundamentally wrong 1095 00:51:51,066 --> 00:51:52,333 with trying to predict the future, 1096 00:51:52,333 --> 00:51:54,500 as long as you understand how the algorithms are working, 1097 00:51:54,500 --> 00:51:55,533 how are they being deployed. 1098 00:51:55,533 --> 00:51:59,133 What is the consequence of getting it right? 1099 00:51:59,133 --> 00:52:00,466 And most importantly is, 1100 00:52:00,466 --> 00:52:03,133 what is the consequence of getting it wrong? 1101 00:52:03,133 --> 00:52:04,333 OFFICER: Keep your hands on the steering wheel! 1102 00:52:04,333 --> 00:52:06,000 MAN: My hands haven't moved off the steering wheel! 1103 00:52:06,000 --> 00:52:07,833 MAN 2: Are you gonna arrest me? 1104 00:52:07,833 --> 00:52:08,700 MAN 1: Officer, what are we here for? 1105 00:52:08,700 --> 00:52:09,500 OFFICER: We just want to talk with... 1106 00:52:31,333 --> 00:52:34,533 ♪ ♪ 1107 00:52:55,500 --> 00:52:58,566 ANNOUNCER: This program is available with PBS Passport 1108 00:52:58,566 --> 00:53:00,866 and on Amazon Prime Video. 1109 00:53:00,866 --> 00:53:04,833 ♪ ♪ 1110 00:53:15,366 --> 00:53:20,933 ♪ ♪
