1
00:00:00,040 --> 00:00:03,030
- Hi, I'm William Lidwell, and this
2
00:00:03,040 --> 00:00:05,030
is Universal Principles of Design.
3
00:00:05,040 --> 00:00:08,030
In this movie, Selection Bias.
4
00:00:08,040 --> 00:00:14,050
How the dots we collect influence the dots we connect.
5
00:00:14,060 --> 00:00:16,080
During World War II, a statistician
6
00:00:16,090 --> 00:00:18,990
by the name of Abraham Wald
7
00:00:19,000 --> 00:00:21,000
was tasked with researching how Allied bombers
8
00:00:21,010 --> 00:00:23,010
were being felled by enemy fire.
9
00:00:23,020 --> 00:00:26,990
The idea was that by identifying areas of vulnerability
10
00:00:27,000 --> 00:00:29,990
in the bombers, we would know where to add armor,
11
00:00:30,000 --> 00:00:33,010
increasing the survivability of bombers on future missions.
12
00:00:33,020 --> 00:00:36,990
So Wald beautifully collected data based on bomber damage
13
00:00:37,000 --> 00:00:39,020
and then rendered the results of his analysis
14
00:00:39,030 --> 00:00:42,080
on a diagram of an aircraft that looks something like this.
15
00:00:42,090 --> 00:00:46,040
The red dots represent bullet holes and flak damage.
16
00:00:46,050 --> 00:00:49,060
Looking at his data, where would you add the armor?
17
00:00:49,070 --> 00:00:53,990
If you're like most people, the answer seems obvious.
18
00:00:54,000 --> 00:00:56,030
You add the armor where the damage is.
19
00:00:56,040 --> 00:00:57,990
Where the red dots are.
20
00:00:58,000 --> 00:01:00,080
But remember, Wald's data were based
21
00:01:00,090 --> 00:01:02,990
on aircraft that had survived.
22
00:01:03,000 --> 00:01:05,010
That had successfully made it back.
23
00:01:05,020 --> 00:01:09,020
The data did not include the bombers that were shot down.
24
00:01:09,030 --> 00:01:11,030
Wald, of course, knew this.
25
00:01:11,040 --> 00:01:13,990
He knew the data were biased in this way.
26
00:01:14,000 --> 00:01:16,060
A bias called selection bias.
27
00:01:16,070 --> 00:01:19,070
And because of this understanding, he drew what is, to me,
28
00:01:19,080 --> 00:01:24,000
a mind-blowingly ingenious and counterintuitive conclusion.
29
00:01:24,010 --> 00:01:26,070
Wald said we should add armor
30
00:01:26,080 --> 00:01:30,060
to the areas with no damage, with no red dots.
31
00:01:30,070 --> 00:01:34,080
Because the bombers hit in the red areas came home.
32
00:01:34,090 --> 00:01:36,050
The ones that didn't come home
33
00:01:36,060 --> 00:01:39,030
must have been hit in the non-red areas.
34
00:01:39,040 --> 00:01:42,070
Brilliant.
35
00:01:42,080 --> 00:01:45,080
Why do people so consistently draw the wrong conclusion
36
00:01:45,090 --> 00:01:48,020
when confronted with data like this?
37
00:01:48,030 --> 00:01:50,040
The answer is simple.
38
00:01:50,050 --> 00:01:54,010
Humans are pattern-detecting and pattern-making machines.
39
00:01:54,020 --> 00:01:57,060
When we see dots, we try to connect them.
40
00:01:57,070 --> 00:01:59,000
It's reflexive.
41
00:01:59,010 --> 00:02:02,010
It is only when we, like Wald,
42
00:02:02,020 --> 00:02:05,030
understand the perils of biases like selection bias
43
00:02:05,040 --> 00:02:07,070
that we pause and make sure the dots
44
00:02:07,080 --> 00:02:11,000
that have been collected are worthy of connecting.
45
00:02:11,010 --> 00:02:13,040
So what is selection bias?
46
00:02:13,050 --> 00:02:16,030
Stated simply, selection bias is a bias
47
00:02:16,040 --> 00:02:18,010
in the way evidence is collected
48
00:02:18,020 --> 00:02:21,060
that distorts our analysis and conclusions.
49
00:02:21,070 --> 00:02:24,010
In the case of Wald's damaged bombers,
50
00:02:24,020 --> 00:02:27,070
the evidence was biased through no bad intention or fault.
51
00:02:27,080 --> 00:02:29,990
The downed planes simply were not
52
00:02:30,000 --> 00:02:31,060
available for consideration.
53
00:02:31,070 --> 00:02:34,020
But this is often not the case.
54
00:02:34,030 --> 00:02:37,080
People who want to persuade often cherry-pick data
55
00:02:37,090 --> 00:02:41,050
that support their position and exclude data that negate it,
56
00:02:41,060 --> 00:02:44,010
resulting in evidence that appears convincing
57
00:02:44,020 --> 00:02:47,020
but that is not representative of the truth.
58
00:02:47,030 --> 00:02:49,080
For example, climate change denialists
59
00:02:49,090 --> 00:02:53,050
typically cherry-pick climate data from just the last decade,
60
00:02:53,060 --> 00:02:56,080
which indicates flat or declining global temperatures,
61
00:02:56,090 --> 00:03:00,060
ignoring the clear longer-term historical trend.
62
00:03:00,070 --> 00:03:04,080
How can we prevent selection bias?
63
00:03:04,090 --> 00:03:07,040
It's surprisingly simple in theory,
64
00:03:07,050 --> 00:03:09,050
a little harder in practice.
65
00:03:09,060 --> 00:03:12,010
When you're dealing with a small population,
66
00:03:12,020 --> 00:03:13,060
meaning a small number of things
67
00:03:13,070 --> 00:03:15,030
about which you're collecting data,
68
00:03:15,040 --> 00:03:17,010
like a classroom of students,
69
00:03:17,020 --> 00:03:20,990
you collect data from everyone, all of the students.
70
00:03:21,000 --> 00:03:23,060
If all members of a population are represented
71
00:03:23,070 --> 00:03:26,070
in your analysis, there can be no selection bias.
72
00:03:26,080 --> 00:03:30,050
Unfortunately, in most real-world situations,
73
00:03:30,060 --> 00:03:33,070
it is neither possible nor practical to do this.
74
00:03:33,080 --> 00:03:37,000
Either the number of members in a population is too large,
75
00:03:37,010 --> 00:03:39,060
or, as with the bombers, they're just not available.
76
00:03:39,070 --> 00:03:44,020
In these cases, the key is random sampling.
77
00:03:44,030 --> 00:03:46,020
You must randomly select members
78
00:03:46,030 --> 00:03:48,010
from the available population.
79
00:03:48,020 --> 00:03:51,050
A truly random sample prevents selection bias.
80
00:03:51,060 --> 00:03:54,010
You just need to make sure that you sample
81
00:03:54,020 --> 00:03:57,000
from the full set of things you're generalizing about.
82
00:03:57,010 --> 00:04:01,010
For example, Wald collected data from a subset of bombers,
83
00:04:01,020 --> 00:04:02,990
the ones that returned,
84
00:04:03,000 --> 00:04:05,080
but needed to generalize the results to all bombers.
85
00:04:05,090 --> 00:04:09,070
This is why adding armor to the red areas was wrong.
86
00:04:09,080 --> 00:04:13,000
The sample of damaged bombers was not random,
87
00:04:13,010 --> 00:04:15,080
and so did not generalize to the full set.
88
00:04:15,090 --> 00:04:19,070
Similarly, if you surveyed a random group of Mac users,
89
00:04:19,080 --> 00:04:25,000
you couldn't generalize to PC users.
90
00:04:25,010 --> 00:04:27,030
So whether your knowledge of selection bias
91
00:04:27,040 --> 00:04:29,060
helps you do better design research
92
00:04:29,070 --> 00:04:32,060
or think more critically about how data and statistics
93
00:04:32,070 --> 00:04:35,040
are used to persuade and drive decision-making,
94
00:04:35,050 --> 00:04:39,010
remember, the only way to connect the right dots
95
00:04:39,020 --> 00:00:00,000
is to collect the right dots.