1
00:00:00,930 --> 00:00:03,330
Instructor: Hi, and welcome back.
2
00:00:03,330 --> 00:00:04,590
This section is based
3
00:00:04,590 --> 00:00:06,689
on the knowledge that you acquired previously.
4
00:00:06,689 --> 00:00:08,430
So, if you haven't been through it,
5
00:00:08,430 --> 00:00:10,230
you may have a hard time keeping up.
6
00:00:11,460 --> 00:00:13,080
Make sure you have seen all the videos
7
00:00:13,080 --> 00:00:15,720
about confidence intervals, distributions,
8
00:00:15,720 --> 00:00:17,940
Z tables and T tables,
9
00:00:17,940 --> 00:00:20,310
and have done all the exercises.
10
00:00:20,310 --> 00:00:23,193
If you've completed them already, you are good to go.
11
00:00:24,030 --> 00:00:26,370
Confidence intervals provide us with an estimation
12
00:00:26,370 --> 00:00:28,950
of where the parameters are located.
13
00:00:28,950 --> 00:00:31,170
However, when you are making a decision,
14
00:00:31,170 --> 00:00:32,793
you need a yes or no answer.
15
00:00:33,900 --> 00:00:36,633
The correct approach in this case is to use a test.
16
00:00:37,800 --> 00:00:40,320
In this section, we will learn how to perform
17
00:00:40,320 --> 00:00:43,350
one of the fundamental tasks in statistics,
18
00:00:43,350 --> 00:00:44,733
hypothesis testing.
19
00:00:46,140 --> 00:00:50,433
Okay. There are four steps in data-driven decision making.
20
00:00:51,480 --> 00:00:54,483
First, you must formulate a hypothesis.
21
00:00:56,190 --> 00:00:58,980
Second, once you have formulated a hypothesis,
22
00:00:58,980 --> 00:01:01,863
you will have to find the right test for your hypothesis.
23
00:01:03,150 --> 00:01:05,834
Third, you execute the test.
24
00:01:05,834 --> 00:01:09,183
And fourth, you make a decision based on the result.
25
00:01:10,836 --> 00:01:13,200
Let's start from the beginning.
26
00:01:13,200 --> 00:01:15,033
What is a hypothesis?
27
00:01:16,050 --> 00:01:18,150
Though there are many ways to define it,
28
00:01:18,150 --> 00:01:20,820
the most intuitive I've seen is:
29
00:01:20,820 --> 00:01:24,243
a hypothesis is an idea that can be tested.
30
00:01:25,770 --> 00:01:27,540
And this is not the formal definition,
31
00:01:27,540 --> 00:01:30,750
but it explains the point very well.
32
00:01:30,750 --> 00:01:34,530
So, if I tell you that apples in New York are expensive,
33
00:01:34,530 --> 00:01:38,190
this is an idea or a statement, but it is not testable
34
00:01:38,190 --> 00:01:40,290
until I have something to compare it with.
35
00:01:41,880 --> 00:01:43,950
For instance, if I define expensive
36
00:01:43,950 --> 00:01:48,240
as any price higher than $1.75 per pound,
37
00:01:48,240 --> 00:01:51,033
then it immediately becomes a hypothesis.
38
00:01:52,950 --> 00:01:56,433
All right, what's something that cannot be a hypothesis?
39
00:01:57,270 --> 00:01:58,890
An example may be,
40
00:01:58,890 --> 00:02:01,230
would the USA do better or worse
41
00:02:01,230 --> 00:02:02,970
under a Clinton administration
42
00:02:02,970 --> 00:02:04,833
compared to a Trump administration?
43
00:02:05,940 --> 00:02:08,580
Statistically speaking, this is an idea,
44
00:02:08,580 --> 00:02:10,293
but there is no data to test it.
45
00:02:11,130 --> 00:02:14,673
Therefore, it cannot be a hypothesis of a statistical test.
46
00:02:15,930 --> 00:02:18,000
Actually, it is more likely to be a topic
47
00:02:18,000 --> 00:02:19,100
of another discipline.
48
00:02:21,000 --> 00:02:23,430
Conversely, in statistics, we may compare
49
00:02:23,430 --> 00:02:26,490
different US presidencies that have already been completed,
50
00:02:26,490 --> 00:02:28,200
such as the Obama administration
51
00:02:28,200 --> 00:02:30,850
and the Bush administration, as we have data on both.
52
00:02:32,082 --> 00:02:34,590
All right, let's get out of politics
53
00:02:34,590 --> 00:02:36,183
and get into hypotheses.
54
00:02:37,372 --> 00:02:41,400
Here's a simple topic that can be tested.
55
00:02:41,400 --> 00:02:42,870
According to Glassdoor,
56
00:02:42,870 --> 00:02:45,600
the popular salary information website,
57
00:02:45,600 --> 00:02:50,433
the mean data scientist salary in the US is $113,000.
58
00:02:51,480 --> 00:02:54,573
So, we want to test if their estimate is correct.
59
00:02:55,860 --> 00:02:58,380
There are two hypotheses that are made.
60
00:02:58,380 --> 00:03:01,080
The null hypothesis, denoted H zero,
61
00:03:01,080 --> 00:03:06,080
and the alternative hypothesis denoted H one or H A.
62
00:03:08,160 --> 00:03:10,770
The null hypothesis is the one to be tested,
63
00:03:10,770 --> 00:03:13,143
and the alternative is everything else.
64
00:03:14,490 --> 00:03:17,740
In our example, the null hypothesis would be
65
00:03:18,660 --> 00:03:23,077
the mean data scientist salary is $113,000.
66
00:03:24,390 --> 00:03:25,950
While the alternative,
67
00:03:25,950 --> 00:03:30,757
the mean data scientist salary is not $113,000.
68
00:03:31,890 --> 00:03:35,640
Now, you would want to check if $113,000 is close enough
69
00:03:35,640 --> 00:03:38,940
to the true mean predicted by our sample.
70
00:03:38,940 --> 00:03:42,720
If it is, you would accept (that is, fail to reject) the null hypothesis,
71
00:03:42,720 --> 00:03:45,633
otherwise you would reject the null hypothesis.
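The two-sided test just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the salary sample is invented, the population standard deviation (`sigma`) is assumed to be known, and the 5% significance level (`alpha`) is a conventional choice that the lecture has not introduced yet. The function name `two_sided_z_test` is hypothetical. With an unknown standard deviation, a t-test would be the usual substitute.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: population mean == mu0 against H1: population mean != mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns the z statistic, the two-sided p-value, and the decision.
    """
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    # Two-sided: count extreme values in both tails of the normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Hypothetical salaries, invented purely for illustration (not real data).
salaries = [117_000, 104_500, 121_000, 98_000,
            125_500, 110_000, 131_000, 102_000]

# sigma=15_000 is an assumed value, not a figure from the lecture.
z, p_value, reject_null = two_sided_z_test(salaries, mu0=113_000, sigma=15_000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

A sample mean close to $113,000 gives a small z statistic and a large p-value, so the null hypothesis is not rejected. A sample mean far from it, in either direction, leads to rejection.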
72
00:03:47,220 --> 00:03:50,310
The concept of the null hypothesis is similar to
73
00:03:50,310 --> 00:03:52,353
innocent until proven guilty.
74
00:03:53,400 --> 00:03:56,609
We assume that the mean salary is $113,000
75
00:03:56,609 --> 00:03:59,013
and we try to prove otherwise.
76
00:04:00,870 --> 00:04:03,630
Okay, this was an example of a two-sided
77
00:04:03,630 --> 00:04:05,043
or a two-tailed test.
78
00:04:05,910 --> 00:04:09,724
You can also form a one-sided or one-tailed test.
79
00:04:09,724 --> 00:04:13,080
Say your friend, Paul, told you that he thinks
80
00:04:13,080 --> 00:04:17,343
data scientists earn more than $125,000 per year.
81
00:04:18,480 --> 00:04:21,483
You doubt him, so you design a test to see who's right.
82
00:04:22,350 --> 00:04:25,200
The null hypothesis of this test would be
83
00:04:25,200 --> 00:04:29,583
the mean data scientist salary is greater than or equal to $125,000.
84
00:04:31,860 --> 00:04:34,170
The alternative will cover everything else.
85
00:04:34,170 --> 00:04:38,343
Thus, the mean data scientist salary is less than $125,000.
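For the one-sided case, only one tail of the distribution counts as evidence against the null hypothesis. The sketch below is a lower-tailed z-test for the example with Paul. As before, the sample is invented, the known standard deviation `sigma` and the 5% level `alpha` are assumptions, and the function name `lower_tailed_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def lower_tailed_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: population mean >= mu0 against H1: population mean < mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns the z statistic, the one-sided p-value, and the decision.
    """
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    # One-sided: only values far BELOW mu0 count against H0.
    p_value = NormalDist().cdf(z)
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Hypothetical salaries, invented purely for illustration (not real data).
salaries = [117_000, 104_500, 121_000, 98_000,
            125_500, 110_000, 131_000, 102_000]

# sigma=15_000 is an assumed value, not a figure from the lecture.
z, p_value, reject_null = lower_tailed_z_test(salaries, mu0=125_000, sigma=15_000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

Note the difference from the two-sided version: a sample mean well above $125,000 would never lead to rejection here, because it is consistent with the null hypothesis.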
86
00:04:42,000 --> 00:04:44,400
It is important to note that outcomes of tests
87
00:04:44,400 --> 00:04:46,170
refer to the population parameter
88
00:04:46,170 --> 00:04:48,360
rather than the sample statistic.
89
00:04:48,360 --> 00:04:51,483
So, the result that we get is for the population.
90
00:04:53,340 --> 00:04:56,370
Another crucial consideration is that generally
91
00:04:56,370 --> 00:04:59,283
the researcher is trying to reject the null hypothesis.
92
00:05:00,360 --> 00:05:02,910
Think about the null hypothesis as the status quo
93
00:05:02,910 --> 00:05:05,370
and the alternative as the change or innovation
94
00:05:05,370 --> 00:05:07,113
that challenges that status quo.
95
00:05:08,790 --> 00:05:11,640
In our example, Paul was representing the status quo,
96
00:05:11,640 --> 00:05:12,940
which we were challenging.
97
00:05:14,670 --> 00:05:16,890
Let me emphasize this once again.
98
00:05:16,890 --> 00:05:19,590
In statistics, the null hypothesis is the statement
99
00:05:19,590 --> 00:05:21,270
we are trying to reject.
100
00:05:21,270 --> 00:05:23,070
Therefore, the null hypothesis
101
00:05:23,070 --> 00:05:24,750
is the present state of affairs,
102
00:05:24,750 --> 00:05:27,183
while the alternative is our personal opinion.
103
00:05:29,010 --> 00:05:31,380
It surely is counterintuitive in the beginning,
104
00:05:31,380 --> 00:05:34,080
but later on, when you start doing the exercises,
105
00:05:34,080 --> 00:05:35,793
you will understand the mechanics.
106
00:05:37,680 --> 00:05:39,750
Okay, after this lecture,
107
00:05:39,750 --> 00:05:42,543
there will be a detailed comment on these two examples.
108
00:05:43,530 --> 00:05:46,710
In addition, make sure you complete the quiz questions
109
00:05:46,710 --> 00:05:49,473
so you become confident with forming hypotheses.
110
00:05:50,550 --> 00:05:51,550
Thanks for watching.