001 Null vs Alternative Hypothesis (English subtitles)

Instructor: Hi, and welcome back. This section is based on the knowledge that you acquired previously, so if you haven't been through it, you may have a hard time keeping up. Make sure you have seen all the videos about confidence intervals, distributions, Z tables, and T tables, and have done all the exercises. If you've completed them already, you are good to go.

Confidence intervals provide us with an estimate of where the parameters are located. However, when you are making a decision, you need a yes or no answer. The correct approach in this case is to use a test. In this section, we will learn how to perform one of the fundamental tasks in statistics: hypothesis testing.

Okay. There are four steps in data-driven decision making. First, you must formulate a hypothesis. Second, once you have formulated a hypothesis, you will have to find the right test for your hypothesis. Third, you execute the test. And fourth, you make a decision based on the result.

Let's start from the beginning. What is a hypothesis?
Though there are many ways to define it, the most intuitive I've seen is: a hypothesis is an idea that can be tested. This is not the formal definition, but it explains the point very well.

So, if I tell you that apples in New York are expensive, this is an idea or a statement, but it is not testable until I have something to compare it with. For instance, if I define expensive as any price higher than $1.75 per pound, then it immediately becomes a hypothesis.

All right, what's something that cannot be a hypothesis? An example may be: would the USA do better or worse under a Clinton administration compared to a Trump administration? Statistically speaking, this is an idea, but there is no data to test it. Therefore, it cannot be a hypothesis of a statistical test. Actually, it is more likely to be a topic of another discipline.

Conversely, in statistics, we may compare different US presidencies that have already been completed, such as the Obama administration and the Bush administration, as we have data on both.

All right, let's get out of politics and get into hypotheses.
Here's a simple topic that can be tested. According to Glassdoor, the popular salary information website, the mean data scientist salary in the US is $113,000. So, we want to test if their estimate is correct.

There are two hypotheses that are made: the null hypothesis, denoted H0, and the alternative hypothesis, denoted H1 or HA. The null hypothesis is the one to be tested, and the alternative is everything else. In our example, the null hypothesis would be: the mean data scientist salary is $113,000. The alternative would be: the mean data scientist salary is not $113,000.

Now, you would want to check if $113,000 is close enough to the true mean predicted by our sample. In case it is, you would accept the null hypothesis; otherwise, you would reject the null hypothesis.

The concept of the null hypothesis is similar to "innocent until proven guilty." We assume that the mean salary is $113,000 and we try to prove otherwise.

Okay, this was an example of a two-sided, or two-tailed, test. You can also form one-sided, or one-tailed, tests.
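
The two-sided salary test described above can be sketched in Python. This is only an illustration under stated assumptions, not the method the lecture prescribes: it uses a z-test, which assumes the population standard deviation is known (the value of 15,000 is invented), the sample salaries are made up, the 5% significance level is just a common convention, and the function name `two_sided_z_test` is hypothetical. With an unknown standard deviation, a t-test would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: mean == mu0 against H1: mean != mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns the z statistic, the p-value, and whether H0 is rejected.
    """
    n = len(sample)
    # Standardize the gap between the sample mean and the hypothesized mean.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: chance of a |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical salaries in USD, invented purely for illustration.
salaries = [117_000, 105_000, 121_000, 99_000,
            113_500, 108_000, 125_000, 110_500]

# H0: the mean salary is $113,000. H1: the mean salary is not $113,000.
z, p, reject = two_sided_z_test(salaries, mu0=113_000, sigma=15_000)
print(f"z = {z:.2f}, p-value = {p:.3f}, reject H0: {reject}")
```

For this made-up sample the mean sits close to $113,000, so the test does not reject the null hypothesis, which is the outcome the lecture calls accepting it.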
Say your friend Paul told you that he thinks data scientists earn more than $125,000 per year. You doubt him, so you design a test to see who's right. The null hypothesis of this test would be: the mean data scientist salary is greater than or equal to $125,000. The alternative will cover everything else: the mean data scientist salary is less than $125,000.

It is important to note that outcomes of tests refer to the population parameter rather than the sample statistic. So, the result that we get is for the population.

Another crucial consideration is that, generally, the researcher is trying to reject the null hypothesis. Think about the null hypothesis as the status quo and the alternative as the change or innovation that challenges that status quo. In our example, Paul was representing the status quo, which we were challenging.

Let me emphasize this once again. In statistics, the null hypothesis is the statement we are trying to reject. Therefore, the null hypothesis is the present state of affairs, while the alternative is our personal opinion.
It surely is counterintuitive in the beginning, but later on, when you start doing the exercises, you will understand the mechanics.

Okay, after this lecture there will be a detailed comment on these two examples. In addition, make sure you complete the quiz questions so you become confident with forming hypotheses.

Thanks for watching.
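
Paul's one-sided example from the lecture can be sketched as a left-tailed z-test in Python. As a hedged illustration only: the sample salaries are invented, the population standard deviation of 15,000 is an assumed known value, the 5% significance level is a convention rather than something the lecture states, and the function name `left_tailed_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def left_tailed_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: mean >= mu0 against H1: mean < mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns the z statistic, the p-value, and whether H0 is rejected.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Only unusually LOW sample means count as evidence against H0,
    # so the p-value is the area in the left tail.
    p_value = NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical salaries in USD, invented purely for illustration.
salaries = [117_000, 105_000, 121_000, 99_000,
            113_500, 108_000, 125_000, 110_500]

# H0 (Paul, the status quo): the mean salary is at least $125,000.
# H1 (our challenge): the mean salary is less than $125,000.
z, p, reject = left_tailed_z_test(salaries, mu0=125_000, sigma=15_000)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

With this invented sample the mean falls well below $125,000, so the null hypothesis is rejected at the 5% level. The conclusion is about the population mean, as the lecture stresses, and it depends entirely on the assumed inputs.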
