001 Null vs Alternative Hypothesis (English subtitles)

Instructor: Hi, and welcome back. This section builds on the knowledge that you acquired previously, so if you haven't been through it, you may have a hard time keeping up. Make sure you have seen all the videos about confidence intervals, distributions, Z tables, and T tables, and have done all the exercises. If you've completed them already, you are good to go.

Confidence intervals provide us with an estimate of where the parameters are located. However, when you are making a decision, you need a yes or no answer. The correct approach in this case is to use a test. In this section, we will learn how to perform one of the fundamental tasks in statistics: hypothesis testing.

Okay. There are four steps in data-driven decision making. First, you must formulate a hypothesis. Second, once you have formulated a hypothesis, you will have to find the right test for your hypothesis. Third, you execute the test. And fourth, you make a decision based on the result.

Let's start from the beginning. What is a hypothesis?

Though there are many ways to define it, the most intuitive I've seen is: a hypothesis is an idea that can be tested. This is not the formal definition, but it explains the point very well. So, if I tell you that apples in New York are expensive, this is an idea or a statement, but it is not testable until I have something to compare it with. For instance, if I define expensive as any price higher than $1.75 per pound, then it immediately becomes a hypothesis.

All right, what's something that cannot be a hypothesis? An example may be: would the USA do better or worse under a Clinton administration compared to a Trump administration? Statistically speaking, this is an idea, but there is no data to test it. Therefore, it cannot be a hypothesis of a statistical test. Actually, it is more likely to be a topic of another discipline. Conversely, in statistics, we may compare different US presidencies that have already been completed, such as the Obama administration and the Bush administration, as we have data on both.

All right, let's get out of politics and get into hypotheses.

Here's a simple topic that can be tested. According to Glassdoor, the popular salary information website, the mean data scientist salary in the US is $113,000. So, we want to test if their estimate is correct.

There are two hypotheses that are made: the null hypothesis, denoted H0, and the alternative hypothesis, denoted H1 or HA. The null hypothesis is the one to be tested, and the alternative is everything else. In our example, the null hypothesis would be: the mean data scientist salary is $113,000. The alternative would be: the mean data scientist salary is not $113,000.

Now, you would want to check if $113,000 is close enough to the true mean predicted by our sample. In case it is, you would accept the null hypothesis; otherwise, you would reject the null hypothesis.

The concept of the null hypothesis is similar to "innocent until proven guilty". We assume that the mean salary is $113,000, and we try to prove otherwise.

Okay, this was an example of a two-sided or two-tailed test. You can also form one-sided or one-tailed tests.

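Before turning to the one-sided case, here is a minimal Python sketch of how the two-sided salary test might look. It is only an illustration under stated assumptions: it uses a z-test with a population standard deviation assumed to be known, a 5% significance level, and sample figures that are invented for the example. None of these numbers, and neither the function name `two_sided_z_test`, come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, hypothesized_mean, population_sd, n, alpha=0.05):
    """Test H0: mu == hypothesized_mean against H1: mu != hypothesized_mean.

    Assumes the population standard deviation is known (a z-test).
    Returns the z statistic, the two-sided p-value, and the reject decision.
    """
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Two-sided p-value: chance of a |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical sample (not from the lecture): 30 salaries averaging $100,200,
# with an assumed known population standard deviation of $15,000.
z, p_value, reject = two_sided_z_test(100_200, 113_000, 15_000, 30)
print(f"z = {z:.2f}, p-value = {p_value:.6f}, reject H0: {reject}")
```

With these made-up numbers the sample mean lies more than four standard errors below $113,000, so the sketch rejects the null hypothesis. A sample mean close to $113,000 would leave the null hypothesis standing.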
Say your friend Paul told you that he thinks data scientists earn more than $125,000 per year. You doubt him, so you design a test to see who's right. The null hypothesis of this test would be: the mean data scientist salary is more than or equal to $125,000. The alternative will cover everything else; thus, the mean data scientist salary is less than $125,000.

It is important to note that outcomes of tests refer to the population parameter rather than the sample statistic. So, the result that we get is for the population.

Another crucial consideration is that generally the researcher is trying to reject the null hypothesis. Think about the null hypothesis as the status quo and the alternative as the change or innovation that challenges that status quo. In our example, Paul was representing the status quo, which we were challenging.

Let me emphasize this once again. In statistics, the null hypothesis is the statement we are trying to reject. Therefore, the null hypothesis is the present state of affairs, while the alternative is our personal opinion.

It surely is counterintuitive in the beginning, but later on, when you start doing the exercises, you will understand the mechanics.

Okay, after this lecture there will be a detailed comment on these two examples. In addition, make sure you complete the quiz questions so you become confident with forming hypotheses. Thanks for watching.
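As a rough companion to the second example (Paul's claim about $125,000), the following Python sketch shows one way a one-sided test could be set up. It is an assumption-laden illustration, not the course's own method: it uses a left-tailed z-test with a population standard deviation assumed to be known, a 5% significance level, and invented sample figures. The function name `left_tailed_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_z_test(sample_mean, hypothesized_mean, population_sd, n, alpha=0.05):
    """Test H0: mu >= hypothesized_mean against H1: mu < hypothesized_mean.

    Assumes the population standard deviation is known (a z-test).
    Returns the z statistic, the one-sided p-value, and the reject decision.
    """
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Only sample means well below the hypothesized value count against H0,
    # so the p-value is the area in the left tail.
    p_value = NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical sample (not from the lecture): 30 salaries averaging $121,000,
# with an assumed known population standard deviation of $15,000.
z, p_value, reject = left_tailed_z_test(121_000, 125_000, 15_000, 30)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With these made-up numbers, z is about -1.46 and the p-value is about 0.07, which is above 0.05, so the sketch does not reject the null hypothesis even though the sample mean is below $125,000. The only difference from a two-sided test is that the p-value comes from a single tail.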