subtitlecat.com

All language subtitles for 009 Cross Tables and Scatter Plots_en

So far we have covered graphs that represent only one variable. But how do we represent relationships between two variables? In this video, we'll explore cross tables and scatter plots.

Once again, we have a division between categorical and numerical variables. Let's start with categorical variables. The most common way to represent them is using cross tables or, as some statisticians call them, contingency tables.

Imagine you are an investment manager and you manage stocks, bonds and real estate investments for three different investors. Each of them has a different idea of risk, and hence their money is allocated in a different way among the three asset classes. A cross table representing all the data looks like this.

You can clearly see the rows showing the type of investment that's been made and the columns with each investor's allocation. It is good practice to calculate the totals of each row and column, as they are often useful in further analysis.
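As a rough illustration of the cross table just described, here is a minimal Python sketch. The investor names and all amounts are made-up placeholders (the video does not state its figures), and the dictionary layout is only one possible way to hold such a table.

```python
# Hypothetical allocations: made-up numbers, not the figures shown in the video.
# Rows are the type of investment, columns are the investors.
allocations = {
    "Stocks":      {"Investor A": 100, "Investor B": 150, "Investor C": 50},
    "Bonds":       {"Investor A": 200, "Investor B": 50,  "Investor C": 100},
    "Real estate": {"Investor A": 50,  "Investor B": 100, "Investor C": 250},
}
investors = ["Investor A", "Investor B", "Investor C"]

# Row totals: total investment in each asset class.
row_totals = {asset: sum(row.values()) for asset, row in allocations.items()}

# Column totals: total holdings of each investor.
col_totals = {inv: sum(row[inv] for row in allocations.values()) for inv in investors}

# The grand total can be reached by summing either the rows or the columns.
grand_total = sum(row_totals.values())


def format_cross_table():
    """Return the cross table, including totals, as aligned plain text."""
    header = f"{'':<12}" + "".join(f"{inv:>12}" for inv in investors) + f"{'Total':>8}"
    lines = [header]
    for asset, row in allocations.items():
        cells = "".join(f"{row[inv]:>12}" for inv in investors)
        lines.append(f"{asset:<12}{cells}{row_totals[asset]:>8}")
    totals = "".join(f"{col_totals[inv]:>12}" for inv in investors)
    lines.append(f"{'Total':<12}{totals}{grand_total:>8}")
    return "\n".join(lines)


print(format_cross_table())
```

Summing the row totals and summing the column totals should give the same grand total, which is a quick sanity check on any cross table.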
Notice that the subtotals of the rows give us total investments in stocks, bonds and real estate. On the other hand, the subtotals of the columns give us the holdings of each investor.

Once we have created a cross table, we can proceed by visualizing the data on a plane. A very useful chart in such cases is a variation of the bar chart called the side-by-side bar chart. It represents the holdings of each investor in the different types of assets. Stocks are in green, bonds are in red and real estate is in blue.

The name of this type of chart comes from the fact that, for each investor, the categories of assets are represented side by side. In this way, we can easily compare asset holdings for a specific investor or among investors. Easy, right? All graphs are very easy to create and read once you have identified the type of data you are dealing with and decided on the best way to visualize it.

Finally, we would like to conclude with a very important graph, the scatter plot. It is used when representing two numerical variables. For this example, we have gathered the reading and writing SAT scores of 100 individuals.
Let me first show you the graph before analyzing it.

All right. First, SAT scores by component range from 200 to 800 points, and that is why our data is bounded within the range of 200 to 800. Second, our vertical axis shows the writing scores, while the horizontal axis contains the reading scores. Third, there are 100 students, and each student's results correspond to a specific point on the graph. Each point gives us information about a particular student's performance. For example, this is Jane. She scored 300 on writing, but 550 on the reading part.

Scatter plots usually represent lots and lots of observations. When interpreting a scatter plot, a statistician is not expected to look into single data points. They would be much more interested in getting the main idea of how the data is distributed.

Okay, the first thing we see is that there is an obvious uptrend. This is because lower writing scores are usually obtained by students with lower reading scores, and higher writing scores have been achieved by students with higher reading scores. This is logical, right?
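The uptrend can also be checked with a little arithmetic. The sketch below assumes a small, made-up list of (reading, writing) pairs; only Jane's point (550 on reading, 300 on writing) comes from the video. Splitting the students at a reading score of 500 is an arbitrary choice for illustration, not a method from the course.

```python
# Hypothetical (reading, writing) scores: made-up values, not the video's data.
# Each tuple is one point on the scatter plot: x = reading, y = writing.
JANE = (550, 300)  # the only point whose values are stated in the video
scores = [
    (250, 300), (320, 280), (400, 420), (450, 470), (480, 500),
    (500, 490), (520, 540), JANE, (600, 620), (700, 720), (760, 780),
]

SCORE_MIN, SCORE_MAX = 200, 800  # each SAT component is bounded by this range


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Every score should fall inside the 200 to 800 range.
in_range = all(SCORE_MIN <= s <= SCORE_MAX for point in scores for s in point)

# Compare average writing scores for lower and higher reading scores.
# The cut-off of 500 is an assumption made only for this illustration.
low_mean = mean(w for r, w in scores if r < 500)
high_mean = mean(w for r, w in scores if r >= 500)

print(f"All scores within range: {in_range}")
print(f"Mean writing score, reading below 500: {low_mean:.1f}")
print(f"Mean writing score, reading 500 or above: {high_mean:.1f}")
```

With these placeholder values, the students with higher reading scores also have the higher average writing score, which is what an uptrend on the scatter plot means.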
Students are more likely to do well on both because the two tasks are closely related.

Second, we notice a concentration of students in the middle of the graph, with scores in the region of 450 to 550 on both reading and writing. Remember we said that scores can be anywhere between 200 and 800? Well, 500 is the average score one can get, so it makes sense that a lot of people fall into that area.

Third, there is this group of people with both very high writing and reading scores. The exceptional students tend to be excellent at both components. This is less true for bad students, as their performance tends to deviate when performing different tasks.

Finally, we have Jane from a minute ago. She is far away from every other observation, as she scored above average on reading but poorly on writing. This observation is called an outlier, as it goes against the logic of the whole data set. We will learn more about outliers and how to treat them in our analysis later on in this course.

So we have gone through the basics. We have covered populations, samples, types of variables, graphs and tables.
And it is time for us to dive into the heart of descriptive statistics: measures of central tendency and variability.

Thanks for watching.