009 Cross Tables and Scatter Plots (English subtitles)

So far we have covered graphs that represent only one variable. But how do we represent relationships between two variables? In this video, we'll explore cross tables and scatter plots.

Once again, we have a division between categorical and numerical variables.

Let's start with categorical variables. The most common way to represent them is using cross tables, or, as some statisticians call them, contingency tables.

Imagine you are an investment manager and you manage stocks, bonds and real estate investments for three different investors. Each of them has a different idea of risk, and hence their money is allocated in a different way among the three asset classes. A cross table representing all the data looks like this.

You can clearly see the rows showing the type of investment that's been made and the columns with each investor's allocation. It is good practice to calculate the totals of each row and column, as they are often useful in further analysis.
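
The row and column totals described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own material: the investor labels and dollar amounts are hypothetical placeholders, since the actual table is shown only on screen. Following the video, rows are asset classes and columns are investors.

```python
# Hypothetical amounts in thousands of dollars; the video shows its table only on screen.
ASSETS = ["Stocks", "Bonds", "Real estate"]
INVESTORS = ["Investor A", "Investor B", "Investor C"]

# CROSS_TABLE[asset][investor] -> amount allocated
CROSS_TABLE = {
    "Stocks":      {"Investor A": 100, "Investor B": 200, "Investor C": 50},
    "Bonds":       {"Investor A": 150, "Investor B": 50,  "Investor C": 25},
    "Real estate": {"Investor A": 50,  "Investor B": 100, "Investor C": 125},
}


def row_totals(table):
    """Total invested in each asset class (sum across investors)."""
    return {asset: sum(row.values()) for asset, row in table.items()}


def column_totals(table):
    """Total holdings of each investor (sum across asset classes)."""
    totals = {}
    for row in table.values():
        for investor, amount in row.items():
            totals[investor] = totals.get(investor, 0) + amount
    return totals


def format_cross_table(table, assets, investors):
    """Tab-separated cross table with a Total column and a Total row."""
    rows, cols = row_totals(table), column_totals(table)
    lines = ["\t".join(["Asset"] + investors + ["Total"])]
    for asset in assets:
        cells = [str(table[asset][inv]) for inv in investors]
        lines.append("\t".join([asset] + cells + [str(rows[asset])]))
    grand_total = sum(rows.values())  # also equals sum(cols.values())
    lines.append("\t".join(["Total"] + [str(cols[inv]) for inv in investors] + [str(grand_total)]))
    return "\n".join(lines)


print(format_cross_table(CROSS_TABLE, ASSETS, INVESTORS))
```

The grand total in the bottom-right cell equals both the sum of the row totals and the sum of the column totals, which makes a quick sanity check on any cross table. Libraries such as pandas can add these margins directly (`pandas.crosstab` with `margins=True`), but the plain version keeps the arithmetic visible.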

Notice that the subtotals of the rows give us the total investments in stocks, bonds and real estate. On the other hand, the subtotals of the columns give us the holdings of each investor.

Once we have created a cross table, we can proceed by visualizing the data on a plane. A very useful chart in such cases is a variation of the bar chart called the side-by-side bar chart. It represents the holdings of each investor in the different types of assets. Stocks are in green, bonds are in red and real estate is in blue.

The name of this type of chart comes from the fact that, for each investor, the categories of assets are represented side by side. In this way, we can easily compare asset holdings for a specific investor or among investors. Easy, right? All graphs are very easy to create and read once you have identified the type of data you are dealing with and decided on the best way to visualize it.

Finally, we would like to conclude with a very important graph: the scatter plot. It is used to represent two numerical variables. For this example, we have gathered the reading and writing SAT scores of 100 individuals.
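
Before moving on to the scatter plot, the side-by-side layout can be made concrete. The sketch below assumes hypothetical holdings (not the video's figures) and computes where each bar would sit: every investor's group is centred on an integer x-coordinate, and the three asset bars are offset from it by a fixed width. A plotting function such as matplotlib's `Axes.bar`, which takes x positions and a bar width, could draw from these positions with stocks in green, bonds in red and real estate in blue; to stay dependency-free, the sketch prints a rough text version instead, with the bars drawn horizontally.

```python
# Hypothetical holdings in thousands of dollars; the video shows its figures only on screen.
ASSETS = ["Stocks", "Bonds", "Real estate"]
HOLDINGS = {
    "Investor A": {"Stocks": 100, "Bonds": 150, "Real estate": 50},
    "Investor B": {"Stocks": 200, "Bonds": 50, "Real estate": 100},
    "Investor C": {"Stocks": 50, "Bonds": 25, "Real estate": 125},
}


def bar_positions(n_groups, n_bars, width=0.25):
    """x-coordinate of every bar: group g is centred on x = g and its bars sit side by side."""
    offsets = [(b - (n_bars - 1) / 2) * width for b in range(n_bars)]
    return [[g + off for off in offsets] for g in range(n_groups)]


def text_side_by_side(holdings, assets, unit=25):
    """Text-mode grouped bar chart: one block per investor, one bar per asset class."""
    lines = []
    for investor, row in holdings.items():
        lines.append(investor)
        for asset in assets:
            amount = row[asset]
            # one '#' per `unit` thousand dollars, followed by the exact amount
            lines.append(f"  {asset:<12} {'#' * (amount // unit)} {amount}")
    return "\n".join(lines)


print(bar_positions(len(HOLDINGS), len(ASSETS)))
print(text_side_by_side(HOLDINGS, ASSETS))
```

Grouping by investor makes within-investor comparisons easy; swapping the loops (grouping by asset class instead) would favour comparisons among investors for one asset.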

Let me first show you the graph before analyzing it.

All right. First, SAT scores by component range from 200 to 800 points, and that is why our data is bounded within the range of 200 to 800. Second, our vertical axis shows the writing scores, while the horizontal axis contains the reading scores. Third, there are 100 students, and each student's results correspond to a specific point on the graph.

Each point gives us information about a particular student's performance. For example, this is Jane. She scored 300 on writing, but 550 on the reading part.

Scatter plots usually represent lots and lots of observations. When interpreting a scatter plot, a statistician is not expected to look into single data points. They would be much more interested in getting the main idea of how the data is distributed.

Okay, the first thing we see is that there is an obvious upward trend. This is because lower writing scores are usually obtained by students with lower reading scores, and higher writing scores have been achieved by students with higher reading scores. This is logical, right? Students are more likely to do well on both because the two tasks are closely related.

Second, we notice a concentration of students in the middle of the graph, with scores in the region of 450 to 550 on both reading and writing. Remember we said that scores can be anywhere between 200 and 800? Well, 500 is the average score one can get, so it makes sense that a lot of people fall into that area.

Third, there is a group of people with very high scores on both writing and reading. The exceptional students tend to be excellent at both components. This is less true for weaker students, as their performance tends to vary more from one task to another.

Finally, we have Jane from a minute ago. She is far away from every other observation, as she scored above average on reading but poorly on writing. This observation is called an outlier, as it goes against the logic of the whole data set. We will learn more about outliers and how to treat them in our analysis later on in this course.

So we have gone through the basics. We have covered populations, samples, types of variables, graphs and tables. Now it is time for us to dive into the heart of descriptive statistics: measures of central tendency and variability.

Thanks for watching.
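
The scatter plot observations above can be checked numerically with a short Python sketch. Only Jane's point (550 reading, 300 writing) comes from the video; the other nine points are hypothetical stand-ins for the 100 students. Two techniques here go beyond this lesson and are assumptions of the sketch: Pearson's correlation coefficient to put a number on the upward trend, and a rule of thumb that flags a point as an outlier when its distance from a least-squares line exceeds k = 2 standard deviations of the residuals. The 450 to 550 window is the central region mentioned in the video.

```python
import statistics

# (reading, writing) pairs. Jane's point is from the video; the rest are hypothetical.
SCORES = [
    (350, 380), (420, 400), (460, 480), (500, 510), (520, 490),
    (540, 560), (610, 590), (700, 720), (760, 780),
    (550, 300),  # Jane: 550 on reading, 300 on writing
]

MIN_SCORE, MAX_SCORE = 200, 800  # each SAT component ranges from 200 to 800


def all_in_bounds(points):
    """True if every score lies within the 200-800 range of an SAT component."""
    return all(MIN_SCORE <= s <= MAX_SCORE for point in points for s in point)


def _sums(points):
    """Centred sums of products used by both the correlation and the line fit."""
    xs = [r for r, _ in points]
    ys = [w for _, w in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(points):
    """Pearson correlation of reading vs writing (positive means an upward trend)."""
    _, _, sxy, sxx, syy = _sums(points)
    return sxy / (sxx * syy) ** 0.5


def central_count(points, low=450, high=550):
    """Number of students scoring between low and high on both components."""
    return sum(1 for r, w in points if low <= r <= high and low <= w <= high)


def flag_outliers(points, k=2.0):
    """Points whose residual from the least-squares line exceeds k residual std devs."""
    mx, my, sxy, sxx, _ = _sums(points)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [w - (intercept + slope * r) for r, w in points]
    spread = statistics.pstdev(residuals)
    return [p for p, res in zip(points, residuals) if abs(res) > k * spread]


print("in bounds:", all_in_bounds(SCORES))
print("correlation:", round(pearson_r(SCORES), 2))
print("students in the 450-550 window:", central_count(SCORES))
print("outliers:", flag_outliers(SCORES))
```

Dropping Jane from the list raises the correlation, which is one way of seeing that she goes against the logic of the rest of the data set. The threshold k is a judgment call; the course returns to outlier treatment later.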