003 Categorical Variables - Visualization Techniques (English subtitles)

Narrator: Now that we've seen the different types of data and levels of measurement we can have, we are ready to explore different graphs and tables, which will allow us to visually represent the data we are working with.

Visualizing data is the most intuitive way to interpret it, so it's an invaluable skill. It is much easier to visualize data if you know its type and measurement level. As you may recall, there are two types of variables, categorical and numerical. In this video, we will focus on categorical variables.

Some of the most common ways to visualize them are frequency distribution tables, bar charts, pie charts, and Pareto diagrams.

First, let's see what a frequency distribution table looks like. It has two columns, the category itself and the corresponding frequency.

Imagine you own a car shop and you sell only German cars. The table below shows the categories of cars, Audi, BMW, and Mercedes, and their frequency, or in plain English, the number of units sold.
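The table itself is not part of the subtitles, so the following is only a minimal sketch of how such a frequency distribution table could be built. The sales counts are hypothetical (made up for illustration); the only detail taken from the lecture is that Audi sells the most.

```python
from collections import Counter

# Hypothetical sales log: one entry per car sold.
# The counts are invented; the lecture's real numbers are not in the transcript.
sales = ["Audi"] * 124 + ["BMW"] * 98 + ["Mercedes"] * 113

# Count how many times each category (brand) occurs.
frequency = Counter(sales)

# Print the two-column table: category and corresponding frequency.
print(f"{'Brand':<10}{'Frequency':>10}")
for brand in sorted(frequency):
    print(f"{brand:<10}{frequency[brand]:>10}")
```

Counter is used here simply because it maps each category to its number of occurrences, which is exactly what the two columns of the table hold.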
By organizing your data in this way, you can compare the different brands and see that Audi has been sold the most. So that is a frequency distribution table.

However, tables aren't much fun, are they? Using the same table, we can construct a bar chart, also known as a column chart. The vertical axis shows the number of units sold, while each bar represents a different category indicated on the horizontal axis. In this way, it is much, much clearer that Audi is the best-selling brand.

Okay, let's represent the same data as a pie chart. In order to build one, we need to calculate what percentage of the total each brand represents. In statistics, this is known as relative frequency. Naturally, all relative frequencies add up to 100%. Pie charts are especially useful when we want to not only compare items among each other, but also see their share of the total.

Okay, this example could be easily transformed into a business example of market share. Market share is so predominantly represented by pie charts that if you search for market share in Google Images, you would only get pie charts.
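The relative frequency calculation described above can be sketched as follows. The unit counts are hypothetical, chosen only so that Audi is the best seller as in the lecture; the percentages it prints are the slice sizes a pie chart would use.

```python
# Hypothetical unit sales per brand (invented numbers, not from the lecture).
frequency = {"Audi": 124, "BMW": 98, "Mercedes": 113}

total = sum(frequency.values())

# Relative frequency = category frequency / total, expressed as a percentage.
relative = {brand: 100 * count / total for brand, count in frequency.items()}

for brand, share in relative.items():
    print(f"{brand:<10}{share:6.1f}%")

# The shares of all categories should add up to 100%.
print(f"{'Total':<10}{sum(relative.values()):6.1f}%")
```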
Imagine that the data in our table is representing the sales of Audi, BMW, and Mercedes in a single German city, say, Bonn. The chart will show us the market share that each of these brands has.

Lastly, we have the Pareto diagram. In fact, a Pareto diagram is nothing more than a special type of bar chart where categories are shown in descending order of frequency. By frequency, statisticians mean the number of occurrences of each item. As we said earlier, in our example, that's exactly the number of units sold.

Let's go back to our frequency distribution table and order the brands by frequency. Now we can create the bar chart based on the reordered table, and voila, we almost have a Pareto diagram. There is one last touch to make it one, a curve on the same graph, showing the cumulative frequency.

The cumulative frequency is the sum of the relative frequencies. It starts as the frequency of the first brand and then we add the second, the third, and so on, until it finishes at 100%. This polygonal line is measured by a different vertical axis on the right of the graph.
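The two steps behind a Pareto diagram, reordering by frequency and accumulating the shares, can be sketched like this. The counts are again hypothetical, and the sketch only computes the numbers that the bars and the cumulative line would plot; it does not draw the chart.

```python
from itertools import accumulate

# Hypothetical unit sales per brand (invented numbers, not from the lecture).
frequency = {"BMW": 98, "Audi": 124, "Mercedes": 113}

# Step 1: order the categories by descending frequency (the bars).
ordered = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

# Step 2: running total of the counts as a percentage of the grand total
# (the cumulative frequency line, read on the right-hand axis).
total = sum(count for _, count in ordered)
cumulative = [
    100 * running / total
    for running in accumulate(count for _, count in ordered)
]

print(f"{'Brand':<10}{'Units':>6}{'Cumulative':>12}")
for (brand, count), cum in zip(ordered, cumulative):
    print(f"{brand:<10}{count:>6}{cum:>11.1f}%")
```

Accumulating the raw counts and dividing once by the total gives the same result as summing the relative frequencies one by one, and the last value always lands on 100%.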
At each of its vertices, it shows the subtotal of the categories to its left.

See how the Pareto diagram combines the strong sides of the bar and the pie chart? It is easy to compare the data both between categories and as a part of the total. Furthermore, if this was a market share graph, you could easily see the market share of the top two or top five companies.

Finally, it is named after Vilfredo Pareto. You may have heard of another idea of his, the Pareto Principle, also known as the 80-20 rule. It states that 80% of the effects come from 20% of the causes. A real life example is a statement by Microsoft that by fixing 20% of its software bugs, they managed to solve 80% of the problems customers experience.

A Pareto diagram can reveal information like that. It is designed to show how subtotals change with each additional category and provide us with a better understanding of our data.

Okay, these are the main ways in which we can visually represent categorical data. In our next lecture, we will get acquainted with the most useful graphs and tables for numerical data.
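As a rough illustration of the subtotals mentioned above (the share of the top two categories, and the 80-20 idea), here is a small sketch. The function names top_k_share and categories_to_reach are hypothetical helpers, and the problem counts per cause are invented so that one cause out of five accounts for 80% of the problems.

```python
def top_k_share(frequency, k):
    """Percentage of the total held by the k most frequent categories."""
    counts = sorted(frequency.values(), reverse=True)
    return 100 * sum(counts[:k]) / sum(counts)


def categories_to_reach(frequency, threshold):
    """Smallest number of top categories whose combined share is >= threshold (%)."""
    total = sum(frequency.values())
    running = 0
    for n, count in enumerate(sorted(frequency.values(), reverse=True), start=1):
        running += count
        if 100 * running / total >= threshold:
            return n
    return len(frequency)


# Hypothetical number of reported problems per cause (invented data).
problems_by_cause = {"A": 80, "B": 8, "C": 5, "D": 4, "E": 3}

print(top_k_share(problems_by_cause, 2))           # share of the top two causes
print(categories_to_reach(problems_by_cause, 80))  # causes needed to reach 80%
```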
Thanks for watching.
