1
00:00:00,540 --> 00:00:02,550
Narrator: Now that we've seen the different types of data
2
00:00:02,550 --> 00:00:04,530
and levels of measurement we can have,
3
00:00:04,530 --> 00:00:07,470
we are ready to explore different graphs and tables,
4
00:00:07,470 --> 00:00:09,240
which will allow us to visually represent
5
00:00:09,240 --> 00:00:10,690
the data we are working with.
6
00:00:11,610 --> 00:00:14,820
Visualizing data is the most intuitive way to interpret it,
7
00:00:14,820 --> 00:00:17,253
so it's an invaluable skill.
8
00:00:17,253 --> 00:00:20,250
It is much easier to visualize data
9
00:00:20,250 --> 00:00:23,070
if you know its type and measurement level.
10
00:00:23,070 --> 00:00:25,860
As you may recall, there are two types of variables,
11
00:00:25,860 --> 00:00:28,380
categorical and numerical.
12
00:00:28,380 --> 00:00:31,593
In this video, we will focus on categorical variables.
13
00:00:32,518 --> 00:00:34,890
Some of the most common ways to visualize them
14
00:00:34,890 --> 00:00:38,640
are frequency distribution tables, bar charts, pie charts,
15
00:00:38,640 --> 00:00:40,443
and Pareto diagrams.
16
00:00:41,670 --> 00:00:42,630
First, let's see
17
00:00:42,630 --> 00:00:44,980
what a frequency distribution table looks like.
18
00:00:46,200 --> 00:00:48,900
It has two columns, the category itself
19
00:00:48,900 --> 00:00:50,583
and the corresponding frequency.
20
00:00:51,600 --> 00:00:55,650
Imagine you own a car shop and you sell only German cars.
21
00:00:55,650 --> 00:00:58,890
The table below shows the categories of cars,
22
00:00:58,890 --> 00:01:01,500
Audi, BMW, and Mercedes,
23
00:01:01,500 --> 00:01:02,580
and their frequency,
24
00:01:02,580 --> 00:01:05,462
or in plain English, the number of units sold.
25
00:01:06,930 --> 00:01:09,090
By organizing your data in this way,
26
00:01:09,090 --> 00:01:11,190
you can compare the different brands
27
00:01:11,190 --> 00:01:13,763
and see that Audi has been sold the most,
28
00:01:13,763 --> 00:01:17,449
so that is a frequency distribution table.
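The two-column table described above can be sketched in a few lines of Python. The sales counts here are invented for illustration, since the on-screen table is not part of the subtitles; only the brand names and the fact that Audi sold the most come from the narration.

```python
from collections import Counter

# Hypothetical raw sales records: one entry per car sold.
# The counts (124, 98, 113) are assumptions, not figures from the lecture.
sales = ["Audi"] * 124 + ["BMW"] * 98 + ["Mercedes"] * 113

# Counter tallies how many times each category occurs.
frequency = Counter(sales)

# Print the two columns: the category and its frequency.
print(f"{'Brand':<10}{'Frequency':>10}")
for brand, count in frequency.items():
    print(f"{brand:<10}{count:>10}")

# The category with the highest frequency is the best seller.
best_seller = max(frequency, key=frequency.get)
print("Best seller:", best_seller)
```

With real data, `sales` would be read from the shop's records rather than built by hand.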
29
00:01:17,449 --> 00:01:21,300
However, tables aren't much fun, are they?
30
00:01:21,300 --> 00:01:22,620
Using the same table,
31
00:01:22,620 --> 00:01:26,103
we can construct a bar chart, also known as a column chart.
32
00:01:27,450 --> 00:01:29,970
The vertical axis shows the number of units sold,
33
00:01:29,970 --> 00:01:32,490
while each bar represents a different category
34
00:01:32,490 --> 00:01:34,353
indicated on the horizontal axis.
35
00:01:35,190 --> 00:01:37,410
In this way, it is much, much clearer
36
00:01:37,410 --> 00:01:39,663
that Audi is the best-selling brand.
37
00:01:40,837 --> 00:01:45,480
Okay, let's represent the same data as a pie chart.
38
00:01:45,480 --> 00:01:47,760
In order to build one, we need to calculate
39
00:01:47,760 --> 00:01:50,760
what percentage of the total each brand represents.
40
00:01:50,760 --> 00:01:53,703
In statistics, this is known as relative frequency.
41
00:01:54,660 --> 00:01:59,280
Naturally, all relative frequencies add up to 100%.
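As a rough sketch of the calculation just described, each relative frequency is the category's count divided by the total. The counts below are hypothetical placeholders, not figures from the lecture.

```python
# Hypothetical units sold per brand (assumed values for illustration).
frequency = {"Audi": 124, "BMW": 98, "Mercedes": 113}

# Total number of units sold across all brands.
total = sum(frequency.values())

# Relative frequency: each brand's share of the total.
relative = {brand: count / total for brand, count in frequency.items()}

for brand, share in relative.items():
    print(f"{brand:<10}{share:>8.1%}")

# The shares add up to 1, that is, 100%.
print(f"{'Total':<10}{sum(relative.values()):>8.1%}")
```

These shares are the slice sizes a pie chart would draw.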
42
00:01:59,280 --> 00:02:00,900
Pie charts are especially useful
43
00:02:00,900 --> 00:02:03,840
when we want to not only compare items with each other,
44
00:02:03,840 --> 00:02:05,793
but also see their share of the total.
45
00:02:07,042 --> 00:02:10,350
Okay, this example could be easily transformed
46
00:02:10,350 --> 00:02:13,140
into a business example of market share.
47
00:02:13,140 --> 00:02:16,050
Market share is so predominantly represented by pie charts
48
00:02:16,050 --> 00:02:19,020
that if you search for market share in Google Images,
49
00:02:19,020 --> 00:02:20,883
you would get almost nothing but pie charts.
50
00:02:21,721 --> 00:02:24,000
Imagine that the data in our table
51
00:02:24,000 --> 00:02:28,050
is representing the sales of Audi, BMW, and Mercedes
52
00:02:28,050 --> 00:02:31,620
in a single German city, say, Bonn.
53
00:02:31,620 --> 00:02:33,480
The chart will show us the market share
54
00:02:33,480 --> 00:02:35,253
that each of these brands has.
55
00:02:36,783 --> 00:02:40,185
Lastly, we have the Pareto diagram.
56
00:02:40,185 --> 00:02:42,106
In fact, a Pareto diagram
57
00:02:42,106 --> 00:02:44,880
is nothing more than a special type of bar chart
58
00:02:44,880 --> 00:02:48,646
where categories are shown in descending order of frequency.
59
00:02:48,646 --> 00:02:49,740
By frequency,
60
00:02:49,740 --> 00:02:53,310
statisticians mean the number of occurrences of each item.
61
00:02:53,310 --> 00:02:55,560
As we said earlier, in our example,
62
00:02:55,560 --> 00:02:58,785
that's exactly the number of units sold.
63
00:02:58,785 --> 00:03:01,920
Let's go back to our frequency distribution table
64
00:03:01,920 --> 00:03:04,487
and order the brands by frequency.
65
00:03:04,487 --> 00:03:07,020
Now we can create the bar chart
66
00:03:07,020 --> 00:03:08,670
based on the reordered table,
67
00:03:08,670 --> 00:03:12,867
and voila, we almost have a Pareto diagram.
68
00:03:12,867 --> 00:03:16,140
There is one last touch to make it one,
69
00:03:16,140 --> 00:03:20,193
a curve on the same graph, showing the cumulative frequency.
70
00:03:21,690 --> 00:03:22,830
The cumulative frequency
71
00:03:22,830 --> 00:03:24,930
is the running sum of the relative frequencies.
72
00:03:24,930 --> 00:03:27,296
It starts as the frequency of the first brand
73
00:03:27,296 --> 00:03:31,110
and then we add the second, the third, and so on,
74
00:03:31,110 --> 00:03:34,560
until it finishes at 100%.
75
00:03:34,560 --> 00:03:37,710
This polygonal line is measured by a different vertical axis
76
00:03:37,710 --> 00:03:39,270
on the right of the graph.
77
00:03:39,270 --> 00:03:40,830
At each of its vertices,
78
00:03:40,830 --> 00:03:43,743
it shows the subtotal of the categories to its left.
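The steps for a Pareto diagram (order by frequency, then accumulate) might be sketched as follows. The counts are again assumed for illustration; the code computes the numbers behind the bars and the cumulative line rather than drawing the chart.

```python
from itertools import accumulate

# Hypothetical units sold per brand (assumed values), in arbitrary order.
frequency = {"BMW": 98, "Mercedes": 113, "Audi": 124}

# Step 1: order the categories by descending frequency (the bars).
ordered = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

# Step 2: running subtotal of the counts, left to right.
running = list(accumulate(count for _, count in ordered))

# Step 3: express each subtotal as a share of the total (the line).
total = running[-1]
cumulative = [subtotal / total for subtotal in running]

print(f"{'Brand':<10}{'Frequency':>10}{'Cumulative':>12}")
for (brand, count), share in zip(ordered, cumulative):
    print(f"{brand:<10}{count:>10}{share:>12.1%}")
```

Each value in `cumulative` is one vertex of the line: the combined share of that category and every category to its left, ending at 100%.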
79
00:03:45,030 --> 00:03:46,410
See how the Pareto diagram
80
00:03:46,410 --> 00:03:50,040
combines the strengths of the bar chart and the pie chart?
81
00:03:50,040 --> 00:03:52,830
It is easy to compare the data both between categories
82
00:03:52,830 --> 00:03:54,750
and as a part of the total.
83
00:03:54,750 --> 00:03:57,240
Furthermore, if this were a market share graph,
84
00:03:57,240 --> 00:03:58,920
you could easily see the market share
85
00:03:58,920 --> 00:04:02,220
of the top two or top five companies.
86
00:04:02,220 --> 00:04:05,340
Finally, it is named after Vilfredo Pareto.
87
00:04:05,340 --> 00:04:07,320
You may have heard of another idea of his,
88
00:04:07,320 --> 00:04:11,730
the Pareto Principle, also known as the 80-20 rule.
89
00:04:11,730 --> 00:04:13,680
It states that roughly 80% of the effects
90
00:04:13,680 --> 00:04:16,320
come from 20% of the causes.
91
00:04:16,320 --> 00:04:18,990
A real-life example is a statement by Microsoft
92
00:04:18,990 --> 00:04:22,050
that by fixing 20% of its software bugs,
93
00:04:22,050 --> 00:04:23,010
it managed to solve
94
00:04:23,010 --> 00:04:25,953
80% of the problems customers experience.
95
00:04:26,846 --> 00:04:30,420
A Pareto diagram can reveal information like that.
96
00:04:30,420 --> 00:04:33,120
It is designed to show how subtotals change
97
00:04:33,120 --> 00:04:34,740
with each additional category
98
00:04:34,740 --> 00:04:37,533
and provide us with a better understanding of our data.
99
00:04:38,760 --> 00:04:40,830
Okay, these are the main ways
100
00:04:40,830 --> 00:04:44,400
in which we can visually represent categorical data.
101
00:04:44,400 --> 00:04:46,800
In our next lecture, we will get acquainted
102
00:04:46,800 --> 00:04:51,150
with the most useful graphs and tables for numerical data.
103
00:04:51,150 --> 00:04:52,150
Thanks for watching.