1
00:00:00,630 --> 00:00:02,670
Narrator: Okay, excellent.
2
00:00:02,670 --> 00:00:04,530
We already know how to create graphs
3
00:00:04,530 --> 00:00:06,930
and tables for categorical variables.
4
00:00:06,930 --> 00:00:09,030
In this lesson, we are going to do the same
5
00:00:09,030 --> 00:00:11,880
for numerical variables and given that numerical data
6
00:00:11,880 --> 00:00:13,680
is the main focus of this course,
7
00:00:13,680 --> 00:00:16,079
we will spend a couple of lessons on this topic.
8
00:00:16,950 --> 00:00:19,590
Whenever we wanna plot data, it is best to first order it
9
00:00:19,590 --> 00:00:23,400
in a table, so as we did with categorical variables,
10
00:00:23,400 --> 00:00:26,581
let's start by creating a frequency distribution table.
11
00:00:26,581 --> 00:00:29,022
Here's a list of 20 different numbers.
12
00:00:29,022 --> 00:00:31,440
If we arranged them in a frequency table
13
00:00:31,440 --> 00:00:33,780
like the one we use for categorical variables,
14
00:00:33,780 --> 00:00:36,090
we would obtain a table with 20 rows,
15
00:00:36,090 --> 00:00:37,800
each of them representing one number
16
00:00:37,800 --> 00:00:39,990
with a corresponding frequency of one
17
00:00:39,990 --> 00:00:43,230
since each number occurs exactly once.
18
00:00:43,230 --> 00:00:47,024
This table would be impractical for any analysis, right?
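Here is a minimal Python sketch of this ungrouped table, built with `collections.Counter`. The 20 values are hypothetical (the lecture's actual list appears only on screen). With any 20 distinct values, the table has 20 rows, each with a frequency of one.

```python
from collections import Counter

# Hypothetical data: 20 distinct values 3, 8, 13, ..., 98
# (NOT the list shown in the lecture).
numbers = list(range(3, 100, 5))

# Ungrouped frequency table: one row per distinct value.
table = Counter(numbers)

print("rows:", len(table))                       # one row per value
print("frequencies:", sorted(set(table.values())))  # every value occurs once
```

Twenty rows that all say "1" tell us almost nothing about the shape of the data, which is why the lecture groups the values into intervals instead.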
19
00:00:47,024 --> 00:00:49,710
Well, when we deal with numerical variables,
20
00:00:49,710 --> 00:00:52,620
it makes much more sense to group the data into intervals
21
00:00:52,620 --> 00:00:55,050
and then find the corresponding frequencies.
22
00:00:55,050 --> 00:00:58,110
In this way, we make a summary of the data that allows
23
00:00:58,110 --> 00:01:00,303
for a meaningful visual representation.
24
00:01:01,350 --> 00:01:03,750
How do we choose these intervals?
25
00:01:03,750 --> 00:01:07,020
Generally, statisticians prefer to group the data
26
00:01:07,020 --> 00:01:09,870
into five to 20 intervals.
27
00:01:09,870 --> 00:01:12,360
This way, the summary can be useful.
28
00:01:12,360 --> 00:01:14,550
However, this varies from case to case
29
00:01:14,550 --> 00:01:16,770
and the correct choice of intervals largely depends
30
00:01:16,770 --> 00:01:19,560
on the amount of data we are working with.
31
00:01:19,560 --> 00:01:21,870
In our example, we will divide the data
32
00:01:21,870 --> 00:01:24,033
into five intervals of equal length.
33
00:01:25,140 --> 00:01:28,110
A simple formula that we use is as follows.
34
00:01:28,110 --> 00:01:30,660
The interval width is equal to the largest number
35
00:01:30,660 --> 00:01:32,970
minus the smallest number, all divided
36
00:01:32,970 --> 00:01:34,893
by the number of desired intervals.
37
00:01:36,030 --> 00:01:37,650
In our case, the length of the interval
38
00:01:37,650 --> 00:01:41,340
should be 100 minus one, divided by five.
39
00:01:41,340 --> 00:01:43,383
The result is 19.8.
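The width formula can be written as a small Python function. The name `interval_width` is our own, chosen for illustration. The parentheses make explicit that the subtraction happens before the division.

```python
def interval_width(largest: float, smallest: float, num_intervals: int) -> float:
    """Width of equal-length intervals: (largest - smallest) / num_intervals."""
    return (largest - smallest) / num_intervals


# The lecture's example: largest value 100, smallest value 1, five intervals.
width = interval_width(100, 1, 5)
print(width)  # 19.8
```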
40
00:01:44,790 --> 00:01:46,950
Now we wanna round this number up
41
00:01:46,950 --> 00:01:49,980
to get a neater representation.
42
00:01:49,980 --> 00:01:54,980
Therefore, our intervals will be as follows: one to 21,
43
00:01:55,170 --> 00:02:00,170
21 to 41, 41 to 61, 61 to 81, and 81 to 101.
44
00:02:02,760 --> 00:02:05,253
Each interval has a width of 20.
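The rounding step and the resulting interval bounds can be sketched in Python with `math.ceil`. The variable names are illustrative. The intervals start at the smallest value, one, and each is one width long.

```python
import math

smallest, largest, k = 1, 100, 5

# Round the raw width (19.8) up to a whole number for neater intervals.
width = math.ceil((largest - smallest) / k)  # 20

# Consecutive intervals starting at the smallest value.
intervals = [(smallest + i * width, smallest + (i + 1) * width) for i in range(k)]
print(intervals)
# [(1, 21), (21, 41), (41, 61), (61, 81), (81, 101)]
```

Rounding up rather than down matters. It guarantees that the five intervals still reach past the largest value: 1 + 5 × 20 = 101, which is at least 100.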
45
00:02:06,750 --> 00:02:07,800
Okay.
46
00:02:07,800 --> 00:02:10,953
Let's try to construct the frequency distribution table.
47
00:02:12,060 --> 00:02:14,610
A number is included in a particular interval
48
00:02:14,610 --> 00:02:17,100
if that number is greater than the lower bound
49
00:02:17,100 --> 00:02:19,503
and less than or equal to the upper bound.
50
00:02:20,460 --> 00:02:21,990
As we can see from the table,
51
00:02:21,990 --> 00:02:23,811
there are two numbers in the first interval,
52
00:02:23,811 --> 00:02:27,480
four in the second, three in the third, six in the fourth,
53
00:02:27,480 --> 00:02:29,073
and five in the fifth interval.
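The counting rule can be sketched in Python. This assumes the rule lower < x ≤ upper, with one adjustment: the first interval also includes its lower bound. The smallest value in the lecture's data is one, and the counts add up to all 20 numbers, so that value must be counted in the first interval. The function name `frequency_table` and the data are hypothetical; the data is not the lecture's list.

```python
def frequency_table(numbers, intervals):
    """Count values per interval using the rule lower < x <= upper.

    The first interval also includes its lower bound, so the smallest
    value in the data is not dropped.
    """
    counts = []
    for i, (lower, upper) in enumerate(intervals):
        if i == 0:
            n = sum(1 for x in numbers if lower <= x <= upper)
        else:
            n = sum(1 for x in numbers if lower < x <= upper)
        counts.append(n)
    return counts


intervals = [(1, 21), (21, 41), (41, 61), (61, 81), (81, 101)]

# Hypothetical data (NOT the lecture's list): 20 values 3, 8, ..., 98.
numbers = list(range(3, 100, 5))

for (lower, upper), n in zip(intervals, frequency_table(numbers, intervals)):
    print(f"{lower}-{upper}: {n}")
```

If pandas is available, `pandas.cut` with `right=True` and `include_lowest=True` applies the same boundary convention.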
54
00:02:30,570 --> 00:02:32,910
For many analyses, it is useful to calculate
55
00:02:32,910 --> 00:02:36,480
the relative frequency of the data points in each interval.
56
00:02:36,480 --> 00:02:38,340
As we said in a previous video,
57
00:02:38,340 --> 00:02:40,320
the relative frequency is the frequency
58
00:02:40,320 --> 00:02:42,453
of a given interval as a proportion of the total.
59
00:02:43,470 --> 00:02:45,510
Let's add another column to our table
60
00:02:45,510 --> 00:02:47,850
and name it relative frequency.
61
00:02:47,850 --> 00:02:51,690
So the interval from one to 21 has an absolute frequency
62
00:02:51,690 --> 00:02:54,780
of two, but a relative frequency of two divided
63
00:02:54,780 --> 00:02:58,470
by the total of 20 numbers, which gives us 10%,
64
00:02:58,470 --> 00:03:00,573
and so on until we fill the table.
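The same calculation for every interval can be sketched in Python. The absolute frequencies are the ones reported in the lecture's table. The labels and variable names are illustrative.

```python
# Absolute frequencies per interval, as reported in the lecture's table.
labels = ["1-21", "21-41", "41-61", "61-81", "81-101"]
counts = [2, 4, 3, 6, 5]

total = sum(counts)  # 20 data points in total

# Relative frequency = interval frequency / total number of data points.
relative = [n / total for n in counts]

for label, n, rel in zip(labels, counts, relative):
    print(f"{label}: frequency {n}, relative frequency {rel:.0%}")
```

This gives 10%, 20%, 15%, 30%, and 25%. The relative frequencies add up to 100%, which is a quick way to check the table.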
65
00:03:02,100 --> 00:03:02,940
All right!
66
00:03:02,940 --> 00:03:06,000
This is how we calculate relative frequencies.
67
00:03:06,000 --> 00:03:08,310
Now that we have summarized the raw data,
68
00:03:08,310 --> 00:03:09,573
we can start plotting it.