Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
1
00:00:05,250 --> 00:00:07,890
The algorithms we've looked at so far
2
2
00:00:07,890 --> 00:00:11,000
don't make any assumptions about the data they're sorting.
3
3
00:00:11,000 --> 00:00:15,170
We can sort integers, floats, strings, objects anything.
4
4
00:00:15,170 --> 00:00:18,380
The algorithms don't assume a certain data type.
5
5
00:00:18,380 --> 00:00:20,460
Specific implementations do
6
6
00:00:20,460 --> 00:00:22,700
but not the algorithms themselves.
7
7
00:00:22,700 --> 00:00:25,240
They also don't assume that the data being sorted
8
8
00:00:25,240 --> 00:00:27,290
is bound in any way.
9
9
00:00:27,290 --> 00:00:29,830
For example, they don't assume that all the values
10
10
00:00:29,830 --> 00:00:32,340
being sorted will be less than 100.
11
11
00:00:32,340 --> 00:00:35,100
There are algorithms that do make assumptions
12
12
00:00:35,100 --> 00:00:36,410
about the data being sorted
13
13
00:00:36,410 --> 00:00:38,190
and because they make assumptions
14
14
00:00:38,190 --> 00:00:40,730
they can sort the data more efficiently.
15
15
00:00:40,730 --> 00:00:45,730
In fact they can achieve linear time, O to the N complexity
16
16
00:00:46,170 --> 00:00:50,010
and so the next couple of algorithms we're going to look at
17
17
00:00:50,010 --> 00:00:52,980
make assumptions about the data they're sorting
18
18
00:00:52,980 --> 00:00:56,840
and the first algorithm is called counting sort.
19
19
00:00:56,840 --> 00:00:59,040
So counting sort makes assumptions
20
20
00:00:59,040 --> 00:01:00,890
about the data that it's sorting,
21
21
00:01:00,890 --> 00:01:04,550
it assumes that all the values are discrete
22
22
00:01:04,550 --> 00:01:07,530
and they're within a specific range.
23
23
00:01:07,530 --> 00:01:09,220
So this algorithm only works
24
24
00:01:09,220 --> 00:01:11,250
with non-negative discrete values.
25
25
00:01:11,250 --> 00:01:15,390
We can't sort floating point numbers, we can't sort strings
26
26
00:01:15,390 --> 00:01:18,980
because they don't have discrete values.
27
27
00:01:18,980 --> 00:01:21,550
So practically speaking, you'll usually use
28
28
00:01:21,550 --> 00:01:23,730
this algorithm with whole numbers.
29
29
00:01:23,730 --> 00:01:26,700
So the way it works is it doesn't use comparisons,
30
30
00:01:26,700 --> 00:01:29,090
so it doesn't actually compare values
31
31
00:01:29,090 --> 00:01:30,860
in the array against each other
32
32
00:01:30,860 --> 00:01:35,660
instead it counts the number of occurrences of each value
33
33
00:01:35,660 --> 00:01:37,940
and you'll see how it does this
34
34
00:01:37,940 --> 00:01:40,480
when we go through the algorithm
35
35
00:01:40,480 --> 00:01:43,290
for a specific array in just a minute.
36
36
00:01:43,290 --> 00:01:47,110
Now as I said, values must be within a specific range
37
37
00:01:47,110 --> 00:01:49,130
and the range has to be reasonable.
38
38
00:01:49,130 --> 00:01:50,970
It can't be huge.
39
39
00:01:50,970 --> 00:01:52,830
You're not going to use counting sort
40
40
00:01:52,830 --> 00:01:55,560
to sort values that are between
41
41
00:01:55,560 --> 00:01:57,930
one and a million, for example.
42
42
00:01:57,930 --> 00:02:01,050
Now our array, the usual array that we've been using,
43
43
00:02:01,050 --> 00:02:03,620
is not a good candidate for counting sort.
44
44
00:02:03,620 --> 00:02:06,400
It has negative numbers, so we're going to use a different
45
45
00:02:06,400 --> 00:02:10,260
array to go through this algorithm by hand
46
46
00:02:10,260 --> 00:02:11,940
and so this is the array we're going to use.
47
47
00:02:11,940 --> 00:02:15,270
We're going to assume that we're sorting values
48
48
00:02:15,270 --> 00:02:17,900
that are between one and 10 inclusive,
49
49
00:02:17,900 --> 00:02:20,900
so the values can be one, two, three, four,
50
50
00:02:20,900 --> 00:02:23,340
five, six, seven, eight, nine or 10.
51
51
00:02:23,340 --> 00:02:26,900
We have 10 possible values and so what we do
52
52
00:02:26,900 --> 00:02:30,150
is we create a counting array of length 10.
53
53
00:02:30,150 --> 00:02:32,160
So this is going to be a separate array
54
54
00:02:32,160 --> 00:02:34,310
from the one that we're sorting
55
55
00:02:34,310 --> 00:02:36,350
and then we traverse the input array,
56
56
00:02:36,350 --> 00:02:38,700
the one we're sorting, from left to right
57
57
00:02:38,700 --> 00:02:41,590
and using the counting array we track how many
58
58
00:02:41,590 --> 00:02:43,750
of each value are in the input array
59
59
00:02:43,750 --> 00:02:47,010
and then once we've traversed the entire input array,
60
60
00:02:47,010 --> 00:02:49,590
using the counts in the counting array,
61
61
00:02:49,590 --> 00:02:53,790
we write the values in sorted order back to the input array.
62
62
00:02:53,790 --> 00:02:55,780
And so we're going to sort the array
63
63
00:02:55,780 --> 00:02:57,350
that you see on the screen
64
64
00:02:57,350 --> 00:03:00,220
and so when i = 0, we're going to look
65
65
00:03:00,220 --> 00:03:04,010
at the value at position zero and we see it's two
66
66
00:03:04,010 --> 00:03:08,490
and so we add one to the counting array at position one
67
67
00:03:08,490 --> 00:03:11,290
because we know the values are between one and 10,
68
68
00:03:11,290 --> 00:03:13,750
in the counting array position zero
69
69
00:03:13,750 --> 00:03:16,520
is going to hold the number of one's we find.
70
70
00:03:16,520 --> 00:03:19,710
Position one will hold the number of two's we find.
71
71
00:03:19,710 --> 00:03:23,800
Position two will hold the number of three's we find etc
72
72
00:03:23,800 --> 00:03:26,380
and so in the first position we found a two
73
73
00:03:26,380 --> 00:03:30,550
so we increment counting array index one,
74
74
00:03:30,550 --> 00:03:31,930
we increment it by one.
75
75
00:03:31,930 --> 00:03:34,020
Then we're going to increment i to one
76
76
00:03:34,020 --> 00:03:38,160
and at the input array position one we have a five
77
77
00:03:38,160 --> 00:03:41,580
so we're going to increment counting array four
78
78
00:03:41,580 --> 00:03:44,640
because that's where we're counting the number of fives.
79
79
00:03:44,640 --> 00:03:47,960
We increment i to two, we have a nine,
80
80
00:03:47,960 --> 00:03:50,140
so we're going to increment counting array
81
81
00:03:50,140 --> 00:03:54,000
position eight by one because we found one nine.
82
82
00:03:54,000 --> 00:03:55,820
When i is three we have an eight,
83
83
00:03:55,820 --> 00:03:59,390
so we're going to increment counting array seven by one.
84
84
00:03:59,390 --> 00:04:01,980
At position four we found another two
85
85
00:04:01,980 --> 00:04:05,430
and so we're going to increment counting array one by one,
86
86
00:04:05,430 --> 00:04:06,960
so now we have two there because
87
87
00:04:06,960 --> 00:04:09,860
so far we've seen two two's in our array.
88
88
00:04:09,860 --> 00:04:13,110
Next we find another eight, so counting array seven
89
89
00:04:13,110 --> 00:04:15,700
will be incremented by one so we now have a two there
90
90
00:04:15,700 --> 00:04:17,550
because we've seen two eight's.
91
91
00:04:17,550 --> 00:04:19,270
Then we find a seven so we're going to
92
92
00:04:19,270 --> 00:04:21,940
increment counting array six.
93
93
00:04:21,940 --> 00:04:23,390
We found a 10 so we're going to
94
94
00:04:23,390 --> 00:04:25,660
increment counting array nine.
95
95
00:04:25,660 --> 00:04:27,210
We found a four, so we going to
96
96
00:04:27,210 --> 00:04:28,820
increment counting array three
97
97
00:04:29,680 --> 00:04:31,600
and finally we find a three
98
98
00:04:31,600 --> 00:04:34,800
and so we're going to increment counting array two by one.
99
99
00:04:34,800 --> 00:04:38,140
And so at this point we've traversed our entire input array
100
100
00:04:38,140 --> 00:04:40,250
and we have counted how many occurrences
101
101
00:04:40,250 --> 00:04:41,730
of each value we've seen
102
102
00:04:41,730 --> 00:04:46,090
and so what we want to do now is using the counting array,
103
103
00:04:46,090 --> 00:04:49,920
we're going to write back the values to the original array.
104
104
00:04:49,920 --> 00:04:53,430
So in this slide the counting array is at the top
105
105
00:04:53,430 --> 00:04:56,220
and the input array is at the bottom,
106
106
00:04:56,220 --> 00:04:57,310
after we've written it back.
107
107
00:04:57,310 --> 00:04:58,500
So the way this would work is,
108
108
00:04:58,500 --> 00:05:00,800
we'd traverse the counting array
109
109
00:05:00,800 --> 00:05:04,500
and we would say okay so there is zero one's
110
110
00:05:04,500 --> 00:05:06,200
and so we don't have to do anything.
111
111
00:05:06,200 --> 00:05:08,870
We just move to the next position in the counting array.
112
112
00:05:08,870 --> 00:05:10,510
Okay there's two two's
113
113
00:05:10,510 --> 00:05:12,330
and so at this point we're gonna write
114
114
00:05:12,330 --> 00:05:14,890
two two's into the input array
115
115
00:05:14,890 --> 00:05:16,220
and then there's one three
116
116
00:05:16,220 --> 00:05:19,100
so we'll write one three to the input array.
117
117
00:05:19,100 --> 00:05:23,260
There's one four so we'll write one four to the input array.
118
118
00:05:23,260 --> 00:05:27,330
There's one five so we'll write one five to the input array.
119
119
00:05:27,330 --> 00:05:30,360
There are zero sixes, so we don't do anything.
120
120
00:05:30,360 --> 00:05:32,550
There is one seven, so we write
121
121
00:05:32,550 --> 00:05:34,740
one seven to the input array.
122
122
00:05:34,740 --> 00:05:36,900
There are two eights, so we're gonna write
123
123
00:05:36,900 --> 00:05:39,070
two eight's to the input array.
124
124
00:05:39,070 --> 00:05:42,940
There's one nine, so we write one nine to the input array
125
125
00:05:42,940 --> 00:05:44,600
and finally there's one 10
126
126
00:05:44,600 --> 00:05:47,190
and so we write one 10 to the input array.
127
127
00:05:47,190 --> 00:05:49,800
So essentially counting sort has two phases.
128
128
00:05:49,800 --> 00:05:51,987
The first phase is we traverse the input array
129
129
00:05:51,987 --> 00:05:54,980
and we count how many of each value we have
130
130
00:05:54,980 --> 00:05:58,520
and then in the second phase using the counting array,
131
131
00:05:58,520 --> 00:06:01,380
we write values back into the input array
132
132
00:06:01,380 --> 00:06:04,920
and so as you can see our input array is now sorted
133
133
00:06:04,920 --> 00:06:06,790
and we didn't do any comparisons.
134
134
00:06:06,790 --> 00:06:10,100
We didn't compare any of the elements against each other.
135
135
00:06:10,100 --> 00:06:13,530
We're just merely counting, how many of each value we have
136
136
00:06:13,530 --> 00:06:16,480
and that's why the algorithm is called counting sort.
137
137
00:06:16,480 --> 00:06:19,070
Now this is not an in-place algorithm.
138
138
00:06:19,070 --> 00:06:21,850
As we've seen we have to create a counting array
139
139
00:06:21,850 --> 00:06:23,830
and the length of the counting array
140
140
00:06:23,830 --> 00:06:28,020
will depend on the range of values we have,
141
141
00:06:28,020 --> 00:06:31,770
and because of this counting sort is best used,
142
142
00:06:31,770 --> 00:06:34,250
when the range of values you have
143
143
00:06:34,250 --> 00:06:38,240
is around the same length of the input array.
144
144
00:06:38,240 --> 00:06:41,700
So if you have an input array of 20,
145
145
00:06:41,700 --> 00:06:43,930
ideally the number of values that there are,
146
146
00:06:43,930 --> 00:06:45,680
you want to be that to be around 20
147
147
00:06:45,680 --> 00:06:49,460
because think about if you have an array of 10 elements
148
148
00:06:49,460 --> 00:06:52,750
and the range of values can go from one to 10,000,
149
149
00:06:52,750 --> 00:06:55,100
then your going to have to create a counting array
150
150
00:06:55,100 --> 00:06:58,140
of length 10,000 to sort 10 elements
151
151
00:06:58,140 --> 00:07:00,610
which would obviously be ridiculous.
152
152
00:07:00,610 --> 00:07:03,630
So counting sort works best when
153
153
00:07:03,630 --> 00:07:06,530
the range of values that you can have,
154
154
00:07:06,530 --> 00:07:08,380
the number of values in that range
155
155
00:07:08,380 --> 00:07:10,670
is roughly equivalent to the number of values
156
156
00:07:10,670 --> 00:07:12,780
that you want to sort.
157
157
00:07:12,780 --> 00:07:15,750
So in our case we had a range of 10 values
158
158
00:07:15,750 --> 00:07:18,800
and we had an array of 10 so it was perfect.
159
159
00:07:18,800 --> 00:07:22,610
As I mentioned this algorithm can achieve linear time
160
160
00:07:22,610 --> 00:07:24,730
and it can do that because it's making
161
161
00:07:24,730 --> 00:07:26,340
assumptions about the data
162
162
00:07:26,340 --> 00:07:28,590
and so because it can make these assumptions,
163
163
00:07:28,590 --> 00:07:31,220
it can achieve linear time.
164
164
00:07:31,220 --> 00:07:34,330
Now the counting sort I showed you is unstable.
165
165
00:07:34,330 --> 00:07:37,690
If we want it to be stable there are extra steps
166
166
00:07:37,690 --> 00:07:39,790
we can take with counting sort
167
167
00:07:39,790 --> 00:07:42,990
instead of just incrementing a counter for example
168
168
00:07:42,990 --> 00:07:45,550
you would perhaps use what's called linked list
169
169
00:07:45,550 --> 00:07:48,530
in the counting array, we're gonna cover linked list later.
170
170
00:07:48,530 --> 00:07:51,050
Okay that's it for learning what
171
171
00:07:51,050 --> 00:07:52,990
the counting sort algorithm is.
172
172
00:07:52,990 --> 00:07:54,380
Let's go ahead and implement it.
173
173
00:07:54,380 --> 00:07:55,930
I'll see you in the next video.
15419
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.