0
00:00:00,000 --> 00:01:17,345
[MUSIC PLAYING]
1
00:01:17,345 --> 00:01:22,170
DAVID J. MALAN: This is CS50, and this is already week three.
2
00:01:22,170 --> 00:01:25,695
And even as we've gotten much more into the minutiae of programming
3
00:01:25,695 --> 00:01:27,570
and some of the C stuff that we've been doing
4
00:01:27,570 --> 00:01:30,840
is all the more cryptic looking, recall that at the end of the day,
5
00:01:30,840 --> 00:01:34,647
like, everything we've been doing ultimately fits into this model.
6
00:01:34,647 --> 00:01:36,480
So keep that in mind, particularly as things
7
00:01:36,480 --> 00:01:39,230
seem like they're getting more complicated and more sophisticated.
8
00:01:39,230 --> 00:01:41,850
It's just a process of learning a new language that ultimately
9
00:01:41,850 --> 00:01:44,430
lets us express this process.
10
00:01:44,430 --> 00:01:47,490
And of course, last week we really went into the weeds of like how
11
00:01:47,490 --> 00:01:49,240
inputs and outputs are represented.
12
00:01:49,240 --> 00:01:53,580
And this thing here, a photograph thereof, is called what?
13
00:01:53,580 --> 00:01:54,742
This is what?
14
00:01:54,742 --> 00:01:55,628
AUDIENCE: RAM.
15
00:01:55,628 --> 00:01:56,670
DAVID J. MALAN: RAM, I heard--
16
00:01:56,670 --> 00:01:59,670
Random Access Memory or just generally known as memory.
17
00:01:59,670 --> 00:02:02,280
And recall that we looked at one of these little black chips
18
00:02:02,280 --> 00:02:04,500
that contains all of the bytes--
19
00:02:04,500 --> 00:02:05,790
all of the bits, ultimately.
20
00:02:05,790 --> 00:02:08,400
It's just kind of a grid, sort of an artist grid, that
21
00:02:08,400 --> 00:02:11,520
allows us to think about every one of these memory locations
22
00:02:11,520 --> 00:02:14,040
as just having a number or an address, so to speak.
23
00:02:14,040 --> 00:02:16,770
Like, this might be byte number 0 and then 1 and then 2
24
00:02:16,770 --> 00:02:19,680
and then, maybe way down here again, something like 2 billion
25
00:02:19,680 --> 00:02:22,080
if you have 2 gigabytes of memory.
26
00:02:22,080 --> 00:02:26,070
And so as we did that, we started to explore how we could use this canvas
27
00:02:26,070 --> 00:02:29,760
to create kind of our own information, our own inputs and outputs, not
28
00:02:29,760 --> 00:02:32,380
just the basics like ints and floats and so forth.
29
00:02:32,380 --> 00:02:34,470
But we also talked about strings.
30
00:02:34,470 --> 00:02:37,080
And what is a string as you now know it?
31
00:02:37,080 --> 00:02:40,200
How would you describe in layperson's terms a string?
32
00:02:40,200 --> 00:02:41,010
Yeah, over there.
33
00:02:41,010 --> 00:02:41,990
AUDIENCE: I was gonna say--
34
00:02:41,990 --> 00:02:42,480
[AUDIO OUT]
35
00:02:42,480 --> 00:02:43,897
DAVID J. MALAN: An array of characters.
36
00:02:43,897 --> 00:02:45,690
And an array, meanwhile-- let's go there.
37
00:02:45,690 --> 00:02:50,580
How might someone else define an array in more familiar now terms?
38
00:02:50,580 --> 00:02:53,160
What would be an array?
39
00:02:53,160 --> 00:02:53,670
Yeah.
40
00:02:53,670 --> 00:02:57,070
AUDIENCE: Kind of like an indexed set of things.
41
00:02:57,070 --> 00:02:59,060
DAVID J. MALAN: An indexed set of things-- not bad.
42
00:02:59,060 --> 00:03:02,210
And I think a key characteristic to keep in mind with an array is that it
43
00:03:02,210 --> 00:03:03,890
does actually pertain to memory.
44
00:03:03,890 --> 00:03:05,810
And it's contiguous memory.
45
00:03:05,810 --> 00:03:09,200
Byte after byte after byte is what constitutes an array.
46
00:03:09,200 --> 00:03:11,810
And we'll see in a couple of weeks time that there's actually
47
00:03:11,810 --> 00:03:15,770
more interesting ways to use this same primitive canvas to stitch together
48
00:03:15,770 --> 00:03:19,240
things that are sort of two-dimensional even, that have some kind of shape
49
00:03:19,240 --> 00:03:19,740
to them.
50
00:03:19,740 --> 00:03:22,970
But for now, all we've talked about is arrays and just using these things
51
00:03:22,970 --> 00:03:27,240
from left to right, top to bottom, contiguously to represent information.
52
00:03:27,240 --> 00:03:29,900
So today, we'll consider still an array.
53
00:03:29,900 --> 00:03:33,020
But we won't focus so much on representation
54
00:03:33,020 --> 00:03:34,460
of strings or other data types.
55
00:03:34,460 --> 00:03:36,918
We'll actually now focus on the other part of that process,
56
00:03:36,918 --> 00:03:39,950
of inputs becoming outputs, namely the thing in the middle--
57
00:03:39,950 --> 00:03:40,940
algorithms.
58
00:03:40,940 --> 00:03:45,200
But we have to keep in mind, even though every time we've looked at an array
59
00:03:45,200 --> 00:03:48,610
thus far, certainly on the board like this, you as a human certainly
60
00:03:48,610 --> 00:03:50,360
have the luxury of just kind of eyeballing
61
00:03:50,360 --> 00:03:53,820
the whole thing with a bird's eye view and seeing where all of those numbers
62
00:03:53,820 --> 00:03:54,320
are.
63
00:03:54,320 --> 00:03:56,153
If I asked you where a particular number is,
64
00:03:56,153 --> 00:03:59,150
like zero, odds are your eyes would go right to where it is,
65
00:03:59,150 --> 00:04:02,150
and boom, problem solved in sort of one step.
66
00:04:02,150 --> 00:04:07,260
But the catch is, with a computer that has this memory, even though you,
67
00:04:07,260 --> 00:04:10,730
the human, can [INAUDIBLE] see everything at once, a computer cannot.
68
00:04:10,730 --> 00:04:14,090
It's better to think of your computer's memory, your phone's memory,
69
00:04:14,090 --> 00:04:17,000
or more specifically an array of memory like this
70
00:04:17,000 --> 00:04:21,500
as really being a set of closed doors, not unlike lockers in a school.
71
00:04:21,500 --> 00:04:24,260
And only by opening each of those doors can
72
00:04:24,260 --> 00:04:26,090
the computer actually see what's in there,
73
00:04:26,090 --> 00:04:28,610
which is to say that the computer, unlike you, doesn't
74
00:04:28,610 --> 00:04:32,480
have this bird's eye view of all of the data in all these locations.
75
00:04:32,480 --> 00:04:35,400
It has to much more methodically look here,
76
00:04:35,400 --> 00:04:39,590
maybe look here, maybe look here, and so forth in order to find something.
77
00:04:39,590 --> 00:04:43,670
Now fortunately, we already have some building blocks-- loops, conditions,
78
00:04:43,670 --> 00:04:45,210
Boolean expressions, and the like--
79
00:04:45,210 --> 00:04:47,120
where you could imagine writing some code
80
00:04:47,120 --> 00:04:51,680
that very methodically goes from left to right or right to left or something
81
00:04:51,680 --> 00:04:55,490
more sophisticated that actually finds something you're looking for.
82
00:04:55,490 --> 00:04:58,850
And just remember that the conventions we've had since last week
83
00:04:58,850 --> 00:05:03,320
now is that these arrays are zero indexed, so to speak.
84
00:05:03,320 --> 00:05:08,150
To be zero indexed just means that the data type starts counting from zero.
85
00:05:08,150 --> 00:05:11,930
So this is location 0, 1, 2, 3, 4, 5, 6.
86
00:05:11,930 --> 00:05:15,320
And notice even though there are seven total doors here,
87
00:05:15,320 --> 00:05:17,930
the right-most one, of course, is called 6
88
00:05:17,930 --> 00:05:19,940
just because we've started counting at 0.
89
00:05:19,940 --> 00:05:24,680
So in the general case, if you had n doors or n bytes of memory,
90
00:05:24,680 --> 00:05:29,480
0 would always be at the left, and n minus 1 would always be at the right.
91
00:05:29,480 --> 00:05:33,860
That's sort of a generalization of just thinking about this kind of convention.
92
00:05:33,860 --> 00:05:37,790
All right, so let's revisit the problem that we started the whole term off
93
00:05:37,790 --> 00:05:40,523
with in week zero, which was this notion of searching.
94
00:05:40,523 --> 00:05:42,440
And what does it mean to search for something?
95
00:05:42,440 --> 00:05:45,218
Well, to find information-- and this, of course, is omnipresent.
96
00:05:45,218 --> 00:05:48,260
Anytime you take out your phone, you're searching for a friend's contact.
97
00:05:48,260 --> 00:05:51,140
Any time you pull up a browser, you're googling for this or that.
98
00:05:51,140 --> 00:05:55,310
So search is kind of one of the most omnipresent topics and features
99
00:05:55,310 --> 00:05:56,877
of any device these days.
100
00:05:56,877 --> 00:05:59,960
So let's consider how the Googles, the Apples, the Microsofts of the world
101
00:05:59,960 --> 00:06:03,660
are implementing something as seemingly familiar as this.
102
00:06:03,660 --> 00:06:06,020
So here might be the problem statement.
103
00:06:06,020 --> 00:06:08,450
We want some input to become some output.
104
00:06:08,450 --> 00:06:09,800
What's that input going to be?
105
00:06:09,800 --> 00:06:13,520
Maybe it's a bunch of closed doors like this out of which we want
106
00:06:13,520 --> 00:06:16,250
to get back an answer, true or false.
107
00:06:16,250 --> 00:06:18,785
Is something we're looking for there or not?
108
00:06:18,785 --> 00:06:21,410
You can imagine taking this one step further and trying to find
109
00:06:21,410 --> 00:06:23,520
where is the thing you're looking for.
110
00:06:23,520 --> 00:06:25,940
But for now, let's just take one bite out of the problem.
111
00:06:25,940 --> 00:06:30,560
Can we tell ourselves, true or false, is some number behind one
112
00:06:30,560 --> 00:06:33,390
of these doors or lockers in memory?
113
00:06:33,390 --> 00:06:37,760
But before we go there and start talking about ways to do that-- that is,
114
00:06:37,760 --> 00:06:38,580
algorithms.
115
00:06:38,580 --> 00:06:42,680
Let's consider how we might lay the foundation of, like,
116
00:06:42,680 --> 00:06:46,108
comparing whether one algorithm is better than another.
117
00:06:46,108 --> 00:06:47,900
We talked about correctness, and it sort of
118
00:06:47,900 --> 00:06:51,440
goes without saying that any code you write, any algorithm you implement,
119
00:06:51,440 --> 00:06:52,460
had better be correct.
120
00:06:52,460 --> 00:06:55,340
Otherwise, what's the point if it doesn't give you the right answers?
121
00:06:55,340 --> 00:06:57,260
But we also talked about design.
122
00:06:57,260 --> 00:07:01,280
And in your own words, what do we mean when we say a program is better
123
00:07:01,280 --> 00:07:04,470
designed at this stage than another?
124
00:07:04,470 --> 00:07:07,880
How do you think about this notion of design now?
125
00:07:07,880 --> 00:07:09,050
Yeah, in the middle?
126
00:07:09,050 --> 00:07:11,300
AUDIENCE: Easier to understand or easier to institute.
127
00:07:11,300 --> 00:07:12,990
DAVID J. MALAN: OK, so easier to understand.
128
00:07:12,990 --> 00:07:13,825
I like that.
129
00:07:13,825 --> 00:07:14,450
Other thoughts?
130
00:07:14,450 --> 00:07:14,950
Yeah.
131
00:07:14,950 --> 00:07:15,980
AUDIENCE: Efficiency.
132
00:07:15,980 --> 00:07:18,813
DAVID J. MALAN: Efficiency, and what do you mean by efficiency precisely?
133
00:07:18,813 --> 00:07:22,163
AUDIENCE: [INAUDIBLE]
134
00:07:22,163 --> 00:07:22,830
DAVID J. MALAN: Nice.
135
00:07:22,830 --> 00:07:25,390
It doesn't use up too much memory, and it isn't redundant.
136
00:07:25,390 --> 00:07:27,120
So you can think about design along a few
137
00:07:27,120 --> 00:07:29,078
of these axes-- sort of the quality of the code
138
00:07:29,078 --> 00:07:31,270
but also the quality of the performance.
139
00:07:31,270 --> 00:07:35,818
And as our programs get bigger and more sophisticated and just longer,
140
00:07:35,818 --> 00:07:37,860
those kinds of things are really going to matter.
141
00:07:37,860 --> 00:07:39,860
And in the real world, if you start writing code
142
00:07:39,860 --> 00:07:42,630
not just by yourself but with someone else, getting the design
143
00:07:42,630 --> 00:07:46,140
right is just going to make it easier to collaborate and ultimately produce
144
00:07:46,140 --> 00:07:48,670
right code with just higher probability.
145
00:07:48,670 --> 00:07:52,290
So let's consider how we might focus on exactly the second characteristic,
146
00:07:52,290 --> 00:07:54,390
the efficiency, of an algorithm.
147
00:07:54,390 --> 00:07:58,230
And the way we might talk about the efficiency of algorithms, just how fast
148
00:07:58,230 --> 00:08:01,350
or how slow they are, is in terms of their running time.
149
00:08:01,350 --> 00:08:04,900
That is to say, when they're running, how much time do they take?
150
00:08:04,900 --> 00:08:08,010
And we might measure this in seconds or milliseconds or minutes
151
00:08:08,010 --> 00:08:10,350
or just some number of steps in the general case
152
00:08:10,350 --> 00:08:14,560
because presumably fewer steps, to your point, is better than more steps.
153
00:08:14,560 --> 00:08:16,410
So how might we think about running times?
154
00:08:16,410 --> 00:08:19,470
Well, there's one general notation we should define today.
155
00:08:19,470 --> 00:08:23,610
So computer scientists tend to describe the running time of an algorithm
156
00:08:23,610 --> 00:08:27,840
or a piece of code, for that matter, in terms of what's called big O notation.
157
00:08:27,840 --> 00:08:30,630
This is literally a capitalized O, a big O.
158
00:08:30,630 --> 00:08:34,110
And this generally means that the running time of some algorithm
159
00:08:34,110 --> 00:08:37,559
is on the order of such and such, where such and such, we'll see,
160
00:08:37,559 --> 00:08:40,169
is just going to be a very simple mathematical formula.
161
00:08:40,169 --> 00:08:42,450
It's kind of a way of waving your hands mathematically
162
00:08:42,450 --> 00:08:46,770
to convey the idea of just how fast or how slow some algorithm or code is
163
00:08:46,770 --> 00:08:48,600
without getting into the weeds of like, it
164
00:08:48,600 --> 00:08:52,330
took this many milliseconds or this many specific number of steps.
165
00:08:52,330 --> 00:08:56,280
So you might recall then from week zero, I even introduced this picture
166
00:08:56,280 --> 00:08:57,360
but without much context.
167
00:08:57,360 --> 00:09:00,660
At the time, we just used this to compare those phone book algorithms.
168
00:09:00,660 --> 00:09:03,600
Recall that this red straight line was the first algorithm,
169
00:09:03,600 --> 00:09:04,920
one page at a time.
170
00:09:04,920 --> 00:09:09,870
The yellow line that's still straight differed how if you recall?
171
00:09:09,870 --> 00:09:14,370
That line represented what alternative algorithm?
172
00:09:14,370 --> 00:09:15,370
Looking in back.
173
00:09:15,370 --> 00:09:16,620
What is that second algorithm?
174
00:09:16,620 --> 00:09:17,430
Yeah, over there.
175
00:09:17,430 --> 00:09:18,930
AUDIENCE: Like, two pages at a time.
176
00:09:18,930 --> 00:09:22,080
DAVID J. MALAN: Two pages at a time, which was almost correct so long as we
177
00:09:22,080 --> 00:09:25,340
potentially double back a page if maybe we go a little too far in the phone
178
00:09:25,340 --> 00:09:25,840
book.
179
00:09:25,840 --> 00:09:28,372
So it had a potential bug but arguably solvable.
180
00:09:28,372 --> 00:09:31,080
This last algorithm, though, was the so-called divide and conquer
181
00:09:31,080 --> 00:09:34,590
strategy where I sort of unnecessarily tore the phone book in half
182
00:09:34,590 --> 00:09:36,600
and then in half and then in half, which,
183
00:09:36,600 --> 00:09:40,890
as dramatic as that was unnecessarily, it actually took significantly bigger
184
00:09:40,890 --> 00:09:43,420
bites out of the problem-- like 500 pages
185
00:09:43,420 --> 00:09:49,360
the first time, another 250, another 125 versus just 1 or 2 pages at a time.
186
00:09:49,360 --> 00:09:52,583
And so we described its running time as this picture
187
00:09:52,583 --> 00:09:55,500
there, though I didn't use that expression at the time, running times.
188
00:09:55,500 --> 00:09:58,230
But indeed, time to solve might be measured just
189
00:09:58,230 --> 00:10:00,150
abstractly in some unit of measure--
190
00:10:00,150 --> 00:10:03,420
seconds, milliseconds, minutes, pages--
191
00:10:03,420 --> 00:10:05,110
via this y-axis here.
192
00:10:05,110 --> 00:10:07,830
So let's now slap some numbers on this.
193
00:10:07,830 --> 00:10:11,250
If we had n pages in that phone book, n just representing
194
00:10:11,250 --> 00:10:13,620
a generic number, the first algorithm here
195
00:10:13,620 --> 00:10:15,450
we might describe as taking n steps.
196
00:10:15,450 --> 00:10:18,870
Second algorithm we might describe as taking n divided by 2 steps,
197
00:10:18,870 --> 00:10:21,510
maybe give or take one if we have to double back but generally
198
00:10:21,510 --> 00:10:22,482
n divided by 2.
199
00:10:22,482 --> 00:10:24,690
And then this thing, if you remember your logarithms,
200
00:10:24,690 --> 00:10:26,648
was sort of a fundamentally different formula--
201
00:10:26,648 --> 00:10:30,100
log base 2 of n or just log of n for short.
202
00:10:30,100 --> 00:10:32,790
So this is of a fundamentally different formula.
203
00:10:32,790 --> 00:10:36,430
But what's noteworthy is that these first two algorithms,
204
00:10:36,430 --> 00:10:40,050
even though, yes, the second algorithm was hands down faster--
205
00:10:40,050 --> 00:10:41,910
I mean, literally twice as fast--
206
00:10:41,910 --> 00:10:46,560
when you start to zoom out and if I increase my y-axis and x-axis,
207
00:10:46,560 --> 00:10:52,560
these first two start to look awfully similar to one another.
208
00:10:52,560 --> 00:10:54,810
And if we keep zooming out and zooming out and zooming
209
00:10:54,810 --> 00:10:57,090
out as n gets really large--
210
00:10:57,090 --> 00:10:59,220
that is, the x-axis gets really long--
211
00:10:59,220 --> 00:11:03,250
these first two algorithms start to become essentially the same.
212
00:11:03,250 --> 00:11:06,390
And so this is where computer scientists use big O notation.
213
00:11:06,390 --> 00:11:10,420
Instead of saying specifically, this algorithm takes n steps
214
00:11:10,420 --> 00:11:13,620
and this one n divided by 2, a computer scientist would say,
215
00:11:13,620 --> 00:11:17,010
eh, each of those algorithms takes on the order of n steps
216
00:11:17,010 --> 00:11:19,020
or on the order of n over 2.
217
00:11:19,020 --> 00:11:19,770
But you know what?
218
00:11:19,770 --> 00:11:23,490
On the order of n over 2 is pretty much the same
219
00:11:23,490 --> 00:11:29,620
when n gets really large as being equivalent to big O of n itself.
220
00:11:29,620 --> 00:11:34,480
So yes, in practice, it's obviously fewer steps to move twice as fast.
221
00:11:34,480 --> 00:11:37,950
But in the big picture, when n becomes a million, a billion,
222
00:11:37,950 --> 00:11:40,620
the numbers are already so darn big at that point
223
00:11:40,620 --> 00:11:43,680
that these are, as the shapes of these curves imply,
224
00:11:43,680 --> 00:11:45,930
pretty much functionally equivalent.
225
00:11:45,930 --> 00:11:49,080
But this one still looks better and better
226
00:11:49,080 --> 00:11:52,470
as n gets large because it's rising so much less quickly.
227
00:11:52,470 --> 00:11:54,660
And so here, a computer scientist would say
228
00:11:54,660 --> 00:11:59,318
that that third algorithm was on the order of-- that is, big O of-- log n.
229
00:11:59,318 --> 00:12:01,110
And they don't have to bother with the base
230
00:12:01,110 --> 00:12:04,980
because it's a smaller mathematical detail that is also just in some sense
231
00:12:04,980 --> 00:12:07,890
a constant, multiplicative factor.
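As a worked illustration of why the base can be dropped, the standard change-of-base identity shows that logarithms in two different bases differ only by a constant multiplier (base 10 is chosen here purely as an example):

```latex
\log_2 n \;=\; \frac{\log_{10} n}{\log_{10} 2} \;\approx\; 3.32 \,\log_{10} n
```

Since big O notation ignores constant factors, both sides are simply on the order of log n.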
232
00:12:07,890 --> 00:12:09,840
So in short, what are the takeaways here?
233
00:12:09,840 --> 00:12:12,270
This is just a new vocabulary that we'll start
234
00:12:12,270 --> 00:12:16,020
to use when we just want to describe the running time of an algorithm.
235
00:12:16,020 --> 00:12:18,600
To make this more real, if any of you have implemented
236
00:12:18,600 --> 00:12:24,290
a for loop at this point in any of your code and that for loop iterated n times
237
00:12:24,290 --> 00:12:27,590
where maybe n was the height of your pyramid or maybe n
238
00:12:27,590 --> 00:12:31,550
was something else that you wanted to do n times, you wrote code
239
00:12:31,550 --> 00:12:36,440
or you implemented an algorithm that operated in big O of n time,
240
00:12:36,440 --> 00:12:37,200
if you will.
241
00:12:37,200 --> 00:12:39,350
So this is just a way now to retroactively start
242
00:12:39,350 --> 00:12:43,490
describing with somewhat mathematical notation what we've
243
00:12:43,490 --> 00:12:45,780
been doing in practice for a while now.
244
00:12:45,780 --> 00:12:51,500
So here's a list of commonly seen running times in the real world.
245
00:12:51,500 --> 00:12:55,010
This is not a thorough list because you could come up
246
00:12:55,010 --> 00:12:57,710
with an infinite number of mathematical formulas, certainly.
247
00:12:57,710 --> 00:13:01,400
But the common ones we'll discuss and you will see in your own code
248
00:13:01,400 --> 00:13:04,070
probably reduce to this list here.
249
00:13:04,070 --> 00:13:06,320
And if you were to study more computer science theory,
250
00:13:06,320 --> 00:13:07,903
this list would get longer and longer.
251
00:13:07,903 --> 00:13:11,930
But for now, these are sort of the most familiar ones that we'll soon see.
252
00:13:11,930 --> 00:13:14,700
All right, two other pieces of vocabulary, if you will,
253
00:13:14,700 --> 00:13:16,160
before we start to use this stuff--
254
00:13:16,160 --> 00:13:19,430
so this, a big omega, capital omega symbol,
255
00:13:19,430 --> 00:13:25,170
is used now to describe a lower bound on the running time of an algorithm.
256
00:13:25,170 --> 00:13:28,610
So to be clear, big O is on the order of-- that
257
00:13:28,610 --> 00:13:31,910
is, an upper bound-- on how many steps an algorithm might
258
00:13:31,910 --> 00:13:34,700
take, on the order of so many steps.
259
00:13:34,700 --> 00:13:37,580
If you want to talk, though, from the other perspective, well,
260
00:13:37,580 --> 00:13:39,710
how few steps might my algorithm take?
261
00:13:39,710 --> 00:13:42,320
Maybe in the so-called best case, it'd be nice
262
00:13:42,320 --> 00:13:45,110
if we had a notation to just describe what a lower
263
00:13:45,110 --> 00:13:47,840
bound is because some algorithms might be super fast
264
00:13:47,840 --> 00:13:49,890
in these so-called best cases.
265
00:13:49,890 --> 00:13:53,660
So the symbology is almost the same, but we replace the big O
266
00:13:53,660 --> 00:13:54,780
with the big omega.
267
00:13:54,780 --> 00:13:58,580
So to be clear, big O describes an upper bound and omega
268
00:13:58,580 --> 00:14:00,110
describes a lower bound.
269
00:14:00,110 --> 00:14:02,540
And we'll see examples of this before long.
270
00:14:02,540 --> 00:14:08,420
And then lastly, last one here, big theta, is used by a computer scientist
271
00:14:08,420 --> 00:14:12,860
when you have a case where both the upper bound on an algorithm's
272
00:14:12,860 --> 00:14:15,890
running time is the same as the lower bound.
273
00:14:15,890 --> 00:14:19,580
You can then describe it in one breath as being in theta of such and such
274
00:14:19,580 --> 00:14:23,660
instead of saying it's in big O and in omega of something else.
275
00:14:23,660 --> 00:14:27,920
All right, so out of context, sort of just seemingly cryptic symbols,
276
00:14:27,920 --> 00:14:31,463
but all they refer to is upper bounds, lower bounds, or when
277
00:14:31,463 --> 00:14:32,880
they happen to be one and the same.
278
00:14:32,880 --> 00:14:36,440
And we'll now introduce over time examples of how we might actually
279
00:14:36,440 --> 00:14:38,990
apply these to concrete problems.
280
00:14:38,990 --> 00:14:43,220
But first, let me pause to see if there's any questions.
281
00:14:43,220 --> 00:14:46,120
Any questions here?
282
00:14:46,120 --> 00:14:48,430
Any questions?
283
00:14:48,430 --> 00:14:50,880
I see pointing somewhere.
284
00:14:50,880 --> 00:14:53,100
Where are you pointing to?
285
00:14:53,100 --> 00:14:54,640
Over here-- there we go.
286
00:14:54,640 --> 00:14:55,980
OK, sorry-- very bright.
287
00:14:55,980 --> 00:14:59,560
AUDIENCE: So, um, smaller--
288
00:14:59,560 --> 00:15:01,520
DAVID J. MALAN: Smaller n functions move faster.
289
00:15:01,520 --> 00:15:06,260
So yes, if you have something like n, that takes only n steps.
290
00:15:06,260 --> 00:15:09,265
If you have a formula like n squared, just by nature of the math,
291
00:15:09,265 --> 00:15:11,900
that would take more steps and therefore be slower.
292
00:15:11,900 --> 00:15:13,810
So the larger the mathematical expression,
293
00:15:13,810 --> 00:15:18,280
the slower your algorithm is because the more time or more steps that it takes.
294
00:15:18,280 --> 00:15:20,800
295
00:15:20,800 --> 00:15:23,170
AUDIENCE: So you want your n function to be small?
296
00:15:23,170 --> 00:15:26,410
DAVID J. MALAN: You want your n function, so to speak, to be small, yes.
297
00:15:26,410 --> 00:15:28,220
And in fact, the Holy Grail, so to speak,
298
00:15:28,220 --> 00:15:31,690
would be this last one here either in big O notation or even theta,
299
00:15:31,690 --> 00:15:34,900
when an algorithm is on the order of a single step.
300
00:15:34,900 --> 00:15:39,370
That means it literally takes constant time, one step, or maybe 10 steps,
301
00:15:39,370 --> 00:15:42,490
100 steps, but a fixed, constant number of steps.
302
00:15:42,490 --> 00:15:46,120
That's the best because even as the phone book gets bigger,
303
00:15:46,120 --> 00:15:50,350
even as the data set you're searching gets larger and larger,
304
00:15:50,350 --> 00:15:53,500
if something only takes a finite number of steps constantly,
305
00:15:53,500 --> 00:15:57,640
then it doesn't matter how big the data set actually gets.
306
00:15:57,640 --> 00:16:01,840
Questions as well on these notations-- yep, thank you for the pointing.
307
00:16:01,840 --> 00:16:03,310
This is actually very helpful.
308
00:16:03,310 --> 00:16:05,170
I'm seeing pointing this way?
309
00:16:05,170 --> 00:16:08,227
AUDIENCE: [INAUDIBLE]
310
00:16:08,227 --> 00:16:10,560
DAVID J. MALAN: What is the input to each of these functions?
311
00:16:10,560 --> 00:16:13,710
It is an expression of how many steps an algorithm takes.
312
00:16:13,710 --> 00:16:16,080
So in fact, let me go ahead and make this
313
00:16:16,080 --> 00:16:19,220
more concrete with an actual example here if we could.
314
00:16:19,220 --> 00:16:22,350
So on stage here, we have seven lockers which represent,
315
00:16:22,350 --> 00:16:24,300
if you will, an array of memory.
316
00:16:24,300 --> 00:16:26,400
And this array of memory is maybe storing
317
00:16:26,400 --> 00:16:30,150
seven integers, seven integers that we might actually want to search for.
318
00:16:30,150 --> 00:16:33,090
And if we want to search for these values, how might
319
00:16:33,090 --> 00:16:34,048
we go about doing this?
320
00:16:34,048 --> 00:16:36,257
Well, for this, why don't we make things interesting?
321
00:16:36,257 --> 00:16:37,810
Would a volunteer like to come on up?
322
00:16:37,810 --> 00:16:40,920
Have to be masked and on the internet if you are comfortable.
323
00:16:40,920 --> 00:16:44,190
Both of-- oh, there's someone putting their friend's hand up and back?
324
00:16:44,190 --> 00:16:44,940
Yes, OK.
325
00:16:44,940 --> 00:16:45,660
Come on down.
326
00:16:45,660 --> 00:16:50,380
327
00:16:50,380 --> 00:16:53,080
And in just a moment, our brave volunteer
328
00:16:53,080 --> 00:16:57,070
is going to help me find a specific number in the data set
329
00:16:57,070 --> 00:16:58,520
that we have here on the screen.
330
00:16:58,520 --> 00:17:02,230
So come on down, and I'll get things ready for you in advance here.
331
00:17:02,230 --> 00:17:08,723
Come on down. Nice to meet you.
332
00:17:08,723 --> 00:17:09,640
And what is your name?
333
00:17:09,640 --> 00:17:10,390
AUDIENCE: [? Nomira. ?]
334
00:17:10,390 --> 00:17:10,645
DAVID J. MALAN: Minera?
335
00:17:10,645 --> 00:17:11,530
AUDIENCE: [? Nomira. ?]
336
00:17:11,530 --> 00:17:13,113
DAVID J. MALAN: [? Nomira. ?] Nice to meet.
337
00:17:13,113 --> 00:17:13,700
Come on over.
338
00:17:13,700 --> 00:17:17,650
So here we have for Nomira seven lockers or an array of memory.
339
00:17:17,650 --> 00:17:19,540
And behind each of these doors is a number.
340
00:17:19,540 --> 00:17:22,569
And the goal, quite simply, is, given this array of memory
341
00:17:22,569 --> 00:17:27,680
as input, to return, true or false, is the number I care about actually there?
342
00:17:27,680 --> 00:17:30,250
So suppose I care about the number 0.
343
00:17:30,250 --> 00:17:33,730
What would be the simplest, most correct algorithm you could
344
00:17:33,730 --> 00:17:38,200
apply in order to find us the number 0?
345
00:17:38,200 --> 00:17:41,608
OK, try opening the first one.
346
00:17:41,608 --> 00:17:44,150
All right, and maybe just step aside so the audience can see.
347
00:17:44,150 --> 00:17:46,160
I think you have not found 0 yet.
348
00:17:46,160 --> 00:17:47,480
OK, so keep the door open.
349
00:17:47,480 --> 00:17:50,120
Let's move on to your next choice.
350
00:17:50,120 --> 00:17:51,080
Second door, sure.
351
00:17:51,080 --> 00:17:51,998
AUDIENCE: [INAUDIBLE]
352
00:17:51,998 --> 00:17:53,540
DAVID J. MALAN: Oh, go ahead, second door.
353
00:17:53,540 --> 00:17:54,415
Let's keep it simple.
354
00:17:54,415 --> 00:17:57,557
Let's just move from left to right, sort of searching our way.
355
00:17:57,557 --> 00:17:58,640
And what do you see there?
356
00:17:58,640 --> 00:17:59,770
Oh, 6, not 0.
357
00:17:59,770 --> 00:18:00,770
How about the next door?
358
00:18:00,770 --> 00:18:04,200
359
00:18:04,200 --> 00:18:07,230
All right, also not working out so well yet, but that's OK.
360
00:18:07,230 --> 00:18:11,610
If you want to go on to the next, we're still looking for 0.
361
00:18:11,610 --> 00:18:12,930
All right, I see a 2.
362
00:18:12,930 --> 00:18:14,502
All right, it's not so good yet.
363
00:18:14,502 --> 00:18:15,210
Let's keep going.
364
00:18:15,210 --> 00:18:16,860
Next door.
365
00:18:16,860 --> 00:18:18,540
2, 7-- no.
366
00:18:18,540 --> 00:18:20,100
OK, next door.
367
00:18:20,100 --> 00:18:25,870
No, that's a-- all right, very well done.
368
00:18:25,870 --> 00:18:26,370
Oh.
369
00:18:26,370 --> 00:18:29,030
370
00:18:29,030 --> 00:18:32,330
All right, so I kind of set you up for a fairly slow algorithm,
371
00:18:32,330 --> 00:18:34,550
but let me just ask you to describe what is it
372
00:18:34,550 --> 00:18:37,534
you did by following the steps I gave you.
373
00:18:37,534 --> 00:18:39,660
AUDIENCE: I just went one by one to each character.
374
00:18:39,660 --> 00:18:41,660
DAVID J. MALAN: You went one by one to each character
375
00:18:41,660 --> 00:18:43,380
if you want to talk into here.
376
00:18:43,380 --> 00:18:45,140
So you went one by one by each character.
377
00:18:45,140 --> 00:18:48,620
And would you say that algorithm, left to right, is correct?
378
00:18:48,620 --> 00:18:49,790
AUDIENCE: No.
379
00:18:49,790 --> 00:18:50,960
DAVID J. MALAN: No?
380
00:18:50,960 --> 00:18:53,000
AUDIENCE: Or, yes, in the scenario.
381
00:18:53,000 --> 00:18:54,500
DAVID J. MALAN: OK, yes in this scenario.
382
00:18:54,500 --> 00:18:55,520
Why are you hesitating?
383
00:18:55,520 --> 00:18:56,060
What's going through your mind?
384
00:18:56,060 --> 00:18:58,100
AUDIENCE: Because it's not the most efficient way to do it.
385
00:18:58,100 --> 00:18:58,980
DAVID J. MALAN: OK, good.
386
00:18:58,980 --> 00:19:01,610
So we see a contrast here between correctness and design.
387
00:19:01,610 --> 00:19:04,360
I mean, I do think it was correct because even though it was slow,
388
00:19:04,360 --> 00:19:05,960
you eventually found zero.
389
00:19:05,960 --> 00:19:08,490
But it took some number of steps.
390
00:19:08,490 --> 00:19:10,280
So in fact, this would be an algorithm.
391
00:19:10,280 --> 00:19:12,350
It has a name, called linear search.
392
00:19:12,350 --> 00:19:14,840
And, [? Nomira, ?] as you did, you kind of walked along
393
00:19:14,840 --> 00:19:16,735
a line going from left to right.
394
00:19:16,735 --> 00:19:17,360
Now let me ask.
395
00:19:17,360 --> 00:19:19,880
If you had gone from right to left, would the algorithm
396
00:19:19,880 --> 00:19:23,030
have been fundamentally better?
397
00:19:23,030 --> 00:19:23,780
AUDIENCE: Yes.
398
00:19:23,780 --> 00:19:24,885
DAVID J. MALAN: OK, and why?
399
00:19:24,885 --> 00:19:27,260
AUDIENCE: Because the zero is here in the first scenario.
400
00:19:27,260 --> 00:19:30,860
But if it was like, the zero is in the middle, it wouldn't have been.
401
00:19:30,860 --> 00:19:35,020
DAVID J. MALAN: Yeah, and so here is where the right way to do things
402
00:19:35,020 --> 00:19:36,270
becomes a little less obvious.
403
00:19:36,270 --> 00:19:38,707
You would absolutely have given yourself a better result
404
00:19:38,707 --> 00:19:40,790
if you had just happened to start from the right
405
00:19:40,790 --> 00:19:42,780
or if I had pointed you to start over there.
406
00:19:42,780 --> 00:19:45,907
But the catch is if I asked her to find another number, like the number 8,
407
00:19:45,907 --> 00:19:47,240
well, that would have backfired.
408
00:19:47,240 --> 00:19:48,948
And this time, it would have taken longer
409
00:19:48,948 --> 00:19:51,450
to find that number because it's way over here instead.
410
00:19:51,450 --> 00:19:55,970
And so in the general case, going left to right or, heck, right to left
411
00:19:55,970 --> 00:19:59,540
is probably as correct as you can get because if you know nothing
412
00:19:59,540 --> 00:20:03,517
about the order of these numbers-- and indeed, they seem to be fairly random.
413
00:20:03,517 --> 00:20:05,600
Some of them are smaller, some of them are bigger.
414
00:20:05,600 --> 00:20:07,308
There doesn't seem to be rhyme or reason.
415
00:20:07,308 --> 00:20:11,330
Linear search is about as good as you can do when you don't know anything
416
00:20:11,330 --> 00:20:13,053
a priori about the numbers.
417
00:20:13,053 --> 00:20:15,720
So I have a little thank you gift here, a little CS stress ball.
418
00:20:15,720 --> 00:20:18,680
Round of applause for our first volunteer.
419
00:20:18,680 --> 00:20:19,550
Thank you so much.
420
00:20:19,550 --> 00:20:23,670
421
00:20:23,670 --> 00:20:27,690
Let's try to formalize what I just described as linear search
422
00:20:27,690 --> 00:20:30,938
because indeed, no matter which end [? Nomira ?] had started on,
423
00:20:30,938 --> 00:20:32,730
I could have kind of changed up the problem
424
00:20:32,730 --> 00:20:35,070
to make sure that it appears to be running slow.
425
00:20:35,070 --> 00:20:36,090
But it is correct.
426
00:20:36,090 --> 00:20:39,270
If zero were among those doors, she absolutely would have found it
427
00:20:39,270 --> 00:20:40,260
and indeed did.
428
00:20:40,260 --> 00:20:45,870
So let's now try to translate what we did into what we might call again
429
00:20:45,870 --> 00:20:48,210
pseudo code as from week zero.
430
00:20:48,210 --> 00:20:50,730
So with pseudo code, we just need a terse, English-like
431
00:20:50,730 --> 00:20:53,770
(or any language) syntax to describe what we did.
432
00:20:53,770 --> 00:20:56,280
So here might be one formulation of what [? Nomira ?] did.
433
00:20:56,280 --> 00:21:00,690
For each door, from left to right, if the number is behind the door,
434
00:21:00,690 --> 00:21:02,280
return true.
435
00:21:02,280 --> 00:21:07,630
Else, at the very end of the program, you would return false by default.
436
00:21:07,630 --> 00:21:08,800
And now you got lucky.
437
00:21:08,800 --> 00:21:10,800
And by the seventh door, [? Nomira ?] had indeed
438
00:21:10,800 --> 00:21:14,430
returned true by saying, well, there is the zero.
439
00:21:14,430 --> 00:21:17,070
But let's consider if this pseudo code is now correct,
440
00:21:17,070 --> 00:21:18,150
an accurate translation.
441
00:21:18,150 --> 00:21:22,360
First of all, normally, when we've seen ifs, we might see an if else.
442
00:21:22,360 --> 00:21:26,100
And yet down here, return false is aligned with the for.
443
00:21:26,100 --> 00:21:29,700
Why did I not indent the return false, or put another way,
444
00:21:29,700 --> 00:21:36,780
why did I not do if number is behind door, return true, else return false?
445
00:21:36,780 --> 00:21:39,855
Why would that version of this code have been problematic?
446
00:21:39,855 --> 00:21:41,318
Way in back.
447
00:21:41,318 --> 00:21:50,320
AUDIENCE: [INAUDIBLE]
448
00:21:50,320 --> 00:21:52,690
DAVID J. MALAN: OK, I'm not sure it's because of redundancy.
449
00:21:52,690 --> 00:21:55,000
Let me go ahead and just make this explicit.
450
00:21:55,000 --> 00:21:58,510
If I had instead done else return false, I
451
00:21:58,510 --> 00:22:02,942
don't think it's so much redundancy that I'd be worried about.
452
00:22:02,942 --> 00:22:04,150
Let me bounce somewhere else.
453
00:22:04,150 --> 00:22:04,810
Yeah, in front?
454
00:22:04,810 --> 00:22:08,350
AUDIENCE: Um, maybe [INAUDIBLE] for the entire list
455
00:22:08,350 --> 00:22:09,730
after just checking one number.
456
00:22:09,730 --> 00:22:11,920
DAVID J. MALAN: Yeah, it would be returning false for--
457
00:22:11,920 --> 00:22:13,795
even though I'd only looked at-- [? Nomira ?]
458
00:22:13,795 --> 00:22:15,100
had only looked at one element.
459
00:22:15,100 --> 00:22:18,142
And it would have been as though if all of these doors were still closed,
460
00:22:18,142 --> 00:22:21,610
she opens this up and says, nope, this is not zero, return false.
461
00:22:21,610 --> 00:22:24,708
That would give me an incorrect result because obviously,
462
00:22:24,708 --> 00:22:27,250
at that stage in the algorithm, she wouldn't have even looked
463
00:22:27,250 --> 00:22:28,720
through any of the other doors.
464
00:22:28,720 --> 00:22:32,510
So just the original indentation of this, if you will,
465
00:22:32,510 --> 00:22:35,200
without the [? else, ?] is correct because only
466
00:22:35,200 --> 00:22:38,950
if I get to the bottom of this algorithm or the pseudo code does
467
00:22:38,950 --> 00:22:41,740
it make sense to conclude at that point, once she's
468
00:22:41,740 --> 00:22:45,220
gone through all of the doors, that nope, there's in fact--
469
00:22:45,220 --> 00:22:48,550
the number I'm looking for is, in fact, not actually there.
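The contrast between the two versions of the pseudo code can be sketched in C. This is only an illustration: the function names `search_with_else` and `linear_search`, and the idea of passing the array length as `n`, are assumptions rather than anything shown on screen.

```c
#include <stdbool.h>

// Problematic variant: the else branch returns after inspecting only doors[0].
bool search_with_else(const int doors[], int n, int number)
{
    for (int i = 0; i < n; i++)
    {
        if (doors[i] == number)
        {
            return true;
        }
        else
        {
            return false; // gives up after the very first door
        }
    }
    return false; // reached only when there are no doors at all
}

// Variant matching the original indentation: conclude false only
// after every door has been checked.
bool linear_search(const int doors[], int n, int number)
{
    for (int i = 0; i < n; i++)
    {
        if (doors[i] == number)
        {
            return true;
        }
    }
    return false;
}
```

With seven made-up door values such as 5, 2, 7, 4, 1, 6, 0, searching for 0 succeeds with `linear_search` but fails with `search_with_else`, because the latter stops at the first door.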
470
00:22:48,550 --> 00:22:52,790
So how might we consider now the running time of this algorithm?
471
00:22:52,790 --> 00:22:55,930
We have a few different types of vocabulary now.
472
00:22:55,930 --> 00:22:59,080
And if we consider now how we might think about this,
473
00:22:59,080 --> 00:23:02,620
let's start to translate it from sort of higher level pseudo code
474
00:23:02,620 --> 00:23:04,420
to something a little lower level.
475
00:23:04,420 --> 00:23:07,820
We've been writing code using n and loops and the like.
476
00:23:07,820 --> 00:23:12,340
So let's take this higher level pseudo code and now just kind of
477
00:23:12,340 --> 00:23:14,890
get a middle ground between English and C.
478
00:23:14,890 --> 00:23:18,910
Let me propose that we think about this version of the same algorithm
479
00:23:18,910 --> 00:23:20,680
as being a little more pedantic.
480
00:23:20,680 --> 00:23:28,820
For i from 0 to n minus 1, if number behind doors bracket i return true.
481
00:23:28,820 --> 00:23:31,520
Otherwise, at the end of the program, return false.
482
00:23:31,520 --> 00:23:33,520
Now I'm kind of mixing English and C here,
483
00:23:33,520 --> 00:23:35,950
but that's reasonable if the reader is familiar with C
484
00:23:35,950 --> 00:23:37,540
or some similar language.
485
00:23:37,540 --> 00:23:39,530
And notice this pattern here.
486
00:23:39,530 --> 00:23:44,650
This is a way of just saying in pseudo code, give myself a variable called i.
487
00:23:44,650 --> 00:23:49,360
Start at 0 and then just count up to n minus 1.
488
00:23:49,360 --> 00:23:53,410
And recall n minus 1 is not one shy of the end of the array.
489
00:23:53,410 --> 00:23:56,170
N minus 1 is the end of the array because again, we
490
00:23:56,170 --> 00:23:57,580
started counting at 0.
491
00:23:57,580 --> 00:24:00,880
So this is a very common way of expressing this kind of loop
492
00:24:00,880 --> 00:24:03,910
from the left all the way to the right of an array.
493
00:24:03,910 --> 00:24:07,210
Doors I'm kind of implicitly treating as the name of this array,
494
00:24:07,210 --> 00:24:10,000
like it's a variable from last week that I defined as being
495
00:24:10,000 --> 00:24:11,780
an array of integers in this case.
496
00:24:11,780 --> 00:24:16,690
So doors bracket i means that when i is 0, it's this location.
497
00:24:16,690 --> 00:24:18,220
When i is 1, it's this.
498
00:24:18,220 --> 00:24:21,790
When i is 7 or, more generally n minus--
499
00:24:21,790 --> 00:24:26,270
sorry, 6 or, more generally, n minus 1, that's this location here.
500
00:24:26,270 --> 00:24:28,700
So same idea but a translation of it.
501
00:24:28,700 --> 00:24:32,920
So now let's consider what the running time of this algorithm is.
502
00:24:32,920 --> 00:24:36,010
If we have this menu of possible answers to this question,
503
00:24:36,010 --> 00:24:38,930
how efficient or inefficient is this algorithm,
504
00:24:38,930 --> 00:24:41,650
let's take a look in the context of this pseudo code.
505
00:24:41,650 --> 00:24:44,500
We don't even have to bother going all the way to C.
506
00:24:44,500 --> 00:24:47,720
How do we go about analyzing each of these steps?
507
00:24:47,720 --> 00:24:49,600
Well, let's consider this.
508
00:24:49,600 --> 00:24:55,540
This outermost loop here for i from 0 to n minus 1, that line of code
509
00:24:55,540 --> 00:24:57,790
is going to execute how many times?
510
00:24:57,790 --> 00:25:01,420
How many times will that loop execute?
511
00:25:01,420 --> 00:25:04,540
Let me give folks this moment to think on it.
512
00:25:04,540 --> 00:25:07,330
How many times is that going to loop here?
513
00:25:07,330 --> 00:25:08,584
Yeah, over there.
514
00:25:08,584 --> 00:25:10,360
AUDIENCE: [INAUDIBLE]
515
00:25:10,360 --> 00:25:11,530
DAVID J. MALAN: n times, right?
516
00:25:11,530 --> 00:25:13,720
Because it's from 0 to n minus 1.
517
00:25:13,720 --> 00:25:16,330
And if it's a little weird to think in from 0 to n minus 1,
518
00:25:16,330 --> 00:25:19,832
this is essentially the same mathematically as from 1 to n.
519
00:25:19,832 --> 00:25:21,790
And that's perhaps a little more obviously more
520
00:25:21,790 --> 00:25:23,690
intuitively n total steps.
521
00:25:23,690 --> 00:25:28,180
So I might just make a note to myself this loop is going to operate n times.
522
00:25:28,180 --> 00:25:29,770
What about these inner steps?
523
00:25:29,770 --> 00:25:33,160
Well, how many steps or seconds does it take to ask a question?
524
00:25:33,160 --> 00:25:35,290
If the number behind--
525
00:25:35,290 --> 00:25:38,740
if the number you're looking for is behind doors bracket i,
526
00:25:38,740 --> 00:25:41,570
well, as [? Nomira ?] did, that's kind of like one step.
527
00:25:41,570 --> 00:25:42,820
So you open the door and boom.
528
00:25:42,820 --> 00:25:46,100
All right, maybe it's two steps, but it's a constant number of steps.
529
00:25:46,100 --> 00:25:48,320
So this is some constant number of steps.
530
00:25:48,320 --> 00:25:50,080
Let's just call it one for simplicity.
531
00:25:50,080 --> 00:25:53,500
How many steps or seconds does it take to return true?
532
00:25:53,500 --> 00:25:55,863
I don't know exactly in the computer's memory
533
00:25:55,863 --> 00:25:57,280
but that feels like a single step.
534
00:25:57,280 --> 00:25:58,940
Just return true.
535
00:25:58,940 --> 00:26:01,960
So if this takes one step, this takes one step
536
00:26:01,960 --> 00:26:04,960
but only if the condition is true, it looks
537
00:26:04,960 --> 00:26:09,370
like you're doing a constant number of things n times.
538
00:26:09,370 --> 00:26:12,530
Or maybe you're doing one additional step.
539
00:26:12,530 --> 00:26:15,010
So in short, the only thing that really matters here
540
00:26:15,010 --> 00:26:18,220
in terms of the efficiency or inefficiency of the algorithm
541
00:26:18,220 --> 00:26:21,495
is what are you doing again and again and again because that's obviously
542
00:26:21,495 --> 00:26:22,870
the thing that's going to add up.
543
00:26:22,870 --> 00:26:26,080
Doing one thing or two things a constant number of times?
544
00:26:26,080 --> 00:26:27,010
Not a big deal.
545
00:26:27,010 --> 00:26:32,050
But looping, that's going to add up over time because the more doors there are,
546
00:26:32,050 --> 00:26:35,420
the bigger n is going to be and the more steps that's going to take,
547
00:26:35,420 --> 00:26:38,320
which is all to say if you were to describe roughly
548
00:26:38,320 --> 00:26:43,120
how many steps does this algorithm take in big O notation,
549
00:26:43,120 --> 00:26:46,060
what might your instincts say?
550
00:26:46,060 --> 00:26:51,350
How many steps is this algorithm on the order of given n doors or n integers?
551
00:26:51,350 --> 00:26:51,850
Yeah?
552
00:26:51,850 --> 00:26:52,805
AUDIENCE: [INAUDIBLE]
553
00:26:52,805 --> 00:26:53,680
DAVID J. MALAN: Say again?
554
00:26:53,680 --> 00:26:54,670
AUDIENCE: O n.
555
00:26:54,670 --> 00:26:56,110
DAVID J. MALAN: Big O of n.
556
00:26:56,110 --> 00:26:58,220
And indeed, that's going to be the case here.
557
00:26:58,220 --> 00:26:58,450
Why?
558
00:26:58,450 --> 00:27:00,533
Because you're essentially, at the end of the day,
559
00:27:00,533 --> 00:27:03,992
doing n things as an upper bound on running time.
560
00:27:03,992 --> 00:27:05,950
And that's, in fact, exactly what happened
561
00:27:05,950 --> 00:27:08,500
with [? Nomira. ?] She had to look at all n lockers
562
00:27:08,500 --> 00:27:11,450
before finally getting to the right answer.
563
00:27:11,450 --> 00:27:14,470
But what if she got lucky and the number we
564
00:27:14,470 --> 00:27:17,380
were looking for was not at the end of the array
565
00:27:17,380 --> 00:27:20,080
but was at the beginning of the array?
566
00:27:20,080 --> 00:27:21,650
How might we think about that?
567
00:27:21,650 --> 00:27:25,120
Well, we have a nomenclature for this too, of course-- omega notation.
568
00:27:25,120 --> 00:27:27,730
Remember, omega notation is a lower bound.
569
00:27:27,730 --> 00:27:34,240
So given this menu of possible running times for lower bounds on an algorithm,
570
00:27:34,240 --> 00:27:38,897
what might the omega notation be for [? Nomira's ?] linear search?
571
00:27:38,897 --> 00:27:40,270
AUDIENCE: Omega 1.
572
00:27:40,270 --> 00:27:42,250
DAVID J. MALAN: Omega of 1, and why that?
573
00:27:42,250 --> 00:27:43,973
AUDIENCE: [INAUDIBLE]
574
00:27:43,973 --> 00:27:46,390
DAVID J. MALAN: Right, because if just by chance she gets lucky
575
00:27:46,390 --> 00:27:49,300
and the number she's looking for is right there where
576
00:27:49,300 --> 00:27:51,490
she begins the algorithm, that's it.
577
00:27:51,490 --> 00:27:52,540
It's one step.
578
00:27:52,540 --> 00:27:55,210
Maybe it's two steps if you have to unlock the door and open it,
579
00:27:55,210 --> 00:27:56,740
but it's a constant number of steps.
580
00:27:56,740 --> 00:27:58,810
And the way we describe constant number of steps
581
00:27:58,810 --> 00:28:01,030
is just with a single number like 1.
582
00:28:01,030 --> 00:28:04,990
So the omega notation for linear search might be omega of 1
583
00:28:04,990 --> 00:28:08,650
because in the best case, she might just get the number right from the get go.
584
00:28:08,650 --> 00:28:11,860
But in the worst case, we need to talk about the upper bound, which
585
00:28:11,860 --> 00:28:13,990
might indeed be big O of n.
586
00:28:13,990 --> 00:28:16,690
So again there's this way now of talking symbolically
587
00:28:16,690 --> 00:28:22,150
about best cases and worst cases or lower bounds and upper bounds.
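One way to see the best and worst cases concretely is to count how many doors a linear search opens. The sketch below assumes a hypothetical helper, `linear_search_counted`, that reports the count through an `opened` pointer. Neither name comes from the lecture.

```c
#include <stdbool.h>

// Linear search that also records, in *opened, how many doors were opened.
bool linear_search_counted(const int doors[], int n, int number, int *opened)
{
    *opened = 0;
    for (int i = 0; i < n; i++)
    {
        (*opened)++; // one more door opened
        if (doors[i] == number)
        {
            return true;
        }
    }
    return false;
}
```

With seven doors, a number behind the first door costs one opening (the omega of 1 case), while a number behind the last door, or one that is missing entirely, costs all seven (the big O of n case).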
588
00:28:22,150 --> 00:28:24,880
Theta notation, just as a little trivia now,
589
00:28:24,880 --> 00:28:27,965
is it applicable based on the definition I gave earlier?
590
00:28:27,965 --> 00:28:28,840
AUDIENCE: [INAUDIBLE]
591
00:28:28,840 --> 00:28:31,630
DAVID J. MALAN: OK, no, because you only take out the theta notation
592
00:28:31,630 --> 00:28:34,120
when those two bounds, upper and lower, happen
593
00:28:34,120 --> 00:28:36,740
to be the same for shorthand notation, if you will.
594
00:28:36,740 --> 00:28:41,530
So it suffices here to talk about just big O and omega notation.
595
00:28:41,530 --> 00:28:43,880
Well, what if we are a little smarter about this?
596
00:28:43,880 --> 00:28:47,350
Let me go ahead and sort of semi-secretly here
597
00:28:47,350 --> 00:28:48,380
rearrange these numbers.
598
00:28:48,380 --> 00:28:50,605
But first, how about one other volunteer?
599
00:28:50,605 --> 00:28:53,230
One other volunteer-- you have to be comfortable with your mask
600
00:28:53,230 --> 00:28:55,510
and your being on the internet.
601
00:28:55,510 --> 00:28:58,180
How about over here?
602
00:28:58,180 --> 00:28:59,880
Yes, you want to come on down?
603
00:28:59,880 --> 00:29:00,880
All right, come on down.
604
00:29:00,880 --> 00:29:03,640
And don't look at what I'm doing because I'm going to--
605
00:29:03,640 --> 00:29:08,340
606
00:29:08,340 --> 00:29:10,830
take your time and don't look up this way
607
00:29:10,830 --> 00:29:14,550
because I need a moment to rearrange all of the numbers.
608
00:29:14,550 --> 00:29:17,430
And actually, if you could stay right there before coming up,
609
00:29:17,430 --> 00:29:20,910
just an awkward few seconds while I finish hiding the numbers
610
00:29:20,910 --> 00:29:22,590
behind these doors for you.
611
00:29:22,590 --> 00:29:23,890
AUDIENCE: [INAUDIBLE]
612
00:29:23,890 --> 00:29:26,490
DAVID J. MALAN: I will be right with you.
613
00:29:26,490 --> 00:29:31,110
Actually, if-- do you want to warm up the crowd for a moment
614
00:29:31,110 --> 00:29:32,283
and I'll be right back?
615
00:29:32,283 --> 00:29:33,700
So you want to introduce yourself?
616
00:29:33,700 --> 00:29:34,770
AUDIENCE: Yeah, hi, guys.
617
00:29:34,770 --> 00:29:35,340
I'm Rave.
618
00:29:35,340 --> 00:29:37,900
619
00:29:37,900 --> 00:29:38,400
Yeah!
620
00:29:38,400 --> 00:29:43,500
621
00:29:43,500 --> 00:29:46,200
DAVID J. MALAN: All right, I think I am ready.
622
00:29:46,200 --> 00:29:47,700
Thank you for stalling there.
623
00:29:47,700 --> 00:29:48,970
AUDIENCE: Of course.
624
00:29:48,970 --> 00:29:49,710
DAVID J. MALAN: And I didn't catch your name.
625
00:29:49,710 --> 00:29:50,250
What was your name?
626
00:29:50,250 --> 00:29:51,080
AUDIENCE: I'm Rave.
627
00:29:51,080 --> 00:29:51,450
DAVID J. MALAN: I'm sorry?
628
00:29:51,450 --> 00:29:52,440
AUDIENCE: Rave, like a party.
629
00:29:52,440 --> 00:29:53,130
DAVID J. MALAN: Rave, OK.
630
00:29:53,130 --> 00:29:53,672
Nice to meet.
631
00:29:53,672 --> 00:29:54,930
Come on over.
632
00:29:54,930 --> 00:29:56,760
So Rave has kindly volunteered now.
633
00:29:56,760 --> 00:29:58,725
And I'm going to give you an additional advantage this time.
634
00:29:58,725 --> 00:29:59,400
AUDIENCE: OK.
635
00:29:59,400 --> 00:30:03,180
DAVID J. MALAN: Unbeknownst to you, I now took numbers behind the doors,
636
00:30:03,180 --> 00:30:04,530
but I sorted them for you.
637
00:30:04,530 --> 00:30:06,120
So they're not in the same random order like they
638
00:30:06,120 --> 00:30:08,162
were for [? Nomira. ?] You now have the advantage
639
00:30:08,162 --> 00:30:10,678
to know that the numbers are sorted from small to big.
640
00:30:10,678 --> 00:30:11,220
AUDIENCE: OK.
641
00:30:11,220 --> 00:30:15,180
DAVID J. MALAN: Given that, and given perhaps what we talked about in week zero
642
00:30:15,180 --> 00:30:19,140
with the phone book, where might you propose we begin the story this time?
643
00:30:19,140 --> 00:30:20,850
With which locker?
644
00:30:20,850 --> 00:30:21,808
AUDIENCE: To find zero?
645
00:30:21,808 --> 00:30:23,600
DAVID J. MALAN: Let's find number six this time.
646
00:30:23,600 --> 00:30:24,940
Let's make things interesting.
647
00:30:24,940 --> 00:30:26,200
AUDIENCE: OK.
648
00:30:26,200 --> 00:30:27,300
I'll start in the middle.
649
00:30:27,300 --> 00:30:28,050
DAVID J. MALAN: OK, so the middle.
650
00:30:28,050 --> 00:30:28,990
There's seven total.
651
00:30:28,990 --> 00:30:29,490
So--
652
00:30:29,490 --> 00:30:29,670
AUDIENCE: OK.
653
00:30:29,670 --> 00:30:30,420
DAVID J. MALAN: --that would be right here.
654
00:30:30,420 --> 00:30:30,920
Go ahead.
655
00:30:30,920 --> 00:30:32,460
Open that up.
656
00:30:32,460 --> 00:30:34,320
And you find, sadly, the number five.
657
00:30:34,320 --> 00:30:36,150
So what do you know now?
658
00:30:36,150 --> 00:30:37,257
AUDIENCE: I know to go up.
659
00:30:37,257 --> 00:30:37,840
DAVID J. MALAN: OK.
660
00:30:37,840 --> 00:30:38,382
AUDIENCE: OK.
661
00:30:38,382 --> 00:30:40,465
DAVID J. MALAN: All right, and just to keep it uniform,
662
00:30:40,465 --> 00:30:43,080
just like I did, I opened to the right half of the phone book.
663
00:30:43,080 --> 00:30:43,350
AUDIENCE: Yes.
664
00:30:43,350 --> 00:30:44,880
DAVID J. MALAN: Let's keep it similar.
665
00:30:44,880 --> 00:30:45,390
Yeah.
666
00:30:45,390 --> 00:30:46,223
AUDIENCE: All right.
667
00:30:46,223 --> 00:30:48,182
DAVID J. MALAN: All right, and, uh, a little too far
668
00:30:48,182 --> 00:30:50,070
even though I know you wanted to go one over.
669
00:30:50,070 --> 00:30:51,080
AUDIENCE: All good, all good.
670
00:30:51,080 --> 00:30:52,800
DAVID J. MALAN: And now we're going to go which direction?
671
00:30:52,800 --> 00:30:54,217
AUDIENCE: Over here in the middle.
672
00:30:54,217 --> 00:30:56,320
DAVID J. MALAN: Right, and voila, the number six.
673
00:30:56,320 --> 00:30:57,840
All right, so very nicely done.
674
00:30:57,840 --> 00:31:00,620
675
00:31:00,620 --> 00:31:02,320
A little stressful for you as well.
676
00:31:02,320 --> 00:31:03,210
Thank you again.
677
00:31:03,210 --> 00:31:05,790
So here we see by nature of the locker door
678
00:31:05,790 --> 00:31:10,350
still being open sort of an artifact of the greater efficiency,
679
00:31:10,350 --> 00:31:13,560
it would seem, of this algorithm because now that Rave
680
00:31:13,560 --> 00:31:16,470
was given the assumption that these numbers are sorted from small
681
00:31:16,470 --> 00:31:20,310
on the left to large on the right, she was able to apply that same divide
682
00:31:20,310 --> 00:31:23,610
and conquer algorithm from week zero which we're now going to give a name--
683
00:31:23,610 --> 00:31:25,200
binary search.
684
00:31:25,200 --> 00:31:28,650
And simply by starting in the middle and realizing,
685
00:31:28,650 --> 00:31:32,670
OK, too small, then by going to the right half and realizing, oh,
686
00:31:32,670 --> 00:31:35,820
went a little too far, then by going to the left half,
687
00:31:35,820 --> 00:31:39,660
Rave was able to find in just three steps instead of seven
688
00:31:39,660 --> 00:31:43,720
the number six in this case that we were actually searching for.
689
00:31:43,720 --> 00:31:47,890
So you can see that this would seem to be more efficient.
690
00:31:47,890 --> 00:31:50,700
Let's consider for just a moment is it correct.
691
00:31:50,700 --> 00:31:56,250
If I had used different numbers but still sorted them from left to right,
692
00:31:56,250 --> 00:31:59,378
would this algorithm still have worked?
693
00:31:59,378 --> 00:32:00,420
You're nodding your head.
694
00:32:00,420 --> 00:32:01,170
Can I call on you?
695
00:32:01,170 --> 00:32:03,920
Like, why would it still have worked, do you think?
696
00:32:03,920 --> 00:32:06,700
AUDIENCE: [INAUDIBLE]
697
00:32:06,700 --> 00:32:08,450
DAVID J. MALAN: Yeah, so so long as the numbers
698
00:32:08,450 --> 00:32:10,760
are always in the same order from left to right
699
00:32:10,760 --> 00:32:13,970
or, heck, they could even be in reverse order, so long as it's consistent,
700
00:32:13,970 --> 00:32:18,410
the decisions that Rave was making-- if greater than, else, if less than--
701
00:32:18,410 --> 00:32:20,820
would guide us to the solution no matter what.
702
00:32:20,820 --> 00:32:23,460
And it would seem to take fewer steps.
703
00:32:23,460 --> 00:32:26,220
So if we consider now the pseudo code for this algorithm,
704
00:32:26,220 --> 00:32:28,530
let's take a look how we might describe binary search.
705
00:32:28,530 --> 00:32:31,400
So binary search we might describe with something like this.
706
00:32:31,400 --> 00:32:34,640
If the number is behind the middle door, which is where Rave began,
707
00:32:34,640 --> 00:32:36,710
then we can just return true.
708
00:32:36,710 --> 00:32:40,290
Else if the number is less than the middle door,
709
00:32:40,290 --> 00:32:42,800
so if six is less than whatever is behind the middle door,
710
00:32:42,800 --> 00:32:45,140
then Rave would have searched the left half.
711
00:32:45,140 --> 00:32:47,690
Else if the number is greater than the middle door,
712
00:32:47,690 --> 00:32:49,700
Rave would have searched the right half.
713
00:32:49,700 --> 00:32:53,840
Else, if there are no doors-- and we'll see in a moment why I put
714
00:32:53,840 --> 00:32:55,710
this up top just to keep things clean.
715
00:32:55,710 --> 00:32:59,390
If there's no doors, what should Rave have presumably returned immediately
716
00:32:59,390 --> 00:33:02,870
if I gave her no lockers to work with?
717
00:33:02,870 --> 00:33:03,920
Just returned false.
718
00:33:03,920 --> 00:33:06,020
But this is an important case to consider
719
00:33:06,020 --> 00:33:09,980
because if in the process of searching locker by locker,
720
00:33:09,980 --> 00:33:14,630
we might have whittled down the problem from seven doors to three doors
721
00:33:14,630 --> 00:33:17,370
to one door to zero doors-- and at that point,
722
00:33:17,370 --> 00:33:19,230
we might have had no doors left to search.
723
00:33:19,230 --> 00:33:22,040
So we have to naturally have a scenario for just considering
724
00:33:22,040 --> 00:33:23,120
if there were no doors.
725
00:33:23,120 --> 00:33:26,910
So it's not to say that maybe I don't give Rave any doors to begin with.
726
00:33:26,910 --> 00:33:28,910
But as she divides and divides and divides,
727
00:33:28,910 --> 00:33:32,720
if she runs out of lockers to ask those questions of-- or a few weeks ago,
728
00:33:32,720 --> 00:33:35,660
if I ran out of phone book pages to tear in half,
729
00:33:35,660 --> 00:33:39,210
I too might have had to return false as in this case.
730
00:33:39,210 --> 00:33:42,500
So how can we now describe this a little more like C
731
00:33:42,500 --> 00:33:45,710
just to give ourselves a variable to start thinking and talking about?
732
00:33:45,710 --> 00:33:48,930
Well, I might talk about doors as being an array.
733
00:33:48,930 --> 00:33:52,490
And so if I want to express the middle door, I could just, in pseudo code,
734
00:33:52,490 --> 00:33:54,470
say doors bracket middle.
735
00:33:54,470 --> 00:33:56,270
I'm assuming that someone has done the math
736
00:33:56,270 --> 00:33:59,450
to figure out what the middle door is, but that's easy enough to do.
737
00:33:59,450 --> 00:34:01,940
And then doors, if the number we're looking for
738
00:34:01,940 --> 00:34:05,470
is less than doors bracket middle, then search door
739
00:34:05,470 --> 00:34:09,230
zero through doors middle minus 1.
740
00:34:09,230 --> 00:34:13,610
So again, this is a more pedantic way of taking what's a pretty intuitive idea--
741
00:34:13,610 --> 00:34:16,159
search the left half, search the right half--
742
00:34:16,159 --> 00:34:22,790
but start to now describe it in terms of actual indices or indexes
743
00:34:22,790 --> 00:34:24,593
like we did with our array notation.
744
00:34:24,593 --> 00:34:26,510
The last scenario, of course, is if the number
745
00:34:26,510 --> 00:34:28,760
is greater than doors bracket middle,
746
00:34:28,760 --> 00:34:32,060
then Rave would have wanted to search the middle door plus 1--
747
00:34:32,060 --> 00:34:37,250
so 1 over-- through doors n minus 1--
748
00:34:37,250 --> 00:34:38,330
through n minus 1.
749
00:34:38,330 --> 00:34:41,389
So again, just a way of sort of describing a little more syntactically
750
00:34:41,389 --> 00:34:42,989
what it is that's going on.
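The pseudo code above might translate into C roughly as follows. This is a minimal sketch under a few assumptions: the array is sorted from small to large, the function name `binary_search` and the bounds `lo` and `hi` are made up for illustration, and the "no doors left" check is expressed as `lo > hi`.

```c
#include <stdbool.h>

// Searches the sorted range doors[lo] through doors[hi] (inclusive) for number.
bool binary_search(const int doors[], int lo, int hi, int number)
{
    // If there are no doors left, the number cannot be here.
    if (lo > hi)
    {
        return false;
    }

    // Midpoint computed this way to avoid overflow in lo + hi.
    int middle = lo + (hi - lo) / 2;

    if (doors[middle] == number)
    {
        return true;
    }
    else if (number < doors[middle])
    {
        // Search doors[lo] through doors[middle - 1].
        return binary_search(doors, lo, middle - 1, number);
    }
    else
    {
        // Search doors[middle + 1] through doors[hi].
        return binary_search(doors, middle + 1, hi, number);
    }
}
```

For seven sorted doors holding, say, 0, 2, 4, 5, 6, 7, 8 (made-up values), calling `binary_search(doors, 0, 6, 6)` opens the doors holding 5, then 7, then 6, mirroring the three steps in the demonstration.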
751
00:34:42,989 --> 00:34:46,820
So how might we translate this now into big O notation?
752
00:34:46,820 --> 00:34:53,870
Well, in the worst case, how many steps total might Rave's binary search
753
00:34:53,870 --> 00:34:55,070
algorithm have taken?
754
00:34:55,070 --> 00:34:59,030
Given seven doors or given more generically n doors,
755
00:34:59,030 --> 00:35:03,620
how many times could she go left or go right before finding herself with one
756
00:35:03,620 --> 00:35:06,110
or no doors left?
757
00:35:06,110 --> 00:35:08,787
What's the way to think about that?
758
00:35:08,787 --> 00:35:09,620
Yeah, in the middle?
759
00:35:09,620 --> 00:35:11,150
AUDIENCE: Log n.
760
00:35:11,150 --> 00:35:11,990
DAVID J. MALAN: Log n.
761
00:35:11,990 --> 00:35:13,280
So there's log n again.
762
00:35:13,280 --> 00:35:16,155
And even if you're not feeling wholly comfortable with your logarithms
763
00:35:16,155 --> 00:35:19,250
still, pretty much in programming and in computer science more generally,
764
00:35:19,250 --> 00:35:22,430
any time we talk about some algorithm that's dividing and conquering
765
00:35:22,430 --> 00:35:26,180
in half, in half, in half, or any other multiple,
766
00:35:26,180 --> 00:35:28,580
it's probably involving logarithms in some sense.
767
00:35:28,580 --> 00:35:31,400
And log base 2 of n essentially refers to the number
768
00:35:31,400 --> 00:35:37,160
of times you can divide n by 2 until you bottom out at just a single door
769
00:35:37,160 --> 00:35:39,410
or equivalently zero doors left.
770
00:35:39,410 --> 00:35:40,370
So log n.
771
00:35:40,370 --> 00:35:44,270
So we might say that indeed, binary search is in big O of log n
772
00:35:44,270 --> 00:35:48,440
because the door that Rave opened last, this one,
773
00:35:48,440 --> 00:35:50,360
happened to be three doors away.
774
00:35:50,360 --> 00:35:52,790
And actually, if you do the math here, that roughly
775
00:35:52,790 --> 00:35:54,600
works out to be exactly that case.
776
00:35:54,600 --> 00:35:58,640
If we add one, that's sort of out of seven doors or roughly eight,
777
00:35:58,640 --> 00:36:01,940
we were able to search it in just three total steps.
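The arithmetic can be checked with a small sketch that counts the doors a binary search opens in the worst case, assuming each step leaves at most half of the remaining doors (integer division). The function name `worst_case_doors` is hypothetical.

```c
// Worst-case number of doors opened by binary search on n doors:
// open one door, then continue with at most n / 2 doors remaining.
int worst_case_doors(int n)
{
    int opened = 0;
    while (n > 0)
    {
        opened++;
        n /= 2;
    }
    return opened;
}
```

For 7 doors this gives 3 (7, then 3, then 1 remaining), which matches log base 2 of 8. For 1,000,000 doors it gives only 20.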
778
00:36:01,940 --> 00:36:03,980
What about omega notation, though?
779
00:36:03,980 --> 00:36:07,220
Like, in the best case, Rave might have gotten lucky.
780
00:36:07,220 --> 00:36:09,170
She opened the door, and there it is.
781
00:36:09,170 --> 00:36:14,970
So how might we describe a lower bound on the running time of binary search?
782
00:36:14,970 --> 00:36:15,470
Yeah.
783
00:36:15,470 --> 00:36:16,125
AUDIENCE: 1.
784
00:36:16,125 --> 00:36:17,000
DAVID J. MALAN: Say again?
785
00:36:17,000 --> 00:36:17,840
AUDIENCE: 1.
786
00:36:17,840 --> 00:36:19,220
DAVID J. MALAN: Omega of 1.
787
00:36:19,220 --> 00:36:23,810
So here too, we see that in some cases binary search and linear search, eh,
788
00:36:23,810 --> 00:36:25,700
like, they're pretty equivalent.
789
00:36:25,700 --> 00:36:30,830
And so this is why it's sometimes compelling to consider both the best
790
00:36:30,830 --> 00:36:33,290
case and the worst case because honestly, in general,
791
00:36:33,290 --> 00:36:35,600
who really cares if you just get lucky once in a while
792
00:36:35,600 --> 00:36:37,280
and your algorithm is super fast?
793
00:36:37,280 --> 00:36:40,250
What you probably care about is what's the worst case.
794
00:36:40,250 --> 00:36:41,390
How long are my users--
795
00:36:41,390 --> 00:36:45,170
how long am I going to be sitting there watching some spinning hourglass
796
00:36:45,170 --> 00:36:50,807
or beach ball trying to give myself an answer to a pretty big problem?
797
00:36:50,807 --> 00:36:53,640
Well, odds are, you're going to generally care about big O notation.
798
00:36:53,640 --> 00:36:55,430
So indeed, moving forward, we'll generally
799
00:36:55,430 --> 00:36:58,730
talk about the running time of algorithms often in terms of big O,
800
00:36:58,730 --> 00:37:00,780
a little less so in terms of omega.
801
00:37:00,780 --> 00:37:03,140
But understanding the range can be important
802
00:37:03,140 --> 00:37:08,700
depending on the nature of the data that you're going to actually be given here.
803
00:37:08,700 --> 00:37:11,510
All right, let me pause and see if there are any questions.
804
00:37:11,510 --> 00:37:14,200
805
00:37:14,200 --> 00:37:16,420
Any questions here?
806
00:37:16,420 --> 00:37:18,850
Yes, thank you.
807
00:37:18,850 --> 00:37:21,430
AUDIENCE: So this method is clearly more efficient,
808
00:37:21,430 --> 00:37:26,440
but it requires that the information is all compiled in a certain order.
809
00:37:26,440 --> 00:37:29,770
How do you ensure that you can compile information
810
00:37:29,770 --> 00:37:31,265
in a particular order at scale?
811
00:37:31,265 --> 00:37:33,140
DAVID J. MALAN: Yeah, it's a really good question.
812
00:37:33,140 --> 00:37:36,015
And if I can generalize it, how do you guarantee that you can do this
813
00:37:36,015 --> 00:37:38,560
at scale, which algorithm is better?
814
00:37:38,560 --> 00:37:41,440
I've sort of led us down this road of implying
815
00:37:41,440 --> 00:37:43,540
that Rave's second algorithm, binary search,
816
00:37:43,540 --> 00:37:45,580
is better because it's so much faster.
817
00:37:45,580 --> 00:37:49,600
It's log of n in the worst case instead of big O of n.
818
00:37:49,600 --> 00:37:53,230
But Rave was given an advantage when she came up here in that the doors were
819
00:37:53,230 --> 00:37:54,222
already sorted.
820
00:37:54,222 --> 00:37:55,930
And so that sort of invites the question,
821
00:37:55,930 --> 00:37:57,670
well, given a whole bunch of random data,
822
00:37:57,670 --> 00:38:00,710
either a small data set or, heck, something Google sized with millions,
823
00:38:00,710 --> 00:38:04,540
billions of pieces of data, should you sort it first
824
00:38:04,540 --> 00:38:07,150
from smallest to largest and then search?
825
00:38:07,150 --> 00:38:11,920
Or should you just dive right in and search it linearly?
826
00:38:11,920 --> 00:38:13,638
Like, how might you think about that?
827
00:38:13,638 --> 00:38:15,430
If you are Google, for instance, and you've
828
00:38:15,430 --> 00:38:19,090
got millions, billions of web pages, should they just go with linear search
829
00:38:19,090 --> 00:38:21,850
because it's always going to work even though it might be slow?
830
00:38:21,850 --> 00:38:24,820
Or should they invest the time in sorting all of that data--
831
00:38:24,820 --> 00:38:26,630
we'll see how in a bit--
832
00:38:26,630 --> 00:38:28,900
and then search it more efficiently?
833
00:38:28,900 --> 00:38:31,438
Like, how do you decide between those options?
834
00:38:31,438 --> 00:38:33,730
AUDIENCE: If you're sorting the data, then wouldn't you
835
00:38:33,730 --> 00:38:36,573
have to go through all of the data?
836
00:38:36,573 --> 00:38:38,740
DAVID J. MALAN: Yeah, if you had to sort the data first--
837
00:38:38,740 --> 00:38:40,700
and we don't yet formally know how to do this.
838
00:38:40,700 --> 00:38:43,117
But obviously, as humans, we could probably figure it out.
839
00:38:43,117 --> 00:38:45,280
You do have to look at all of the data anyway.
840
00:38:45,280 --> 00:38:48,760
And so you're sort of wasting your time if you're sorting it only
841
00:38:48,760 --> 00:38:50,680
then to go and search it.
842
00:38:50,680 --> 00:38:52,898
But maybe it depends a bit more.
843
00:38:52,898 --> 00:38:54,940
Like, that's absolutely right, and if you're just
844
00:38:54,940 --> 00:38:58,060
searching for one thing in life, then that's probably a waste of time
845
00:38:58,060 --> 00:39:01,720
to sort it and then search it because you're just adding to the process.
846
00:39:01,720 --> 00:39:03,880
But what's another scenario in which you might not
847
00:39:03,880 --> 00:39:09,410
worry about that whereby it might make sense to sort it and then search?
848
00:39:09,410 --> 00:39:09,910
Yeah.
849
00:39:09,910 --> 00:39:16,580
AUDIENCE: [INAUDIBLE] you can go and use the other values as a way
850
00:39:16,580 --> 00:39:17,810
to find out what's happening.
851
00:39:17,810 --> 00:39:18,852
DAVID J. MALAN: Yeah, exactly.
852
00:39:18,852 --> 00:39:21,392
So if your problem is a Google-like problem where
853
00:39:21,392 --> 00:39:24,350
you have more than just one user who's searching for more than just one
854
00:39:24,350 --> 00:39:26,780
website page, probably you should incur the cost up front
855
00:39:26,780 --> 00:39:30,620
and sort the whole thing because every subsequent request thereafter
856
00:39:30,620 --> 00:39:32,810
is going to be faster, faster, faster because it's
857
00:39:32,810 --> 00:39:36,440
going to [INAUDIBLE] algorithm of binary search, binary search, binary search
858
00:39:36,440 --> 00:39:39,320
that's going to add up to be way fewer steps
859
00:39:39,320 --> 00:39:41,610
than doing linear search multiple times.
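[Editor's sketch] The "pay the cost up front" idea above might look like this in C: sort once, then answer every later request with binary search. This version leans on the standard library's `qsort` and `bsearch` rather than anything written in the lecture, and `compare_ints` and `count_found` are hypothetical names.

```c
#include <stdlib.h>

// Comparison callback for qsort and bsearch: orders ints ascending.
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

// Hypothetical helper: sorts data once, then answers each query with
// binary search. Returns how many of the queries were found.
int count_found(int data[], size_t n, const int queries[], size_t q)
{
    // One-time cost, shared by every query that follows.
    qsort(data, n, sizeof(int), compare_ints);

    int found = 0;
    for (size_t i = 0; i < q; i++)
    {
        if (bsearch(&queries[i], data, n, sizeof(int), compare_ints) != NULL)
        {
            found++;
        }
    }
    return found;
}
```

For a single query the sort is wasted effort; the trade only pays off as the number of queries grows.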
860
00:39:41,610 --> 00:39:43,490
So again, kind of depends on the use case
861
00:39:43,490 --> 00:39:45,350
and kind of depends on how important it is.
862
00:39:45,350 --> 00:39:48,050
And this happens even in real world contexts.
863
00:39:48,050 --> 00:39:50,900
I think back always to graduate school, when I was writing some code
864
00:39:50,900 --> 00:39:52,610
to analyze some large data set.
865
00:39:52,610 --> 00:39:55,400
And honestly, it was actually easier at the time for me
866
00:39:55,400 --> 00:39:58,310
to write pretty inefficient but hopefully correct
867
00:39:58,310 --> 00:39:59,537
code because you know what?
868
00:39:59,537 --> 00:40:02,870
I could just go to sleep for eight hours and let it analyze this really big data
869
00:40:02,870 --> 00:40:03,370
set.
870
00:40:03,370 --> 00:40:06,335
I didn't have to bother writing more complex code to sort it just
871
00:40:06,335 --> 00:40:07,460
to run it more efficiently.
872
00:40:07,460 --> 00:40:07,960
Why?
873
00:40:07,960 --> 00:40:11,520
Because I was the only user, and I only needed to run these queries once.
874
00:40:11,520 --> 00:40:13,700
And so this was kind of a reasonable approach,
875
00:40:13,700 --> 00:40:17,550
reasonable until I woke up eight hours later and my code was incorrect.
876
00:40:17,550 --> 00:40:20,960
And now I had to spend another eight hours rerunning it after fixing it.
877
00:40:20,960 --> 00:40:22,910
But even there, you see an example where,
878
00:40:22,910 --> 00:40:24,890
what is your most precious resource?
879
00:40:24,890 --> 00:40:26,720
Is it time to run the code?
880
00:40:26,720 --> 00:40:28,970
Is it time to write the code?
881
00:40:28,970 --> 00:40:31,098
Is it the amount of memory the computer is using?
882
00:40:31,098 --> 00:40:33,890
These are all resources we'll start to talk about because it really
883
00:40:33,890 --> 00:40:36,080
depends on what your goals are.
884
00:40:36,080 --> 00:40:39,050
Any questions, then, on upper bounds, lower bounds,
885
00:40:39,050 --> 00:40:42,260
or each of these two searches, linear or binary?
886
00:40:42,260 --> 00:40:42,991
Yeah.
887
00:40:42,991 --> 00:40:45,580
AUDIENCE: So just, when you're calculating running time,
888
00:40:45,580 --> 00:40:50,317
does the sorting step count for that time?
889
00:40:50,317 --> 00:40:53,150
DAVID J. MALAN: When analyzing running time, does the sorting step count?
890
00:40:53,150 --> 00:40:55,310
If you want it to, if you actually do it.
891
00:40:55,310 --> 00:40:56,900
At the moment, it did not apply.
892
00:40:56,900 --> 00:41:01,100
I just gave Rave the luxury of knowing that the data was sorted.
893
00:41:01,100 --> 00:41:04,520
But if I really wanted to charge her for the amount of time
894
00:41:04,520 --> 00:41:07,730
it took to find that number six, I should have added the time
895
00:41:07,730 --> 00:41:09,712
to sort plus the time to search.
896
00:41:09,712 --> 00:41:11,420
And in fact, that's a road we'll go down.
897
00:41:11,420 --> 00:41:13,170
Why don't we go ahead and pace ourselves as before?
898
00:41:13,170 --> 00:41:14,587
Let's take a 10 minute break here.
899
00:41:14,587 --> 00:41:17,130
And when we come back, we'll write some actual code.
900
00:41:17,130 --> 00:41:21,038
So we've seen a couple of searches-- linear search and binary search, which,
901
00:41:21,038 --> 00:41:22,580
to be fair, we saw back in week zero.
902
00:41:22,580 --> 00:41:25,790
But let's actually translate at least one of those now to some code
903
00:41:25,790 --> 00:41:28,970
using this building block from last week where we can actually
904
00:41:28,970 --> 00:41:32,820
define an array if we want, like an array of integers called numbers.
905
00:41:32,820 --> 00:41:34,550
So let me switch over to VS Code here.
906
00:41:34,550 --> 00:41:37,910
Let me go ahead and start a program called numbers.c.
907
00:41:37,910 --> 00:41:40,940
And in numbers.c, let me go ahead here.
908
00:41:40,940 --> 00:41:44,840
And how about let's include our familiar header files?
909
00:41:44,840 --> 00:41:46,670
So cs50.h.
910
00:41:46,670 --> 00:41:51,330
I'll include stdio.h so that we can get input and print output if we want.
911
00:41:51,330 --> 00:41:54,410
And now I'm going to go ahead and give myself int main void.
912
00:41:54,410 --> 00:41:56,100
No command line arguments today.
913
00:41:56,100 --> 00:41:57,232
So I'll leave that as void.
914
00:41:57,232 --> 00:41:58,940
And I'm going to go ahead and give myself
915
00:41:58,940 --> 00:42:01,410
an array of how about seven numbers?
916
00:42:01,410 --> 00:42:04,220
So I'll call it int numbers bracket 7.
917
00:42:04,220 --> 00:42:06,260
And then I can fill this array with numbers.
918
00:42:06,260 --> 00:42:10,100
Like, numbers bracket 0 can be the number 4, and numbers bracket 1
919
00:42:10,100 --> 00:42:14,308
could be the number 6, and numbers bracket 2 can be the number 8.
920
00:42:14,308 --> 00:42:16,850
And this is the same list that we saw with [? Nomira ?] a bit
921
00:42:16,850 --> 00:42:19,170
ago where it was 4, then 6, then 8.
922
00:42:19,170 --> 00:42:19,920
But you know what?
923
00:42:19,920 --> 00:42:22,340
There's actually another syntax I can show you here.
924
00:42:22,340 --> 00:42:25,220
If you know in advance in a C program that you
925
00:42:25,220 --> 00:42:30,390
want an array of certain values and you know therefore how many of those values
926
00:42:30,390 --> 00:42:33,410
you want, you can actually do this little trick using curly braces.
927
00:42:33,410 --> 00:42:36,560
You can say, don't worry about how big this is.
928
00:42:36,560 --> 00:42:39,620
It's going to be implicit by way of these curly braces.
929
00:42:39,620 --> 00:42:44,610
Here, I can do 4, 6, 8, 2, 7, 5, 0, close curly brace.
930
00:42:44,610 --> 00:42:46,730
So it's a somewhat new use of curly braces.
931
00:42:46,730 --> 00:42:50,780
But this has the effect of giving me an array called numbers inside
932
00:42:50,780 --> 00:42:52,440
of which are a whole bunch of integers.
933
00:42:52,440 --> 00:42:53,030
How many?
934
00:42:53,030 --> 00:42:56,960
The compiler can infer it from whatever's inside these curly braces.
935
00:42:56,960 --> 00:43:00,140
And it seems to be of size 1, 2, 3, 4, 5, 6, 7.
936
00:43:00,140 --> 00:43:05,510
And all seven elements will be initialized with 4, 6, 8, 2, 7, 5, 0
937
00:43:05,510 --> 00:43:06,330
respectively.
938
00:43:06,330 --> 00:43:08,780
So just a minor optimization code wise to tighten up
939
00:43:08,780 --> 00:43:12,090
what would have otherwise been like eight separate lines of code.
940
00:43:12,090 --> 00:43:15,080
Now let's go ahead and implement linear search, as we called it.
941
00:43:15,080 --> 00:43:18,122
And you can do this in a bunch of ways, but I'm going to do it like this.
942
00:43:18,122 --> 00:43:24,830
For int i gets 0, i is less than 7, i plus plus.
943
00:43:24,830 --> 00:43:27,800
Then inside of my loop, I'm going to ask the question, well,
944
00:43:27,800 --> 00:43:33,020
if the numbers at location i equals equals, as we asked of
945
00:43:33,020 --> 00:43:36,680
[? Nomira, ?] the number 0, then I'm going to go ahead and do something
946
00:43:36,680 --> 00:43:41,450
like printf found backslash n.
947
00:43:41,450 --> 00:43:43,280
And then I'm going to return 0.
948
00:43:43,280 --> 00:43:46,040
Just because of last week's discussion of returning
949
00:43:46,040 --> 00:43:50,150
a value for main when all is well, I'm going to return 0 by convention
950
00:43:50,150 --> 00:43:53,060
just to signal that indeed, I found what I'm looking for.
951
00:43:53,060 --> 00:44:00,560
Otherwise, on what line do I want to go and add a printf, like, not found
952
00:44:00,560 --> 00:44:02,600
and return something other than 0?
953
00:44:02,600 --> 00:44:06,860
Right, I don't think I want an else here per our pseudo code earlier.
954
00:44:06,860 --> 00:44:11,030
So on what line would you prefer I sort of insert a default scenario
955
00:44:11,030 --> 00:44:14,490
of not found and I'll return an error?
956
00:44:14,490 --> 00:44:15,795
Yeah, over here?
957
00:44:15,795 --> 00:44:19,343
[INTERPOSING VOICES]
958
00:44:19,343 --> 00:44:20,010
DAVID J. MALAN: Nice.
959
00:44:20,010 --> 00:44:21,718
So at the end of the for loop because you
960
00:44:21,718 --> 00:44:23,550
want to give the program or our volunteer
961
00:44:23,550 --> 00:44:26,980
earlier a chance to go through all of the doors, all of the numbers.
962
00:44:26,980 --> 00:44:29,700
But if you go through the whole thing, through the whole loop,
963
00:44:29,700 --> 00:44:33,630
at the very end, you probably just want to conclude not found backslash n
964
00:44:33,630 --> 00:44:36,060
and then return something like positive 1
965
00:44:36,060 --> 00:44:38,040
just to signify that an error happened.
966
00:44:38,040 --> 00:44:40,170
And again, this was a minor detail last week.
967
00:44:40,170 --> 00:44:44,370
Any time main is successful, the programming convention is to return 0.
968
00:44:44,370 --> 00:44:45,850
That means all is well.
969
00:44:45,850 --> 00:44:49,020
And if something goes wrong, like you didn't find what you're looking for,
970
00:44:49,020 --> 00:44:52,950
you might return something other than 0, like positive 1, maybe positive 2,
971
00:44:52,950 --> 00:44:55,202
or even negative numbers if you want.
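[Editor's sketch] Pulling the dictated pieces together, the linear search described above might look like the following. It is wrapped in a hypothetical function, `search_numbers`, so that it can be called with different targets; in the lecture the same logic sits directly inside main in numbers.c.

```c
#include <stdio.h>

// Hypothetical helper that mirrors the body of main as dictated above.
// Returns 0 if target is found and 1 otherwise, following the
// exit-status convention: 0 for success, nonzero for an error.
int search_numbers(int target)
{
    // The compiler infers the size from the initializer: seven elements.
    int numbers[] = {4, 6, 8, 2, 7, 5, 0};
    int count = sizeof(numbers) / sizeof(numbers[0]);

    for (int i = 0; i < count; i++)
    {
        if (numbers[i] == target)
        {
            printf("Found\n");
            return 0;
        }
    }

    // Reached only after every element has been checked.
    printf("Not found\n");
    return 1;
}
```

A complete program could then consist of little more than `return search_numbers(0);` inside main.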
972
00:44:55,202 --> 00:44:57,160
All right, well, let me go ahead and save this.
973
00:44:57,160 --> 00:44:58,800
Let me do make numbers.
974
00:44:58,800 --> 00:45:01,230
Hopefully no syntax errors.
975
00:45:01,230 --> 00:45:02,520
All good so far.
976
00:45:02,520 --> 00:45:05,040
dot slash numbers, enter.
977
00:45:05,040 --> 00:45:07,500
All right, and it's found, as I would hope it would be.
978
00:45:07,500 --> 00:45:10,710
And just as a little check, let's search for something that's definitely
979
00:45:10,710 --> 00:45:14,790
not there, like the number negative 1.
980
00:45:14,790 --> 00:45:17,640
Let me go ahead and recompile the code with make numbers.
981
00:45:17,640 --> 00:45:19,890
Let me rerun the code with dot slash numbers
982
00:45:19,890 --> 00:45:21,900
and hopefully-- whew, OK, not found.
983
00:45:21,900 --> 00:45:24,473
So proof by example seems to be working correctly.
984
00:45:24,473 --> 00:45:26,640
But let's make things a little more interesting now.
985
00:45:26,640 --> 00:45:29,700
Right now, I'm using just an array of integers.
986
00:45:29,700 --> 00:45:34,090
Let me go ahead and introduce maybe an array of strings instead.
987
00:45:34,090 --> 00:45:37,360
And maybe this time, I'll store a bunch of names and not just integers
988
00:45:37,360 --> 00:45:39,100
but actual strings of names.
989
00:45:39,100 --> 00:45:40,330
So how might I do this?
990
00:45:40,330 --> 00:45:42,130
Well, let me go back to my code here.
991
00:45:42,130 --> 00:45:46,470
I'm going to switch us over to maybe a file called names.c.
992
00:45:46,470 --> 00:45:50,130
And in here, I'll go ahead and include cs50.h.
993
00:45:50,130 --> 00:45:53,550
I'll include stdio.h.
994
00:45:53,550 --> 00:45:57,030
And I'm going to go ahead and for now include a new friend
995
00:45:57,030 --> 00:46:00,478
from last week, string.h, which gives me some string-related functionality.
996
00:46:00,478 --> 00:46:03,270
Int main void because I'm not going to bother with any command line
997
00:46:03,270 --> 00:46:04,480
arguments for now.
998
00:46:04,480 --> 00:46:09,330
And now if I want an array of strings, I could do something like this--
999
00:46:09,330 --> 00:46:12,120
string names bracket 7.
1000
00:46:12,120 --> 00:46:14,100
And then I could start doing like before.
1001
00:46:14,100 --> 00:46:17,580
Names bracket 0 could be someone like Bill, and names bracket 1
1002
00:46:17,580 --> 00:46:20,740
could be someone like Charlie and so forth.
1003
00:46:20,740 --> 00:46:24,352
But there's this new improvement I can make.
1004
00:46:24,352 --> 00:46:27,060
Let me just let the compiler figure out how many names there are.
1005
00:46:27,060 --> 00:46:32,550
And using curly braces, I'll do Bill and then Charlie and then Fred and then
1006
00:46:32,550 --> 00:46:39,690
George and then Ginny and then Percy and then Ron if there's the pattern there.
1007
00:46:39,690 --> 00:46:42,930
All right, so now I have these seven names as strings.
1008
00:46:42,930 --> 00:46:44,230
Let's do something similar.
1009
00:46:44,230 --> 00:46:47,730
So for int i gets 0.
1010
00:46:47,730 --> 00:46:51,330
i is less than 7 as before, i plus plus as before.
1011
00:46:51,330 --> 00:46:54,900
And inside of the loop, let's this time check for the string in question,
1012
00:46:54,900 --> 00:46:57,630
and suppose we're searching for Ron arbitrarily.
1013
00:46:57,630 --> 00:47:00,090
He is there, so we should eventually find him.
1014
00:47:00,090 --> 00:47:07,530
Let me go ahead and say if names bracket i equals quote unquote Ron, then inside
1015
00:47:07,530 --> 00:47:11,340
of my if condition, I'm going to say printf found just like before.
1016
00:47:11,340 --> 00:47:13,590
And I'm going to return 0 just because all is well.
1017
00:47:13,590 --> 00:47:16,390
And I'm going to take your advice from the get go this time
1018
00:47:16,390 --> 00:47:20,560
and, at the end of the loop, print out not found because if I get this far,
1019
00:47:20,560 --> 00:47:23,670
I have not printed found, and I have not returned already.
1020
00:47:23,670 --> 00:47:27,840
So I'm just going to go ahead and return 1 after printing not found.
1021
00:47:27,840 --> 00:47:30,420
All right, let me go ahead and cross my fingers as always.
1022
00:47:30,420 --> 00:47:33,310
Make names this time.
1023
00:47:33,310 --> 00:47:36,300
And it doesn't seem to like my code here.
1024
00:47:36,300 --> 00:47:38,370
This is perhaps a new error that you might not
1025
00:47:38,370 --> 00:47:41,080
have seen yet in names.c line 11.
1026
00:47:41,080 --> 00:47:43,920
So that's this line here, my if condition.
1027
00:47:43,920 --> 00:47:47,970
Result of comparison against a string literal is unspecified.
1028
00:47:47,970 --> 00:47:50,392
Use an explicit string comparison function instead.
1029
00:47:50,392 --> 00:47:53,100
I mean, that's kind of a mouthful, and the first time you see it,
1030
00:47:53,100 --> 00:47:55,600
you're probably not going to know how to make sense of that.
1031
00:47:55,600 --> 00:47:59,130
But it does kind of draw our attention to something being awry
1032
00:47:59,130 --> 00:48:03,840
with the equality checking here, with equals equals and Ron.
1033
00:48:03,840 --> 00:48:06,240
And here's where again we've been telling
1034
00:48:06,240 --> 00:48:08,700
sort of a white lie for the past couple of weeks.
1035
00:48:08,700 --> 00:48:12,900
Strings are a thing in C. Strings are a thing in programming.
1036
00:48:12,900 --> 00:48:14,670
But recall from last week, I did disclaim
1037
00:48:14,670 --> 00:48:16,650
there's no such thing as a string data type
1038
00:48:16,650 --> 00:48:20,670
technically because it's not a primitive in the way an int
1039
00:48:20,670 --> 00:48:24,210
and a float and a bool are that are sort of built into the language.
1040
00:48:24,210 --> 00:48:28,170
You can't just use equals equals to compare two strings.
1041
00:48:28,170 --> 00:48:31,110
You actually have to use a special function that's
1042
00:48:31,110 --> 00:48:34,140
in this header file we talked briefly about last week.
1043
00:48:34,140 --> 00:48:36,780
In that header file was string length or strlen.
1044
00:48:36,780 --> 00:48:39,490
But there are other functions in there as well.
1045
00:48:39,490 --> 00:48:43,080
Let me, in fact, go ahead and open up the manual pages.
1046
00:48:43,080 --> 00:48:46,200
And if we go to string.h--
1047
00:48:46,200 --> 00:48:47,760
let me scroll down a bit.
1048
00:48:47,760 --> 00:48:52,800
In string.h you can perhaps infer what function will probably take
1049
00:48:52,800 --> 00:48:56,612
the place of equals equals for today.
1050
00:48:56,612 --> 00:48:57,570
What do we want to use?
1051
00:48:57,570 --> 00:48:57,930
Yeah.
1052
00:48:57,930 --> 00:48:58,680
AUDIENCE: Strcmp?
1053
00:48:58,680 --> 00:49:03,305
DAVID J. MALAN: So strcmp, S-T-R-C-M-P, which apparently compares two strings.
1054
00:49:03,305 --> 00:49:05,430
And if I click on that, we'll see more information.
1055
00:49:05,430 --> 00:49:09,510
And indeed, if I click on strcmp, we'll see under the synopsis
1056
00:49:09,510 --> 00:49:14,490
that, OK, I need to use the CS50 header file and string.h, as I already have.
1057
00:49:14,490 --> 00:49:17,850
Here is its prototype, which is telling me
1058
00:49:17,850 --> 00:49:21,360
that strcmp takes two strings, S1 and S2, that
1059
00:49:21,360 --> 00:49:22,890
are presumably going to be compared.
1060
00:49:22,890 --> 00:49:25,230
And it returns an integer, which is interesting.
1061
00:49:25,230 --> 00:49:26,430
So let's read on.
1062
00:49:26,430 --> 00:49:29,730
The description of this function is that it compares two strings case
1063
00:49:29,730 --> 00:49:30,870
sensitively.
1064
00:49:30,870 --> 00:49:34,110
So uppercase or lowercase matters, just FYI.
1065
00:49:34,110 --> 00:49:36,570
And then let's look at the return value here.
1066
00:49:36,570 --> 00:49:40,860
The return value of this function returns an int less than 0
1067
00:49:40,860 --> 00:49:48,480
if S1 comes before S2, 0 if S1 is the same as S2, or an int greater than 0
1068
00:49:48,480 --> 00:49:50,940
if S1 comes after S2.
1069
00:49:50,940 --> 00:49:54,780
So the reason that this function returns an integer and not just a
1070
00:49:54,780 --> 00:49:57,390
bool, true or false, is that it actually will
1071
00:49:57,390 --> 00:50:00,750
allow us to sort these things eventually because if you can tell me
1072
00:50:00,750 --> 00:50:04,980
if two strings come in this order or in this order or they're the same,
1073
00:50:04,980 --> 00:50:07,080
you need three possible return values.
1074
00:50:07,080 --> 00:50:08,910
And a bool, of course, only gives you two,
1075
00:50:08,910 --> 00:50:12,480
but an int gives you like 4 billion even though we just need the 3.
1076
00:50:12,480 --> 00:50:17,520
So 0 or a positive number or a negative number is what this function returns.
1077
00:50:17,520 --> 00:50:21,960
And the documentation goes on to explain what we mean by ASCIIbetical order.
1078
00:50:21,960 --> 00:50:25,260
Recall that capital A is 65, capital B is 66,
1079
00:50:25,260 --> 00:50:27,660
and it's those underlying ASCII or Unicode
1080
00:50:27,660 --> 00:50:31,050
numbers that a computer uses to figure out whether something comes before it
1081
00:50:31,050 --> 00:50:33,180
or after it like in the dictionary.
1082
00:50:33,180 --> 00:50:35,950
But for our purposes now, we only care about equality.
1083
00:50:35,950 --> 00:50:37,680
So I'm going to go ahead and do this.
1084
00:50:37,680 --> 00:50:41,820
If I want to compare names bracket i against Ron,
1085
00:50:41,820 --> 00:50:49,320
I use str compare or strcmp, names bracket i comma, quote unquote, Ron.
1086
00:50:49,320 --> 00:50:51,510
So it's a little more involved than actually
1087
00:50:51,510 --> 00:50:55,830
using equals equals, which does work for integers, longs,
1088
00:50:55,830 --> 00:50:57,000
and certain other values.
1089
00:50:57,000 --> 00:51:00,850
But for strings, it turns out we need to use a more powerful function.
1090
00:51:00,850 --> 00:51:01,350
Why?
1091
00:51:01,350 --> 00:51:03,630
Well, last week, recall what a string really is.
1092
00:51:03,630 --> 00:51:06,220
It's an array of characters.
1093
00:51:06,220 --> 00:51:09,690
And so whereas you can use equals equals for single characters,
1094
00:51:09,690 --> 00:51:12,600
strcmp, as we'll eventually see, is going
1095
00:51:12,600 --> 00:51:14,438
to compare multiple characters for us.
1096
00:51:14,438 --> 00:51:15,480
There's more logic there.
1097
00:51:15,480 --> 00:51:19,570
There's a loop needed, and that's why it comes with the string library.
1098
00:51:19,570 --> 00:51:22,290
But it doesn't just work out of the box with equals equals alone.
1099
00:51:22,290 --> 00:51:26,267
That would literally be comparing two things, not two arrays of things.
1100
00:51:26,267 --> 00:51:28,350
And we'll come back to this next week as to what's
1101
00:51:28,350 --> 00:51:29,920
really going on under the hood.
1102
00:51:29,920 --> 00:51:34,140
So let me go ahead and fix one bug that I just realized I made.
1103
00:51:34,140 --> 00:51:39,450
I want to check if the return value of str compare is equal to 0
1104
00:51:39,450 --> 00:51:42,660
because per the documentation, that meant they're the same.
1105
00:51:42,660 --> 00:51:45,640
All right, let me go ahead and make names this time.
1106
00:51:45,640 --> 00:51:46,710
Now it compiles.
1107
00:51:46,710 --> 00:51:49,770
Dot slash names, Enter, found.
1108
00:51:49,770 --> 00:51:55,290
And just as a sanity check, let's check someone outside the family.
1109
00:51:55,290 --> 00:51:59,010
Searching now for Hermione after recompiling the code,
1110
00:51:59,010 --> 00:52:00,570
after rerunning the code.
1111
00:52:00,570 --> 00:52:02,280
And she's not, in fact, found.
1112
00:52:02,280 --> 00:52:05,220
So here's just a similar implementation of linear search
1113
00:52:05,220 --> 00:52:09,570
not for integers this time but instead for strings,
1114
00:52:09,570 --> 00:52:13,140
the subtlety really being we need a helper function, str compare,
1115
00:52:13,140 --> 00:52:17,640
to actually do the legwork for us of comparing two arrays of characters.
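[Editor's sketch] The string version of linear search described above might look like this. As an assumption of this example, `const char *` replaces CS50's `string` type so that no cs50.h is needed, and the logic is wrapped in a hypothetical function, `search_names`, rather than placed directly in main.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper mirroring names.c as dictated above.
// Returns 0 if target is in the list and 1 otherwise.
int search_names(const char *target)
{
    const char *names[] = {"Bill", "Charlie", "Fred", "George",
                           "Ginny", "Percy", "Ron"};
    int count = sizeof(names) / sizeof(names[0]);

    for (int i = 0; i < count; i++)
    {
        // strcmp returns 0 when the two strings are the same.
        if (strcmp(names[i], target) == 0)
        {
            printf("Found\n");
            return 0;
        }
    }

    printf("Not found\n");
    return 1;
}
```

Calling `search_names("Ron")` reports Found, while `search_names("Hermione")` reports Not found.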
1116
00:52:17,640 --> 00:52:21,208
All right, questions on either of these implementations-- yeah, in the middle?
1117
00:52:21,208 --> 00:52:22,892
AUDIENCE: So, if I do [INAUDIBLE]
1118
00:52:22,892 --> 00:52:24,100
DAVID J. MALAN: Ah, good question.
1119
00:52:24,100 --> 00:52:28,260
If I had not fixed what I claimed was a mistake earlier and I did this--
1120
00:52:28,260 --> 00:52:30,840
and we saw an example of this last week, actually.
1121
00:52:30,840 --> 00:52:37,470
If a function returns an integer, be it negative or positive or 0,
1122
00:52:37,470 --> 00:52:40,740
when you get back 0, the expression, the Boolean expression,
1123
00:52:40,740 --> 00:52:42,040
will be considered false.
1124
00:52:42,040 --> 00:52:44,940
So 0 equals false always.
1125
00:52:44,940 --> 00:52:49,470
If a function returns any positive number, or any negative number,
1126
00:52:49,470 --> 00:52:52,770
that's going to be interpreted as true even
1127
00:52:52,770 --> 00:52:57,510
if it's positive or negative, whether it's 1, negative 1, 2, negative 2.
1128
00:52:57,510 --> 00:53:01,570
And so if I did this, this would be saying the opposite.
1129
00:53:01,570 --> 00:53:06,930
So if I were to say this, if str compare of names bracket i and Hermione, that's
1130
00:53:06,930 --> 00:53:13,393
implicitly like saying this does not equal 0, or it means sort of is true,
1131
00:53:13,393 --> 00:53:15,810
but you don't want to check for true because, again, we're
1132
00:53:15,810 --> 00:53:17,350
comparing integers here.
1133
00:53:17,350 --> 00:53:21,000
So the reason I did 0 here in this case is
1134
00:53:21,000 --> 00:53:24,610
that it explicitly checks for the return value that means they're the same.
1135
00:53:24,610 --> 00:53:25,110
And yeah.
1136
00:53:25,110 --> 00:53:26,055
Follow up?
1137
00:53:26,055 --> 00:53:30,948
AUDIENCE: [INAUDIBLE]
1138
00:53:30,948 --> 00:53:32,990
DAVID J. MALAN: Yes, you might not have seen this yet,
1139
00:53:32,990 --> 00:53:36,580
but you can express the equivalent because if you
1140
00:53:36,580 --> 00:53:40,300
want to check if this is false, you can actually
1141
00:53:40,300 --> 00:53:43,300
use an exclamation point, known as a bang in programming,
1142
00:53:43,300 --> 00:53:44,960
that inverts the meaning.
1143
00:53:44,960 --> 00:53:48,163
So false becomes true, true becomes false.
1144
00:53:48,163 --> 00:53:50,080
So this would be another way of expressing it.
1145
00:53:50,080 --> 00:53:54,940
This is arguably a worse design, though, because the documentation explicitly
1146
00:53:54,940 --> 00:53:58,660
says you should be checking for 0 or a positive value
1147
00:53:58,660 --> 00:54:01,750
or a negative value, and this little trick, while correct,
1148
00:54:01,750 --> 00:54:05,500
and I think you can make a reasonable case for it, sort of hides that detail.
1149
00:54:05,500 --> 00:54:07,420
And I would argue instead for the first way,
1150
00:54:07,420 --> 00:54:09,607
checking for equals equals 0 instead.
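[Editor's sketch] The three forms discussed in this exchange can be put side by side. The function names are hypothetical, and `const char *` again stands in for CS50's `string` type as an assumption of this example.

```c
#include <stdbool.h>
#include <string.h>

// Explicit form: checks for the documented "same" value, 0.
bool same_explicit(const char *s1, const char *s2)
{
    return strcmp(s1, s2) == 0;
}

// Bang form: equivalent, because ! turns 0 into true and any
// nonzero value into false, but it hides the documented meaning.
bool same_bang(const char *s1, const char *s2)
{
    return !strcmp(s1, s2);
}

// Bare form: the mistake described above. Any nonzero result counts
// as true, so this condition holds when the strings differ.
bool bare_condition(const char *s1, const char *s2)
{
    if (strcmp(s1, s2))
    {
        return true;
    }
    return false;
}
```

The first two always agree; the third says the opposite of both.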
1151
00:54:09,607 --> 00:54:11,440
And if that's a little subtle, not to worry.
1152
00:54:11,440 --> 00:54:16,240
We'll come back to little syntactic tricks like that before long.
1153
00:54:16,240 --> 00:54:20,770
Other questions on linear search in these two forms?
1154
00:54:20,770 --> 00:54:22,450
Is there another hand or hands?
1155
00:54:22,450 --> 00:54:24,070
Two hands?
1156
00:54:24,070 --> 00:54:24,610
No?
1157
00:54:24,610 --> 00:54:25,900
OK, just holler if I missed.
1158
00:54:25,900 --> 00:54:28,012
So let's now actually take this one step further.
1159
00:54:28,012 --> 00:54:30,970
Suppose that we want to write a program that maybe implements something
1160
00:54:30,970 --> 00:54:35,110
a little more like a phone book that has both names and numbers and not
1161
00:54:35,110 --> 00:54:37,030
just integers but actual phone numbers.
1162
00:54:37,030 --> 00:54:39,340
Well, we could escalate things like this.
1163
00:54:39,340 --> 00:54:42,850
We could now have two arrays-- one called names, one called numbers.
1164
00:54:42,850 --> 00:54:45,370
And I'm going to use strings for the numbers now,
1165
00:54:45,370 --> 00:54:48,010
the phone numbers, because in most communities,
1166
00:54:48,010 --> 00:54:51,730
phone numbers might have dashes, pluses, parentheses, so something
1167
00:54:51,730 --> 00:54:55,030
that really looks more like a string even though we call it a phone number.
1168
00:54:55,030 --> 00:54:58,580
Probably don't want to use an int lest we throw away those kinds of details.
1169
00:54:58,580 --> 00:55:03,160
So let me switch back to VS Code here, and let's do one more program, this one
1170
00:55:03,160 --> 00:55:05,033
in a file called phonebook.c.
1171
00:55:05,033 --> 00:55:06,700
And now let me go ahead and do the same.
1172
00:55:06,700 --> 00:55:08,380
Let me include cs50.h.
1173
00:55:08,380 --> 00:55:14,320
Let me include stdio.h, and let me include string.h.
1174
00:55:14,320 --> 00:55:17,380
I'm going to again do int main void.
1175
00:55:17,380 --> 00:55:20,710
And then inside of my program, I'm going to give myself two arrays--
1176
00:55:20,710 --> 00:55:22,510
the efficient way this time.
1177
00:55:22,510 --> 00:55:25,000
String names will be just two of us this time.
1178
00:55:25,000 --> 00:55:28,390
How about Carter and me?
1179
00:55:28,390 --> 00:55:30,880
And then I'll give myself-- oops, typo already.
1180
00:55:30,880 --> 00:55:33,790
If I want this to be an array, I don't have to specify the number.
1181
00:55:33,790 --> 00:55:35,380
The compiler can count for me.
1182
00:55:35,380 --> 00:55:37,300
But I do need the square brackets.
1183
00:55:37,300 --> 00:55:43,750
Then for numbers, I'm again going to use a string array specifying with
1184
00:55:43,750 --> 00:55:49,510
the curly braces that how about Carter can be at 1-617-495-1000.
1185
00:55:49,510 --> 00:55:51,280
And how about my own number here--
1186
00:55:51,280 --> 00:55:55,000
1-949-468-- oh pattern appearing--
1187
00:55:55,000 --> 00:55:57,760
2750 will be mine.
1188
00:55:57,760 --> 00:55:58,600
Why mine?
1189
00:55:58,600 --> 00:56:00,530
Well, I've just kind of lined things up.
1190
00:56:00,530 --> 00:56:03,460
So Carter's number is apparently first in this array,
1191
00:56:03,460 --> 00:56:06,800
and I'm claiming that he'll be first in this array, respectively.
1192
00:56:06,800 --> 00:56:09,610
I, David, will be the first-- the second in the names array
1193
00:56:09,610 --> 00:56:12,267
and second in the numbers array.
1194
00:56:12,267 --> 00:56:15,100
If you want to have a little fun with programming, feel free to text
1195
00:56:15,100 --> 00:56:17,270
or call me some time at that number.
1196
00:56:17,270 --> 00:56:20,950
So now let's actually use this data in some way.
1197
00:56:20,950 --> 00:56:24,368
Let's go ahead and actually search for my own name and number here.
1198
00:56:24,368 --> 00:56:24,910
So let me do.
1199
00:56:24,910 --> 00:56:27,490
For int i gets 0.
1200
00:56:27,490 --> 00:56:32,090
There's two of us this time-- so i less than 2 and then i plus plus as before.
1201
00:56:32,090 --> 00:56:34,480
And now I'm going to practice what I preached earlier,
1202
00:56:34,480 --> 00:56:38,440
and I'm going to use str compare to find my name in this case.
1203
00:56:38,440 --> 00:56:45,100
And I'm going to say if strcmp of names bracket i comma quote unquote David
1204
00:56:45,100 --> 00:56:48,740
and that equals 0, meaning they're the same,
1205
00:56:48,740 --> 00:56:51,610
then just as before, I'm going to go ahead and print something out.
1206
00:56:51,610 --> 00:56:53,320
But this time, I'm going to make the program more useful
1207
00:56:53,320 --> 00:56:55,100
and not just say found or not found.
1208
00:56:55,100 --> 00:56:59,050
Now I'm implementing a phone book, like the contacts app on iOS or Android.
1209
00:56:59,050 --> 00:57:02,380
So I'm going to say something like, quote unquote, found percent
1210
00:57:02,380 --> 00:57:08,830
s backslash n and then actually plug in numbers bracket i
1211
00:57:08,830 --> 00:57:12,370
to correspond to the current name bracket i.
1212
00:57:12,370 --> 00:57:14,225
And then I'll return 0 as before.
1213
00:57:14,225 --> 00:57:16,600
And then down here if we get all the way through the loop
1214
00:57:16,600 --> 00:57:20,120
and David's not there for some reason, I'm going to print as before not found
1215
00:57:20,120 --> 00:57:21,580
and then return 1.
1216
00:57:21,580 --> 00:57:26,590
So let me go ahead and compile this with make phonebook, dot slash phonebook,
1217
00:57:26,590 --> 00:57:29,240
and it seems to have found the number.
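[Editor's sketch] The two-array phone book described above might look roughly like this. It is a sketch, not the lecture's exact file: it assumes plain const char * in place of CS50's string so that it compiles without cs50.h, and lookup is a hypothetical helper name standing in for the loop written inside main.

```c
#include <stddef.h>
#include <string.h>

// Parallel arrays: names[i] is only assumed to line up with numbers[i].
static const char *names[] = {"Carter", "David"};
static const char *numbers[] = {"+1-617-495-1000", "+1-949-468-2750"};

// Linear search over the two entries (lookup is a hypothetical name).
// Returns the matching number, or NULL if the name is not present.
const char *lookup(const char *name)
{
    for (int i = 0; i < 2; i++)
    {
        // strcmp returns 0 when the two strings are the same
        if (strcmp(names[i], name) == 0)
        {
            return numbers[i];
        }
    }
    return NULL;
}
```

Calling lookup("David") from main and passing the result to printf("Found %s\n", ...) would mirror the program as dictated; a NULL result corresponds to the "Not found" branch that returns 1.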
1218
00:57:29,240 --> 00:57:33,130
So this code I'm going to claim is correct.
1219
00:57:33,130 --> 00:57:36,190
It's kind of stupid because I've just made a phone book or a contacts
1220
00:57:36,190 --> 00:57:37,690
app that only supports two people.
1221
00:57:37,690 --> 00:57:39,503
They're only going to be me and Carter.
1222
00:57:39,503 --> 00:57:41,920
This would be like downloading the contacts app on a phone
1223
00:57:41,920 --> 00:57:43,837
and you can only call two people in the world.
1224
00:57:43,837 --> 00:57:45,820
There's no ability to add names or edit things.
1225
00:57:45,820 --> 00:57:48,860
That, of course, could come later using get string or something else.
1226
00:57:48,860 --> 00:57:50,693
But for now for the sake of discussion, I've
1227
00:57:50,693 --> 00:57:53,450
just hardcoded two names and two numbers.
1228
00:57:53,450 --> 00:57:56,290
But for what it does, I claim this is correct.
1229
00:57:56,290 --> 00:57:59,590
It's going to find me and print out my number.
1230
00:57:59,590 --> 00:58:01,480
But is it well-designed?
1231
00:58:01,480 --> 00:58:05,560
Let's start to now consider if we're not just using arrays,
1232
00:58:05,560 --> 00:58:07,852
but are we using them well?
1233
00:58:07,852 --> 00:58:10,810
We started to use them last week, but are we using them well this week?
1234
00:58:10,810 --> 00:58:14,680
And what might I even mean by using an array well or designing
1235
00:58:14,680 --> 00:58:16,640
this program well?
1236
00:58:16,640 --> 00:58:21,940
Any critiques or concerns with why this might not
1237
00:58:21,940 --> 00:58:24,100
be the best road for us to be going down when
1238
00:58:24,100 --> 00:58:28,540
I want to implement something like a phone book with pieces of information?
1239
00:58:28,540 --> 00:58:31,540
It seems all too vulnerable to just mistakes.
1240
00:58:31,540 --> 00:58:35,620
For instance, if I screw up the actual number of names in the names array
1241
00:58:35,620 --> 00:58:40,190
such that it's now more or less than is in the numbers array or vice versa,
1242
00:58:40,190 --> 00:58:43,420
it feels like there's not a tight relationship between those pieces
1243
00:58:43,420 --> 00:58:47,140
of data, and it's just sort of trusting the honor system
1244
00:58:47,140 --> 00:58:53,440
that any time I use names bracket i that it lines up with numbers bracket i.
1245
00:58:53,440 --> 00:58:54,160
And that's fine.
1246
00:58:54,160 --> 00:58:56,080
If you're the one writing the code, you're probably
1247
00:58:56,080 --> 00:58:57,640
not going to really screw this up.
1248
00:58:57,640 --> 00:59:00,265
But if you start collaborating with someone else or the program
1249
00:59:00,265 --> 00:59:03,700
is getting much, much longer, the odds that you or your colleagues
1250
00:59:03,700 --> 00:59:08,110
remember that you're sort of just trusting that names and numbers line up
1251
00:59:08,110 --> 00:59:10,420
like this is going to fail eventually.
1252
00:59:10,420 --> 00:59:13,540
Someone's not going to realize that, and just, the code is going to break.
1253
00:59:13,540 --> 00:59:16,540
And you're going to start outputting the wrong numbers for names, which
1254
00:59:16,540 --> 00:59:20,710
is to say it'd be much nicer if we could somehow couple these two
1255
00:59:20,710 --> 00:59:24,850
pieces of data, names and numbers, a little more tightly together so
1256
00:59:24,850 --> 00:59:28,900
that you're not just trusting that these two independent variables, names
1257
00:59:28,900 --> 00:59:32,630
and numbers, have this kind of relationship with themselves.
1258
00:59:32,630 --> 00:59:35,200
So let's consider how we might solve this.
1259
00:59:35,200 --> 00:59:39,400
A new feature today that we'll introduce is generally known as a data structure.
1260
00:59:39,400 --> 00:59:43,280
In C, we have the ability to invent our own data types,
1261
00:59:43,280 --> 00:59:46,555
if you will-- data types that the authors of C decades
1262
00:59:46,555 --> 00:59:48,430
ago just didn't envision or just didn't think
1263
00:59:48,430 --> 00:59:51,880
were necessary because we can implement them ourselves-- similar to Scratch
1264
00:59:51,880 --> 00:59:54,280
just as you could create custom puzzle pieces,
1265
00:59:54,280 --> 00:59:56,360
or in C, you can create custom functions.
1266
00:59:56,360 --> 01:00:00,760
So in C, you can create your own types of data
1267
01:00:00,760 --> 01:00:04,900
that go beyond the built in ints and floats and even strings.
1268
01:00:04,900 --> 01:00:10,540
You can make, for instance, a person data type or a candidate data
1269
01:00:10,540 --> 01:00:13,090
type in the context of elections or a person data type
1270
01:00:13,090 --> 01:00:15,950
more generically that might have a name and a number.
1271
01:00:15,950 --> 01:00:17,720
So how might we do this?
1272
01:00:17,720 --> 01:00:23,470
Well, let me go here and propose that if we want to define a person,
1273
01:00:23,470 --> 01:00:26,920
wouldn't it be nice if we could have a person data type,
1274
01:00:26,920 --> 01:00:29,470
and then we could have an array called people?
1275
01:00:29,470 --> 01:00:32,770
And maybe that array is our only array with two things
1276
01:00:32,770 --> 01:00:35,200
in it, two persons in it.
1277
01:00:35,200 --> 01:00:38,140
But somehow, those data types, these persons,
1278
01:00:38,140 --> 01:00:41,158
would have both a name and a number associated with them.
1279
01:00:41,158 --> 01:00:42,700
So we don't need two separate arrays.
1280
01:00:42,700 --> 01:00:47,240
We need one array of persons, a brand new data type.
1281
01:00:47,240 --> 01:00:48,890
So how might we do this?
1282
01:00:48,890 --> 01:00:51,070
Well, if we want every person in the world
1283
01:00:51,070 --> 01:00:53,320
or in this program to have a name and a number,
1284
01:00:53,320 --> 01:00:56,380
we literally write out first those two data types.
1285
01:00:56,380 --> 01:00:57,860
Give me a string called name.
1286
01:00:57,860 --> 01:01:00,850
Give me a string called number, semicolon after each.
1287
01:01:00,850 --> 01:01:04,030
And then we wrap that, those two lines of code,
1288
01:01:04,030 --> 01:01:06,730
with this syntax, which at first glance is a little cryptic.
1289
01:01:06,730 --> 01:01:08,410
It's a lot of words all of a sudden.
1290
01:01:08,410 --> 01:01:12,730
But typedef is a new keyword today that defines a new data type.
1291
01:01:12,730 --> 01:01:16,510
This is the C keyword that lets you create your own data
1292
01:01:16,510 --> 01:01:18,100
type for the very first time.
1293
01:01:18,100 --> 01:01:22,840
Struct is another related keyword that tells the compiler that this isn't just
1294
01:01:22,840 --> 01:01:27,310
a simple data type, like an int or a float renamed or something like that.
1295
01:01:27,310 --> 01:01:28,940
It actually is a structure.
1296
01:01:28,940 --> 01:01:32,710
It's got some dimensions to it, like two things in it or three things in it
1297
01:01:32,710 --> 01:01:35,260
or even 50 things inside of it.
1298
01:01:35,260 --> 01:01:39,310
The last word down here is the name that you want to give your data type,
1299
01:01:39,310 --> 01:01:41,980
and it weirdly goes after the curly braces.
1300
01:01:41,980 --> 01:01:45,760
But this is how you invent a data type called person.
1301
01:01:45,760 --> 01:01:48,670
And what this code is implying is that henceforth,
1302
01:01:48,670 --> 01:01:53,980
the compiler clang will know that a person is composed of a name that's
1303
01:01:53,980 --> 01:01:56,770
a string and a number that's a string.
1304
01:01:56,770 --> 01:01:59,770
And you don't have to worry about having multiple arrays now.
1305
01:01:59,770 --> 01:02:03,950
You can just have an array of people moving forward.
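[Editor's sketch] The definition being described on the slide can be written out as below. One assumption: the members are declared as const char * rather than CS50's string, so the snippet stands alone without cs50.h.

```c
// typedef struct { ... } person; creates a new type named person.
// The type's name goes after the closing curly brace.
typedef struct
{
    const char *name;   // e.g. "Carter"
    const char *number; // kept as text so dashes and pluses survive
} person;
```

With this in place, a declaration such as person people[2]; gives a single array in which every element carries its own name and number.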
1306
01:02:03,950 --> 01:02:05,918
So how can we go about using this?
1307
01:02:05,918 --> 01:02:07,960
Well, let me go back to my code from before where
1308
01:02:07,960 --> 01:02:09,340
I was implementing a phone book.
1309
01:02:09,340 --> 01:02:11,715
And why don't we enhance the phone book code a little bit
1310
01:02:11,715 --> 01:02:14,230
by borrowing some of that new syntax?
1311
01:02:14,230 --> 01:02:16,720
Let me go to the top of my program above main
1312
01:02:16,720 --> 01:02:19,660
and define a type that's a structure or a data
1313
01:02:19,660 --> 01:02:24,500
structure that has a name inside of it and that has a number inside of it.
1314
01:02:24,500 --> 01:02:28,150
And the name of this new structure again is going to be called person.
1315
01:02:28,150 --> 01:02:33,550
Inside of my code now, let me go ahead and delete this old stuff temporarily.
1316
01:02:33,550 --> 01:02:37,510
Let me give myself an array called people of size 2.
1317
01:02:37,510 --> 01:02:40,997
And I'm going to use the non-terse way to do this.
1318
01:02:40,997 --> 01:02:42,580
I'm not going to use the curly braces.
1319
01:02:42,580 --> 01:02:47,140
I'm going to more pedantically spell out what I want in this array of size 2
1320
01:02:47,140 --> 01:02:50,860
at location 0, which is the first person in an array
1321
01:02:50,860 --> 01:02:52,690
because you always start counting at 0.
1322
01:02:52,690 --> 01:02:56,500
I'm going to give that person a name of quote unquote Carter.
1323
01:02:56,500 --> 01:02:59,980
And the dot is admittedly one new piece of syntax today too.
1324
01:02:59,980 --> 01:03:02,410
The dot means go inside of that structure
1325
01:03:02,410 --> 01:03:06,190
and access the variable called name and give it this value Carter.
1326
01:03:06,190 --> 01:03:08,350
Similarly, if I'm going to give Carter a number,
1327
01:03:08,350 --> 01:03:13,030
I can go into people bracket 0 dot number and give that the same thing
1328
01:03:13,030 --> 01:03:17,950
as before plus 1-617-495-1000.
1329
01:03:17,950 --> 01:03:20,230
And then I can do the same for myself here--
1330
01:03:20,230 --> 01:03:24,145
people bracket-- where should I go?
1331
01:03:24,145 --> 01:03:26,107
OK, one because again, two elements.
1332
01:03:26,107 --> 01:03:27,440
But we started counting at zero.
1333
01:03:27,440 --> 01:03:29,420
Dot name equals quote unquote David.
1334
01:03:29,420 --> 01:03:34,340
And then lastly, people bracket 1 dot number equals quote unquote plus
1335
01:03:34,340 --> 01:03:40,370
1-949-468-2750.
1336
01:03:40,370 --> 01:03:43,250
So now if I scroll down here to my logic,
1337
01:03:43,250 --> 01:03:46,130
I don't think this part needs to change too much.
1338
01:03:46,130 --> 01:03:50,680
I'm still, for the sake of discussion, going to iterate 2 times from i
1339
01:03:50,680 --> 01:03:53,630
is 0 on up to but not through 2.
1340
01:03:53,630 --> 01:03:56,670
But I think this line of code needs to change.
1341
01:03:56,670 --> 01:04:04,910
How should I now refer to the i-th person's name as I iterate?
1342
01:04:04,910 --> 01:04:08,250
What should I compare quote unquote David to this time?
1343
01:04:08,250 --> 01:04:08,750
Let me see.
1344
01:04:08,750 --> 01:04:10,050
On the end here?
1345
01:04:10,050 --> 01:04:13,078
AUDIENCE: People bracket i dot name.
1346
01:04:13,078 --> 01:04:14,870
DAVID J. MALAN: Yeah, people bracket i dot name.
1347
01:04:14,870 --> 01:04:15,170
Why?
1348
01:04:15,170 --> 01:04:16,837
Because people is the name of the array.
1349
01:04:16,837 --> 01:04:20,480
Bracket i is the i-th person that we're iterating over in the current loop--
1350
01:04:20,480 --> 01:04:23,240
first zero, then one, maybe higher if it had more people.
1351
01:04:23,240 --> 01:04:26,570
Then dot is our new syntax for going inside of a data structure
1352
01:04:26,570 --> 01:04:29,870
and accessing a variable therein which in this case is name.
1353
01:04:29,870 --> 01:04:32,280
And so I can compare David just as before.
1354
01:04:32,280 --> 01:04:36,890
So it's a little more verbose, but now arguably this is a better program
1355
01:04:36,890 --> 01:04:42,470
because now these people are full-fledged data types unto themselves.
1356
01:04:42,470 --> 01:04:44,810
There's no more honor system inside of my loop
1357
01:04:44,810 --> 01:04:47,102
that this is going to line up because in just a moment,
1358
01:04:47,102 --> 01:04:49,867
I'm going to fix this one last remnant of the previous version.
1359
01:04:49,867 --> 01:04:51,950
And if I can call back on you again, what should I
1360
01:04:51,950 --> 01:04:54,830
change numbers bracket i to this time?
1361
01:04:54,830 --> 01:05:00,760
AUDIENCE: [INAUDIBLE] dot number.
1362
01:05:00,760 --> 01:05:02,390
DAVID J. MALAN: Dot number, exactly.
1363
01:05:02,390 --> 01:05:04,940
So gone is the honor system that just assumes
1364
01:05:04,940 --> 01:05:08,060
that bracket i in this array lines up with bracket i in this other array.
1365
01:05:08,060 --> 01:05:08,810
Now why?
1366
01:05:08,810 --> 01:05:10,430
There's only one array.
1367
01:05:10,430 --> 01:05:11,990
It's an array called people.
1368
01:05:11,990 --> 01:05:14,060
The things it stores are persons.
1369
01:05:14,060 --> 01:05:15,800
A person has a name and a number.
1370
01:05:15,800 --> 01:05:18,217
And so even though it's kind of marginal admittedly given
1371
01:05:18,217 --> 01:05:21,050
that this is a short program and given that this kind of made things
1372
01:05:21,050 --> 01:05:23,300
look more complicated at first glance, we're
1373
01:05:23,300 --> 01:05:26,540
now laying the foundation for just a better design because you really
1374
01:05:26,540 --> 01:05:29,180
can't screw up now the association of names
1375
01:05:29,180 --> 01:05:32,780
with numbers because every person's name and number is, so to speak,
1376
01:05:32,780 --> 01:05:36,710
encapsulated inside of the same data type.
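[Editor's sketch] Put together, the struct-based version of the phone book might look like the following. As assumptions: const char * stands in for CS50's string, and fill_people and find_number are hypothetical helper names wrapping the assignments and the loop that the lecture writes directly inside main.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    const char *name;
    const char *number;
} person;

// Fills the array one field at a time with the dot operator.
void fill_people(person people[2])
{
    people[0].name = "Carter";
    people[0].number = "+1-617-495-1000";
    people[1].name = "David";
    people[1].number = "+1-949-468-2750";
}

// Linear search: people[i].name and people[i].number belong to the same
// element, so a name cannot drift out of step with its number.
const char *find_number(const person people[], int count, const char *name)
{
    for (int i = 0; i < count; i++)
    {
        if (strcmp(people[i].name, name) == 0)
        {
            return people[i].number;
        }
    }
    return NULL; // not found
}
```

The search logic is the same linear search as before; only the way each name and number is reached has changed, from two separately indexed arrays to one indexed array plus a dot.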
1377
01:05:36,710 --> 01:05:38,270
And that's a term of art in CS.
1378
01:05:38,270 --> 01:05:41,720
Encapsulation means to encapsulate-- that is, contain--
1379
01:05:41,720 --> 01:05:43,830
related pieces of information.
1380
01:05:43,830 --> 01:05:49,670
And thus, we have a person that encapsulates two other data types, name
1381
01:05:49,670 --> 01:05:50,450
and number.
1382
01:05:50,450 --> 01:05:52,310
And this just sets the foundation for all
1383
01:05:52,310 --> 01:05:55,190
of the cool stuff we've talked about and you use every day.
1384
01:05:55,190 --> 01:05:55,993
What is an image?
1385
01:05:55,993 --> 01:05:58,910
Well, recall that an image is a bunch of pixels or dots on the screen.
1386
01:05:58,910 --> 01:06:02,570
Every one of those dots has RGB values associated
1387
01:06:02,570 --> 01:06:04,430
with it-- red, green, and blue.
1388
01:06:04,430 --> 01:06:07,760
You could imagine now creating a structure in C probably where
1389
01:06:07,760 --> 01:06:11,540
maybe you have three values, three variables-- one called red,
1390
01:06:11,540 --> 01:06:13,400
one called green, one called blue.
1391
01:06:13,400 --> 01:06:15,980
And then you could name the thing not person but pixel.
1392
01:06:15,980 --> 01:06:19,910
And now you could store in C three different colors-- some amount of red,
1393
01:06:19,910 --> 01:06:23,978
some green, some blue-- and collectively treat it as the color of a pixel.
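[Editor's sketch] A pixel type along the lines imagined here could be declared as follows. The member names follow the spoken description; storing each channel in one byte (0 through 255) is an assumption made for this sketch, not something stated in this passage.

```c
#include <stdint.h>

// Three related values grouped into one type.
typedef struct
{
    uint8_t red;   // amount of red, 0-255 (assumed range)
    uint8_t green; // amount of green
    uint8_t blue;  // amount of blue
} pixel;
```

A declaration such as pixel p = {255, 0, 0}; would then describe a fully red dot as a single value rather than three loose variables.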
1394
01:06:23,978 --> 01:06:27,020
And you could imagine doing something similar perhaps for video or music.
1395
01:06:27,020 --> 01:06:30,440
Music, you might have three variables-- one for the musical note,
1396
01:06:30,440 --> 01:06:32,660
the duration, the loudness of it.
1397
01:06:32,660 --> 01:06:36,120
And you can imagine coming up with your own data type for music as well.
1398
01:06:36,120 --> 01:06:37,370
So this is a little low level.
1399
01:06:37,370 --> 01:06:39,920
We're just using like a familiar contacts application.
1400
01:06:39,920 --> 01:06:44,270
But we now have the way in code to express most any type of data
1401
01:06:44,270 --> 01:06:48,260
that we might want to implement or discuss ultimately.
1402
01:06:48,260 --> 01:06:53,510
So any questions now on struct or defining our own types,
1403
01:06:53,510 --> 01:06:58,110
the purposes for which are to use arrays but use them more responsibly
1404
01:06:58,110 --> 01:07:01,280
now in a better design but also to lay the foundation
1405
01:07:01,280 --> 01:07:05,880
for implementing cooler and cooler stuff per our week zero discussion?
1406
01:07:05,880 --> 01:07:06,380
Yeah.
1407
01:07:06,380 --> 01:07:07,713
AUDIENCE: What's the [INAUDIBLE]
1408
01:07:07,713 --> 01:07:10,713
DAVID J. MALAN: What's the difference between this and an object in an object
1409
01:07:10,713 --> 01:07:11,550
oriented language?
1410
01:07:11,550 --> 01:07:14,390
So slight side note, C is not object-oriented.
1411
01:07:14,390 --> 01:07:17,990
Languages like Java and C++ and others which you might have heard
1412
01:07:17,990 --> 01:07:21,200
of, programmed yourself, had friends program in, are object oriented
1413
01:07:21,200 --> 01:07:25,430
languages. In those languages, they have things called classes or objects which
1414
01:07:25,430 --> 01:07:26,450
are interrelated.
1415
01:07:26,450 --> 01:07:30,020
And objects can store not just data, like variables.
1416
01:07:30,020 --> 01:07:34,490
Objects can also store functions, and you can kind of sort of do this in C.
1417
01:07:34,490 --> 01:07:36,380
But it's not sort of conventional.
1418
01:07:36,380 --> 01:07:39,770
In C, you have data structures that store data.
1419
01:07:39,770 --> 01:07:44,780
In languages like Java and C++, you have objects that store data and functions
1420
01:07:44,780 --> 01:07:45,410
together.
1421
01:07:45,410 --> 01:07:47,790
Python is an object-oriented language as well.
1422
01:07:47,790 --> 01:07:51,270
So we'll see this issue in a few weeks, but let me wave my hands at it for now.
1423
01:07:51,270 --> 01:07:51,770
Yeah.
1424
01:07:51,770 --> 01:07:53,755
AUDIENCE: Could you use this [INAUDIBLE]??
1425
01:07:53,755 --> 01:07:54,380
DAVID J. MALAN: Yes.
1426
01:07:54,380 --> 01:07:57,020
Could you use this struct to redefine how an int is defined?
1427
01:07:57,020 --> 01:07:58,250
Short answer, yes.
1428
01:07:58,250 --> 01:08:01,700
We talked a couple of times now about integer overflow.
1429
01:08:01,700 --> 01:08:05,900
And most recently, you might have seen me mention the bug in iOS and Mac OS
1430
01:08:05,900 --> 01:08:08,480
that was literally related to an int overflow.
1431
01:08:08,480 --> 01:08:12,890
That's the result of ints only storing 4 bytes or 32 bits
1432
01:08:12,890 --> 01:08:15,800
or even as long as 64 bits or 8 bytes.
1433
01:08:15,800 --> 01:08:16,880
But it's finite.
1434
01:08:16,880 --> 01:08:19,520
But if you want to implement some financial software
1435
01:08:19,520 --> 01:08:22,100
or some scientific or mathematical software that
1436
01:08:22,100 --> 01:08:25,970
allows you to count way bigger than a typical int or a long,
1437
01:08:25,970 --> 01:08:29,135
you could imagine coming up with your own structure.
1438
01:08:29,135 --> 01:08:31,260
And in fact, in some languages there is a structure
1439
01:08:31,260 --> 01:08:35,370
called big int, which allows you to express even bigger numbers.
1440
01:08:35,370 --> 01:08:35,970
How?
1441
01:08:35,970 --> 01:08:40,410
Well, maybe you store inside of a big int an array of values.
1442
01:08:40,410 --> 01:08:43,492
And you somehow allow yourself to store more and more bits
1443
01:08:43,492 --> 01:08:45,450
based on how high you want to be able to count.
1444
01:08:45,450 --> 01:08:46,470
So in short, yes.
1445
01:08:46,470 --> 01:08:49,840
We now have the ability to do most anything we want in the language
1446
01:08:49,840 --> 01:08:52,200
even if it's not built in for us.
1447
01:08:52,200 --> 01:08:53,100
Other questions.
1448
01:08:53,100 --> 01:08:58,762
AUDIENCE: [INAUDIBLE]
1449
01:08:58,762 --> 01:09:01,470
DAVID J. MALAN: Could you define a name and a number in the same line?
1450
01:09:01,470 --> 01:09:02,069
Sort of.
1451
01:09:02,069 --> 01:09:03,986
It starts to get syntactically a little messy,
1452
01:09:03,986 --> 01:09:07,229
so I did it a little more pedantically, line by line.
1453
01:09:07,229 --> 01:09:07,870
Good question.
1454
01:09:07,870 --> 01:09:08,430
Over here.
1455
01:09:08,430 --> 01:09:12,910
AUDIENCE: [INAUDIBLE] function you use for the function
1456
01:09:12,910 --> 01:09:15,340
at the bottom of the [INAUDIBLE].
1457
01:09:15,340 --> 01:09:19,029
Could you do something like that [INAUDIBLE]??
1458
01:09:19,029 --> 01:09:21,399
DAVID J. MALAN: Prototypes-- you have to do in C. You
1459
01:09:21,399 --> 01:09:25,029
have to define anything you're going to use or declare anything you're going
1460
01:09:25,029 --> 01:09:26,779
to use before you actually use it.
1461
01:09:26,779 --> 01:09:30,830
So it is deliberate that I put it at the top of my code in this file.
1462
01:09:30,830 --> 01:09:34,870
Otherwise, the compiler would not know what I mean by person when I first
1463
01:09:34,870 --> 01:09:37,750
use it here on what's line 14.
1464
01:09:37,750 --> 01:09:41,479
So it has to come first, or it has to be put into something like a header file
1465
01:09:41,479 --> 01:09:44,979
so that you include it at the very top of your code.
1466
01:09:44,979 --> 01:09:46,779
Other questions over here.
1467
01:09:46,779 --> 01:09:47,662
Yeah.
1468
01:09:47,662 --> 01:09:53,282
AUDIENCE: [INAUDIBLE]
1469
01:09:53,282 --> 01:09:54,990
DAVID J. MALAN: Yeah, good question, and we'll
1470
01:09:54,990 --> 01:09:58,500
come back to this later in the term when we talk about SQL, a database language,
1471
01:09:58,500 --> 01:10:00,630
and storing things in actual databases.
1472
01:10:00,630 --> 01:10:04,320
Generally speaking, even though we humans call things phone numbers,
1473
01:10:04,320 --> 01:10:07,800
or in the US, we have social security numbers, those types of numbers
1474
01:10:07,800 --> 01:10:12,060
often have other punctuation in it, like dashes, parentheses, pluses,
1475
01:10:12,060 --> 01:10:13,380
and so forth.
1476
01:10:13,380 --> 01:10:17,340
You could not store any of that syntax or that punctuation inside of an int.
1477
01:10:17,340 --> 01:10:18,880
You could only store numbers.
1478
01:10:18,880 --> 01:10:20,940
So one motivation for using a string is just
1479
01:10:20,940 --> 01:10:24,570
I can store whatever the human wanted me to store, including parentheses
1480
01:10:24,570 --> 01:10:25,690
and so forth.
1481
01:10:25,690 --> 01:10:29,428
Another reason for storing things as strings,
1482
01:10:29,428 --> 01:10:31,470
even if they look like numbers, is in the context
1483
01:10:31,470 --> 01:10:33,088
of zip codes in the United States.
1484
01:10:33,088 --> 01:10:34,380
Again, we'll come back to this.
1485
01:10:34,380 --> 01:10:36,690
But long story short-- years ago, actually--
1486
01:10:36,690 --> 01:10:39,732
I was using Microsoft Outlook for my email client.
1487
01:10:39,732 --> 01:10:41,190
And eventually I switched to Gmail.
1488
01:10:41,190 --> 01:10:42,900
And this is like 10 plus years ago now.
1489
01:10:42,900 --> 01:10:47,430
And Outlook at the time let you export all of your contacts as a CSV file--
1490
01:10:47,430 --> 01:10:48,780
Comma Separated Values.
1491
01:10:48,780 --> 01:10:50,610
More on that in the weeks to come too.
1492
01:10:50,610 --> 01:10:52,402
And that just means I could download a text
1493
01:10:52,402 --> 01:10:55,710
file with all of my friends and family and their numbers inside of it.
1494
01:10:55,710 --> 01:10:59,812
Unfortunately, I opened that same CSV file with Excel, I think, at the time
1495
01:10:59,812 --> 01:11:01,770
just to kind of spot check it and see if what's
1496
01:11:01,770 --> 01:11:03,400
in there was what was expected.
1497
01:11:03,400 --> 01:11:06,870
And I must have instinctively hit, like, Command or Control-S to save it.
1498
01:11:06,870 --> 01:11:09,900
And Excel at least has this habit of sort of reformatting your data.
1499
01:11:09,900 --> 01:11:12,630
If things look like numbers, it treats them as numbers.
1500
01:11:12,630 --> 01:11:14,040
And Apple Numbers does this too.
1501
01:11:14,040 --> 01:11:16,080
Google Spreadsheets does this too nowadays.
1502
01:11:16,080 --> 01:11:23,040
But long story short, I then imported my mildly saved CSV file into Gmail.
1503
01:11:23,040 --> 01:11:26,760
And now 10 plus years later, I'm still occasionally finding friends and family
1504
01:11:26,760 --> 01:11:32,640
members whose zip codes are in Cambridge, Massachusetts 2138,
1505
01:11:32,640 --> 01:11:36,300
which is missing the 0 because we here in Cambridge are 02138.
1506
01:11:36,300 --> 01:11:39,420
And that's because I treated or I let Excel
1507
01:11:39,420 --> 01:11:42,630
treat what looks like a number as an actual number or int,
1508
01:11:42,630 --> 01:11:45,540
and now leading zeros become a problem because mathematically, they
1509
01:11:45,540 --> 01:11:48,900
mean nothing, but in the mail system, they do--
1510
01:11:48,900 --> 01:11:50,070
sending envelopes and such.
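[Editor's sketch] The lost leading zero can be reproduced in a few lines of C. This is an illustration of the general effect, not of what Excel actually does internally; zip_via_int is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdlib.h>

// Round-trips a ZIP code through an int, the way a tool that treats it as
// a number might, and writes the resulting text into out.
void zip_via_int(const char *zip, char *out, size_t size)
{
    int n = atoi(zip);            // "02138" is read as the number 2138
    snprintf(out, size, "%d", n); // printed back without the leading zero
}
```

Keeping the ZIP code as a string from start to finish avoids the conversion entirely, which is the same reasoning given for storing phone numbers as strings.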
1511
01:11:50,070 --> 01:11:51,653
All right, other final questions here.
1512
01:11:51,653 --> 01:11:54,660
AUDIENCE: [INAUDIBLE]
1513
01:11:54,660 --> 01:11:58,500
DAVID J. MALAN: Yeah, so could I have used a 2D or two dimensional
1514
01:11:58,500 --> 01:12:02,730
array to solve the problem earlier of having just one array?
1515
01:12:02,730 --> 01:12:06,750
Yes, but one, I would argue it's less readable, especially
1516
01:12:06,750 --> 01:12:08,640
as I get lots of names and numbers.
1517
01:12:08,640 --> 01:12:11,730
And two, that too is also kind of relying on the honor system.
1518
01:12:11,730 --> 01:12:14,940
It would be all too easy to omit some of the square brackets in the two
1519
01:12:14,940 --> 01:12:15,940
dimensional array.
1520
01:12:15,940 --> 01:12:20,010
So I would argue it too is not as good as introducing a struct.
1521
01:12:20,010 --> 01:12:21,180
More on that down the road.
1522
01:12:21,180 --> 01:12:26,070
Two dimensional arrays just means arrays of arrays, as you might infer.
1523
01:12:26,070 --> 01:12:28,080
All right, so now that we have this ability
1524
01:12:28,080 --> 01:12:32,210
to store different types of data like contacts in a phone book,
1525
01:12:32,210 --> 01:12:33,960
having names and addresses, let's actually
1526
01:12:33,960 --> 01:12:36,780
take a step back and consider how we might now
1527
01:12:36,780 --> 01:12:41,850
solve one of the original problems by actually sorting the information we're
1528
01:12:41,850 --> 01:12:45,930
given in advance and considering, per our discussion earlier, just how
1529
01:12:45,930 --> 01:12:48,900
costly, how time consuming is that because that might tip
1530
01:12:48,900 --> 01:12:53,010
the scales in favor of sorting, then searching, or maybe just
1531
01:12:53,010 --> 01:12:54,960
not sorting and only searching.
1532
01:12:54,960 --> 01:12:58,470
It'll give us a sense of just how expensive, so to speak,
1533
01:12:58,470 --> 01:13:00,345
sorting something actually is.
1534
01:13:00,345 --> 01:13:02,220
Well, what's the formulation of this problem?
1535
01:13:02,220 --> 01:13:03,780
It's the same thing as week zero.
1536
01:13:03,780 --> 01:13:04,950
We've got input to sort.
1537
01:13:04,950 --> 01:13:07,120
We want it to be output as sorted.
1538
01:13:07,120 --> 01:13:10,560
So for instance, if we're taking unsorted input as input,
1539
01:13:10,560 --> 01:13:13,895
we want the sorted output as the result. More concretely,
1540
01:13:13,895 --> 01:13:15,270
if we've got numbers like these--
1541
01:13:15,270 --> 01:13:20,100
6, 3, 8, 5, 2, 7, 4, 1, which are just randomly arranged numbers--
1542
01:13:20,100 --> 01:13:24,510
we want to get back out 1, 2, 3, 4, 5, 6, 7, 8.
1543
01:13:24,510 --> 01:13:26,440
So we just want those things to be sorted.
1544
01:13:26,440 --> 01:13:28,470
So again, inside of the black box here is
1545
01:13:28,470 --> 01:13:33,460
going to be one or more algorithms that actually get this job done.
1546
01:13:33,460 --> 01:13:35,680
So how might we go about doing this?
1547
01:13:35,680 --> 01:13:39,180
Well, just to vary things a bit more, I think we have a chance here
1548
01:13:39,180 --> 01:13:41,580
for a bit more audience participation.
1549
01:13:41,580 --> 01:13:43,790
But this time, we need eight people if we may.
1550
01:13:43,790 --> 01:13:46,290
All of you have to be comfortable appearing on the internet.
1551
01:13:46,290 --> 01:13:49,165
OK, so this is actually quite convenient that you're all quite close.
1552
01:13:49,165 --> 01:13:52,752
How about 1, 2, 3, 4, 5, 6, 7--
1553
01:13:52,752 --> 01:13:57,060
oh, OK, and someone volunteering their friend-- number eight.
1554
01:13:57,060 --> 01:13:58,020
Come on down.
1555
01:13:58,020 --> 01:13:58,875
Come on down.
1556
01:13:58,875 --> 01:14:00,750
And if you could, I'm going to set things up.
1557
01:14:00,750 --> 01:14:03,630
If you all could join Valerie, my colleague over there,
1558
01:14:03,630 --> 01:14:08,910
to give you a prop to use here, we'll go ahead in just a moment
1559
01:14:08,910 --> 01:14:11,835
and try to find some numbers at hand.
1560
01:14:11,835 --> 01:14:15,180
1561
01:14:15,180 --> 01:14:20,820
In just a moment, each of our volunteers is going to be representing an integer.
1562
01:14:20,820 --> 01:14:25,380
And that integer is initially going to be in unsorted order.
1563
01:14:25,380 --> 01:14:29,100
And I claim that using an algorithm, step by step instructions,
1564
01:14:29,100 --> 01:14:33,820
we can probably sort these folks in at least a couple of different ways.
1565
01:14:33,820 --> 01:14:38,430
So they're in wardrobe right now just getting their very own Harvard T-shirts
1566
01:14:38,430 --> 01:14:42,945
with a jersey number on it, which will then represent an element of our array.
1567
01:14:42,945 --> 01:14:46,650
1568
01:14:46,650 --> 01:14:51,270
Give us just a moment to finish getting the attire ready.
1569
01:14:51,270 --> 01:14:56,040
They're being handed a shirt and a number.
1570
01:14:56,040 --> 01:14:58,620
And let me ask the audience for just a moment.
1571
01:14:58,620 --> 01:15:02,760
As we have these numbers up here on the screen, these numbers too are unsorted.
1572
01:15:02,760 --> 01:15:04,353
They're just in random order.
1573
01:15:04,353 --> 01:15:05,520
And let me ask the audience.
1574
01:15:05,520 --> 01:15:10,980
How would you go about sorting these eight numbers on the screen?
1575
01:15:10,980 --> 01:15:12,930
How would you go about sorting these?
1576
01:15:12,930 --> 01:15:14,138
Yeah, what are your thoughts?
1577
01:15:14,138 --> 01:15:20,327
AUDIENCE: [INAUDIBLE] the number at the end, the following number.
1578
01:15:20,327 --> 01:15:20,910
DAVID J. MALAN: OK.
1579
01:15:20,910 --> 01:15:24,547
AUDIENCE: The following number is bigger, then I keep it as it is.
1580
01:15:24,547 --> 01:15:25,130
DAVID J. MALAN: OK.
1581
01:15:25,130 --> 01:15:26,800
AUDIENCE: If not, then [INAUDIBLE].
1582
01:15:26,800 --> 01:15:29,352
DAVID J. MALAN: OK, so just to recap, you would start
1583
01:15:29,352 --> 01:15:30,810
with one of the numbers on the end.
1584
01:15:30,810 --> 01:15:33,210
You would look to the number to the right or to the left of it,
1585
01:15:33,210 --> 01:15:34,530
depending on which end you start at.
1586
01:15:34,530 --> 01:15:37,113
And if it's out of order, you would just start to swap things.
1587
01:15:37,113 --> 01:15:38,350
And that seems reasonable.
1588
01:15:38,350 --> 01:15:40,435
There's a whole bunch of mistakes to fix here
1589
01:15:40,435 --> 01:15:42,060
because things are pretty out of order.
1590
01:15:42,060 --> 01:15:45,060
But probably, if you start to solve small problems at a time,
1591
01:15:45,060 --> 01:15:47,910
you can achieve the end result of getting the whole thing sorted.
1592
01:15:47,910 --> 01:15:50,820
Other instincts, if you were just handed these numbers, how
1593
01:15:50,820 --> 01:15:54,077
you might go about sorting them?
1594
01:15:54,077 --> 01:15:54,660
How might you?
1595
01:15:54,660 --> 01:15:55,856
Yeah, in the back.
1596
01:15:55,856 --> 01:16:00,140
AUDIENCE: [INAUDIBLE]
1597
01:16:00,140 --> 01:16:01,920
DAVID J. MALAN: OK, I like that.
1598
01:16:01,920 --> 01:16:05,840
So to recap there, find the smallest one first and put it at the beginning,
1599
01:16:05,840 --> 01:16:07,070
if I heard you correctly.
1600
01:16:07,070 --> 01:16:10,342
And then presumably, you could do that again and again and again.
1601
01:16:10,342 --> 01:16:13,050
And that would seem to give you a couple of different algorithms.
1602
01:16:13,050 --> 01:16:15,710
And if you all are attired here--
1603
01:16:15,710 --> 01:16:18,800
do you want to come on up if you're ready?
1604
01:16:18,800 --> 01:16:20,660
We had some [? felt ?] volunteers too.
1605
01:16:20,660 --> 01:16:23,030
Come on over.
1606
01:16:23,030 --> 01:16:25,520
So if you all would like to line yourselves up
1607
01:16:25,520 --> 01:16:27,770
facing the audience in exactly this order-- so
1608
01:16:27,770 --> 01:16:30,170
whoever is number zero should be way over here,
1609
01:16:30,170 --> 01:16:33,710
and whoever is number five should be way over there.
1610
01:16:33,710 --> 01:16:36,920
Feel free to distance as much as you'd like and scooch a little bit this way
1611
01:16:36,920 --> 01:16:38,480
if you could.
1612
01:16:38,480 --> 01:16:39,653
OK, all right.
1613
01:16:39,653 --> 01:16:40,820
And make a little more room.
1614
01:16:40,820 --> 01:16:41,870
So seven-- let's see.
1615
01:16:41,870 --> 01:16:44,170
5, 2, 7, 4--
1616
01:16:44,170 --> 01:16:45,090
AUDIENCE: [INAUDIBLE]
1617
01:16:45,090 --> 01:16:46,733
DAVID J. MALAN: 4, hopefully 1.
1618
01:16:46,733 --> 01:16:47,900
Yeah, keep them to the side.
1619
01:16:47,900 --> 01:16:51,210
OK, 1, 6, and there we go--
1620
01:16:51,210 --> 01:16:51,710
3.
1621
01:16:51,710 --> 01:16:52,543
Come on over, three.
1622
01:16:52,543 --> 01:16:53,690
I was looking for you.
1623
01:16:53,690 --> 01:16:57,075
All right, so here, we have an array of eight numbers--
1624
01:16:57,075 --> 01:16:58,200
eight integers if you will.
1625
01:16:58,200 --> 01:17:00,770
And do you want to each say a quick hello to the group?
1626
01:17:00,770 --> 01:17:02,750
AUDIENCE: Hello, I'm Quinn.
1627
01:17:02,750 --> 01:17:04,892
Go [INAUDIBLE].
1628
01:17:04,892 --> 01:17:05,850
AUDIENCE: Hi, everyone.
1629
01:17:05,850 --> 01:17:08,060
I'm [INAUDIBLE].
1630
01:17:08,060 --> 01:17:09,460
AUDIENCE: Hey, I'm Mitchell.
1631
01:17:09,460 --> 01:17:10,460
AUDIENCE: Hi, I'm Brett.
1632
01:17:10,460 --> 01:17:12,675
And also, go [INAUDIBLE].
1633
01:17:12,675 --> 01:17:13,550
AUDIENCE: I'm Hannah.
1634
01:17:13,550 --> 01:17:15,137
Go [INAUDIBLE].
1635
01:17:15,137 --> 01:17:16,220
AUDIENCE: Hi, I'm Matthew.
1636
01:17:16,220 --> 01:17:18,058
Go [INAUDIBLE]
1637
01:17:18,058 --> 01:17:19,100
AUDIENCE: Hi, I'm Miriam.
1638
01:17:19,100 --> 01:17:20,720
Go Winthrop.
1639
01:17:20,720 --> 01:17:22,905
AUDIENCE: Hi, I'm Celeste, and go Strauss.
1640
01:17:22,905 --> 01:17:23,780
DAVID J. MALAN: Wonderful.
1641
01:17:23,780 --> 01:17:26,930
Well, welcome all to the stage, and let's just visualize,
1642
01:17:26,930 --> 01:17:29,430
perhaps organically, how you eight would solve this problem.
1643
01:17:29,430 --> 01:17:32,540
So we currently have the numbers 0 through 7 quite out of order.
1644
01:17:32,540 --> 01:17:36,508
Could you go ahead and just sort yourselves from 0 through 7?
1645
01:17:36,508 --> 01:17:37,341
AUDIENCE: Thank you.
1646
01:17:37,341 --> 01:17:41,400
1647
01:17:41,400 --> 01:17:44,200
DAVID J. MALAN: OK, so what did they just do?
1648
01:17:44,200 --> 01:17:44,700
OK, yes.
1649
01:17:44,700 --> 01:17:46,260
First of all, yes, very well done.
1650
01:17:46,260 --> 01:17:50,030
1651
01:17:50,030 --> 01:17:53,037
How would you describe what they just did?
1652
01:17:53,037 --> 01:17:53,870
Well, let's do this.
1653
01:17:53,870 --> 01:17:56,060
Could you go back into that order on the screen--
1654
01:17:56,060 --> 01:18:00,320
52741630?
1655
01:18:00,320 --> 01:18:03,440
And could you do exactly what you just did again?
1656
01:18:03,440 --> 01:18:04,750
Sort yourselves.
1657
01:18:04,750 --> 01:18:08,040
1658
01:18:08,040 --> 01:18:09,030
All right, what did--
1659
01:18:09,030 --> 01:18:09,630
OK, yes.
1660
01:18:09,630 --> 01:18:10,470
Well done again.
1661
01:18:10,470 --> 01:18:14,190
1662
01:18:14,190 --> 01:18:17,480
All right, so admittedly, there's kind of a lot going on because each of you,
1663
01:18:17,480 --> 01:18:21,150
except number four, are doing something in parallel all at the same time.
1664
01:18:21,150 --> 01:18:23,390
And that's not really how a computer typically works.
1665
01:18:23,390 --> 01:18:26,900
Just like a computer can only look at one memory location, at one locker,
1666
01:18:26,900 --> 01:18:31,490
at a time, so can a computer only move one number at a time-- sort of opening
1667
01:18:31,490 --> 01:18:33,717
a locker, checking what's there, moving it as needed.
1668
01:18:33,717 --> 01:18:36,800
So let's try this more methodically based on the two audience suggestions.
1669
01:18:36,800 --> 01:18:42,350
If you all could randomize yourself again to 52741630,
1670
01:18:42,350 --> 01:18:44,762
let's take the second of those approaches first.
1671
01:18:44,762 --> 01:18:46,220
I'm going to look at these numbers.
1672
01:18:46,220 --> 01:18:48,870
And even though I as the human can obviously see all the numbers
1673
01:18:48,870 --> 01:18:50,930
and I just kind of have the intuition for how to fix this,
1674
01:18:50,930 --> 01:18:53,180
we got to be more methodical because eventually, we've
1675
01:18:53,180 --> 01:18:55,500
got to translate this to pseudo code and then code.
1676
01:18:55,500 --> 01:18:56,390
So let me see.
1677
01:18:56,390 --> 01:18:59,200
I'm going to search for, as you proposed, the smallest number.
1678
01:18:59,200 --> 01:19:00,950
And I'm going to start from left to right.
1679
01:19:00,950 --> 01:19:04,100
I could do it right to left, but left to right just tends to be convention.
1680
01:19:04,100 --> 01:19:07,080
All right, 5 at this moment is the smallest number I've seen.
1681
01:19:07,080 --> 01:19:09,818
So I'm going to remember that in a variable, if you will.
1682
01:19:09,818 --> 01:19:11,360
Now I'm going to take one more step--
1683
01:19:11,360 --> 01:19:11,930
2.
1684
01:19:11,930 --> 01:19:15,260
OK, 2 I'm going to compare to the variable in mind, obviously smaller.
1685
01:19:15,260 --> 01:19:19,160
I'm going to forget about 5 and only now remember 2 as the now smallest
1686
01:19:19,160 --> 01:19:19,820
element.
1687
01:19:19,820 --> 01:19:23,090
7, nope-- I'm going to ignore that because it's not smaller than the 2
1688
01:19:23,090 --> 01:19:23,810
I have in mind.
1689
01:19:23,810 --> 01:19:27,170
4, 1-- OK, I'm going to update the variable in mind
1690
01:19:27,170 --> 01:19:28,430
because that's indeed smaller.
1691
01:19:28,430 --> 01:19:31,130
Now obviously, we the humans know that's getting pretty small.
1692
01:19:31,130 --> 01:19:32,180
Maybe it's the end.
1693
01:19:32,180 --> 01:19:35,630
I have to check all values to see if there's something even smaller
1694
01:19:35,630 --> 01:19:38,435
because 6 is not, 3 is not, but 0 is.
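The left-to-right scan just described, remembering only one candidate at a time, can be sketched in C roughly as follows. The function name `index_of_smallest` and its parameters are illustrative assumptions, not code from the lecture, and the sketch remembers the position of the smallest value rather than the value itself so that it can be swapped later.

```c
#include <stddef.h>

// Hypothetical helper: return the index of the smallest value in
// numbers[start] through numbers[n - 1]. Assumes start < n.
size_t index_of_smallest(const int numbers[], size_t start, size_t n)
{
    // The single "variable in mind": where the smallest value seen so far lives.
    size_t smallest = start;

    // Every remaining element must be checked; a smaller one may be at the very end.
    for (size_t i = start + 1; i < n; i++)
    {
        if (numbers[i] < numbers[smallest])
        {
            smallest = i;
        }
    }
    return smallest;
}
```

Called on the on-stage order 5, 2, 7, 4, 1, 6, 3, 0 with `start` set to 0, the scan has to reach the last position before it finds the 0.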
1695
01:19:38,435 --> 01:19:39,560
And what's your name again?
1696
01:19:39,560 --> 01:19:40,430
AUDIENCE: Celeste.
1697
01:19:40,430 --> 01:19:41,270
DAVID J. MALAN: Celeste.
1698
01:19:41,270 --> 01:19:47,960
Where should Celeste or number 0 go according to this proposed algorithm?
1699
01:19:47,960 --> 01:19:49,590
All right, I'm seeing a lot of this.
1700
01:19:49,590 --> 01:19:52,820
So at the beginning of the array, so before doing this for real,
1701
01:19:52,820 --> 01:19:54,560
let's have you pop out in front.
1702
01:19:54,560 --> 01:19:58,100
And could you all shift and make room for Celeste?
1703
01:19:58,100 --> 01:20:02,030
Is this a good idea to have all of them move or equivalently
1704
01:20:02,030 --> 01:20:04,550
move everything in the array to make room for Celeste
1705
01:20:04,550 --> 01:20:06,920
and number 0 over there?
1706
01:20:06,920 --> 01:20:07,670
No, probably not.
1707
01:20:07,670 --> 01:20:08,878
That felt like a lot of work.
1708
01:20:08,878 --> 01:20:12,200
And even though it happened pretty quickly, that's like seven steps
1709
01:20:12,200 --> 01:20:14,040
to happen just to move her in place.
1710
01:20:14,040 --> 01:20:16,580
So what would be marginally smarter perhaps--
1711
01:20:16,580 --> 01:20:18,960
a little more efficient, perhaps?
1712
01:20:18,960 --> 01:20:19,460
What's that?
1713
01:20:19,460 --> 01:20:19,910
AUDIENCE: Swapping.
1714
01:20:19,910 --> 01:20:20,490
DAVID J. MALAN: Swapping.
1715
01:20:20,490 --> 01:20:21,532
What do you mean by swap?
1716
01:20:21,532 --> 01:20:23,053
AUDIENCE: Replacing swaps.
1717
01:20:23,053 --> 01:20:24,470
DAVID J. MALAN: OK, replace two values.
1718
01:20:24,470 --> 01:20:28,090
So if you want to go back to where you were, one step over, number 5,
1719
01:20:28,090 --> 01:20:29,560
he's not in the right place.
1720
01:20:29,560 --> 01:20:30,790
He's got to move eventually.
1721
01:20:30,790 --> 01:20:31,498
So you know what?
1722
01:20:31,498 --> 01:20:34,300
If that's where Celeste belongs, why don't we just swap 5 and 0?
1723
01:20:34,300 --> 01:20:36,925
So if you want to go ahead and exchange places with each other.
1724
01:20:36,925 --> 01:20:38,230
Notice what's just happened.
1725
01:20:38,230 --> 01:20:41,420
The problem I'm trying to solve has gotten smaller.
1726
01:20:41,420 --> 01:20:44,020
Instead of being size 8, now it's size 7.
1727
01:20:44,020 --> 01:20:47,050
Now granted, I moved 5 to another wrong location.
1728
01:20:47,050 --> 01:20:48,940
But if these numbers started off randomly,
1729
01:20:48,940 --> 01:20:52,790
it doesn't really matter where 5 goes until we get him into the right place.
1730
01:20:52,790 --> 01:20:54,040
So I think we've improved.
1731
01:20:54,040 --> 01:20:57,430
And now if I go back, my loop is sort of coming back around.
1732
01:20:57,430 --> 01:21:01,690
I can ignore Celeste and make this a seven step problem and not eight
1733
01:21:01,690 --> 01:21:03,430
because I know she's in the right place.
1734
01:21:03,430 --> 01:21:04,840
2 seems to be the smallest.
1735
01:21:04,840 --> 01:21:05,650
I'll remember that.
1736
01:21:05,650 --> 01:21:07,300
Not 7, not 4--
1737
01:21:07,300 --> 01:21:09,070
1 seems to be the smallest.
1738
01:21:09,070 --> 01:21:13,600
Now I know as a human this should be my next smallest.
1739
01:21:13,600 --> 01:21:18,100
But why, intuitively, should I keep going, do you think?
1740
01:21:18,100 --> 01:21:20,950
I can't sort of optimize as a human and just say, number 1,
1741
01:21:20,950 --> 01:21:22,760
let's get you into the right place.
1742
01:21:22,760 --> 01:21:24,610
I still want to check the whole array.
1743
01:21:24,610 --> 01:21:25,330
Why?
1744
01:21:25,330 --> 01:21:25,870
Yeah.
1745
01:21:25,870 --> 01:21:28,477
AUDIENCE: Perhaps there's another 1.
1746
01:21:28,477 --> 01:21:30,310
DAVID J. MALAN: Maybe there's another 1, and that
1747
01:21:30,310 --> 01:21:32,030
could be another problem altogether.
1748
01:21:32,030 --> 01:21:32,770
Other thoughts?
1749
01:21:32,770 --> 01:21:33,270
Yeah.
1750
01:21:33,270 --> 01:21:34,458
AUDIENCE: Could be another 0
1751
01:21:34,458 --> 01:21:36,250
DAVID J. MALAN: There could be another 0 indeed,
1752
01:21:36,250 --> 01:21:38,560
but I did go through the list once, right?
1753
01:21:38,560 --> 01:21:39,970
And I kind of know there isn't.
1754
01:21:39,970 --> 01:21:40,570
Your thoughts?
1755
01:21:40,570 --> 01:21:43,300
AUDIENCE: You don't know that every value is represented.
1756
01:21:43,300 --> 01:21:47,387
So maybe there's a [INAUDIBLE] You just don't know what kind of data
1757
01:21:47,387 --> 01:21:48,220
you're working with.
1758
01:21:48,220 --> 01:21:50,600
DAVID J. MALAN: Yeah, I don't necessarily know what is there.
1759
01:21:50,600 --> 01:21:54,940
And honestly, I only stipulated earlier that I'm using one variable in my mind.
1760
01:21:54,940 --> 01:21:58,180
I could use two and remember the two smallest elements I've seen.
1761
01:21:58,180 --> 01:22:00,130
I could use three variables, four.
1762
01:22:00,130 --> 01:22:03,440
But then I'm going to start to use a lot of space in addition to time.
1763
01:22:03,440 --> 01:22:06,880
So if I've stipulated that I only have one variable to solve this problem,
1764
01:22:06,880 --> 01:22:09,190
I don't know anything more about these elements
1765
01:22:09,190 --> 01:22:11,398
because the only thing I'm remembering at this moment
1766
01:22:11,398 --> 01:22:13,490
is number 1 is the smallest element I've seen.
1767
01:22:13,490 --> 01:22:14,290
So I'm going to keep going.
1768
01:22:14,290 --> 01:22:14,710
6?
1769
01:22:14,710 --> 01:22:15,070
Nope.
1770
01:22:15,070 --> 01:22:15,490
3?
1771
01:22:15,490 --> 01:22:15,790
Nope.
1772
01:22:15,790 --> 01:22:16,210
5?
1773
01:22:16,210 --> 01:22:16,710
Nope.
1774
01:22:16,710 --> 01:22:18,670
OK, I know that number 1, and your name was--
1775
01:22:18,670 --> 01:22:19,378
AUDIENCE: Hannah.
1776
01:22:19,378 --> 01:22:21,850
DAVID J. MALAN: --Hannah is the next smallest element.
1777
01:22:21,850 --> 01:22:24,340
I could have everyone move over to make room, but nope.
1778
01:22:24,340 --> 01:22:25,060
2?
1779
01:22:25,060 --> 01:22:26,980
You know, even though you're so close to where I want you,
1780
01:22:26,980 --> 01:22:29,170
I'm just going to keep it simple and swap you two.
1781
01:22:29,170 --> 01:22:31,570
So granted, I've made the problem a little worse.
1782
01:22:31,570 --> 01:22:35,200
But on average, I could get lucky too and just pop number 2
1783
01:22:35,200 --> 01:22:36,280
into the right place.
1784
01:22:36,280 --> 01:22:38,210
Now let me just accelerate this.
1785
01:22:38,210 --> 01:22:42,550
I can now ignore Hannah and Celeste, making the problem size 6 instead of 8.
1786
01:22:42,550 --> 01:22:43,870
So it's getting smaller.
1787
01:22:43,870 --> 01:22:45,190
7 is the smallest.
1788
01:22:45,190 --> 01:22:46,450
Nope, now 4 is--
1789
01:22:46,450 --> 01:22:47,920
2 is the smallest.
1790
01:22:47,920 --> 01:22:50,470
Still 2, still 2, still 2.
1791
01:22:50,470 --> 01:22:53,560
So let's go ahead and swap 2 and 7.
1792
01:22:53,560 --> 01:22:56,230
And now I'll just kind of orchestrate it verbally.
1793
01:22:56,230 --> 01:22:57,950
4, you're about to have to do something.
1794
01:22:57,950 --> 01:23:01,750
So we now have 4, 7, 6, 3, 5.
1795
01:23:01,750 --> 01:23:04,480
OK, 3-- could you swap with 4?
1796
01:23:04,480 --> 01:23:07,675
All right, now we have 7, 6, 4, 5.
1797
01:23:07,675 --> 01:23:10,420
OK, 4, could you swap with 7?
1798
01:23:10,420 --> 01:23:13,180
Now we have 6, 7, 5.
1799
01:23:13,180 --> 01:23:15,430
5, could you swap with 6?
1800
01:23:15,430 --> 01:23:16,835
And now we have 7, 6.
1801
01:23:16,835 --> 01:23:18,160
6, would you swap with 7?
1802
01:23:18,160 --> 01:23:19,690
And now perhaps round of applause.
1803
01:23:19,690 --> 01:23:20,980
They've sorted themselves.
1804
01:23:20,980 --> 01:23:25,250
OK, hang on there one minute.
1805
01:23:25,250 --> 01:23:27,130
So we'll do this one other approach.
1806
01:23:27,130 --> 01:23:30,220
And my God, that felt so much slower than the first approach,
1807
01:23:30,220 --> 01:23:33,040
but that's, one, because I was kind of providing a long voiceover.
1808
01:23:33,040 --> 01:23:37,690
But two, we were doing one thing at a time whereas the first time, you guys
1809
01:23:37,690 --> 01:23:41,110
had the luxury of moving like eight different CPUs--
1810
01:23:41,110 --> 01:23:43,690
brains, if you will-- were all operating at the same time.
1811
01:23:43,690 --> 01:23:45,140
And computers like that exist.
1812
01:23:45,140 --> 01:23:47,893
If you have a computer with multiple cores, so to speak,
1813
01:23:47,893 --> 01:23:49,810
that's like having a computer that technically
1814
01:23:49,810 --> 01:23:51,490
can do multiple things at once.
1815
01:23:51,490 --> 01:23:54,470
But software typically, at least as we've written it thus far,
1816
01:23:54,470 --> 01:23:56,195
can only do one thing at a time.
1817
01:23:56,195 --> 01:23:58,070
So in a bit, we'll add up all of these steps.
1818
01:23:58,070 --> 01:23:59,862
But for now, let's take one other approach.
1819
01:23:59,862 --> 01:24:02,140
If you all could reorder yourselves like that--
1820
01:24:02,140 --> 01:24:06,940
52741630-- let's take the other approach that
1821
01:24:06,940 --> 01:24:10,610
was recommended by just fixing small problems and see where this gets us.
1822
01:24:10,610 --> 01:24:12,730
So we're back in the original order.
1823
01:24:12,730 --> 01:24:14,812
5 and 2 are clearly out of order.
1824
01:24:14,812 --> 01:24:15,520
So you know what?
1825
01:24:15,520 --> 01:24:17,350
Let's just bite this problem off now.
1826
01:24:17,350 --> 01:24:19,090
5 and 2, could you swap?
1827
01:24:19,090 --> 01:24:20,530
Now let me take a next step.
1828
01:24:20,530 --> 01:24:22,330
5 and 7, I think you're OK.
1829
01:24:22,330 --> 01:24:25,000
There's a gap, yes, but that might not be a big deal.
1830
01:24:25,000 --> 01:24:26,360
7 and 4-- problem.
1831
01:24:26,360 --> 01:24:28,420
Let's have you swap.
1832
01:24:28,420 --> 01:24:31,150
OK, 7 and 1, let's have you swap.
1833
01:24:31,150 --> 01:24:34,120
7 and 6, let's have you swap.
1834
01:24:34,120 --> 01:24:35,830
7 and 3, you swap.
1835
01:24:35,830 --> 01:24:37,690
7 and 0, you swap.
1836
01:24:37,690 --> 01:24:39,490
Now let me pause for just a moment.
1837
01:24:39,490 --> 01:24:40,790
Still not sorted.
1838
01:24:40,790 --> 01:24:42,470
So I'm clearly not done.
1839
01:24:42,470 --> 01:24:45,340
But have I improved the problem?
1840
01:24:45,340 --> 01:24:48,100
Right, I can't see-- like before, I can't optimize like
1841
01:24:48,100 --> 01:24:50,210
before because 0 is obviously not here.
1842
01:24:50,210 --> 01:24:54,160
Celeste is still way back there, so it's not like I've gone from 8 steps
1843
01:24:54,160 --> 01:24:56,300
to 7 to 6 just yet.
1844
01:24:56,300 --> 01:24:58,047
But have I made any improvements?
1845
01:24:58,047 --> 01:24:58,630
AUDIENCE: Yes.
1846
01:24:58,630 --> 01:24:59,255
DAVID J. MALAN: Yes.
1847
01:24:59,255 --> 01:25:01,480
In what sense is this improved?
1848
01:25:01,480 --> 01:25:06,050
What's a concrete thing you could point to is better?
1849
01:25:06,050 --> 01:25:06,550
Yeah.
1850
01:25:06,550 --> 01:25:08,110
AUDIENCE: Sorted the highest number.
1851
01:25:08,110 --> 01:25:10,690
DAVID J. MALAN: I've sorted the highest number, which is indeed 7.
1852
01:25:10,690 --> 01:25:15,400
And conversely, if you prefer, Celeste is one step closer to the beginning.
1853
01:25:15,400 --> 01:25:19,970
Now worst case, Celeste is going to have to move one step on each iteration.
1854
01:25:19,970 --> 01:25:22,540
So I might need to do this thing like n total times
1855
01:25:22,540 --> 01:25:24,215
to move her all the way over.
1856
01:25:24,215 --> 01:25:25,340
But that might work out OK.
1857
01:25:25,340 --> 01:25:26,510
Let me see.
1858
01:25:26,510 --> 01:25:27,980
2 and 5, you're good.
1859
01:25:27,980 --> 01:25:29,480
5 and 4, swap you.
1860
01:25:29,480 --> 01:25:31,370
5 and 1, let's swap you.
1861
01:25:31,370 --> 01:25:32,570
5 and 6, you're good.
1862
01:25:32,570 --> 01:25:34,550
6 and 3, let's swap you.
1863
01:25:34,550 --> 01:25:36,650
6 and 0, let's swap you.
1864
01:25:36,650 --> 01:25:38,240
6 and 7, you're good.
1865
01:25:38,240 --> 01:25:39,380
And I think now--
1866
01:25:39,380 --> 01:25:41,510
notice that the high values, as you noted,
1867
01:25:41,510 --> 01:25:44,150
are sort of bubbling up, if you will, to the end of the list.
1868
01:25:44,150 --> 01:25:45,530
2 and 4, you're good.
1869
01:25:45,530 --> 01:25:46,820
4 and 1, let's swap.
1870
01:25:46,820 --> 01:25:48,080
4 and 5, good.
1871
01:25:48,080 --> 01:25:49,430
5 and 3, swap.
1872
01:25:49,430 --> 01:25:51,780
5 and 0, swap.
1873
01:25:51,780 --> 01:25:53,370
5, 6, 7, of course, are good.
1874
01:25:53,370 --> 01:25:56,175
So now you can sort of see the problem resolving itself.
1875
01:25:56,175 --> 01:25:57,800
And let's just do this part now faster.
1876
01:25:57,800 --> 01:26:00,140
2 and 1, 2 and 4.
1877
01:26:00,140 --> 01:26:04,160
OK, 4 and 3, 4 and 0.
1878
01:26:04,160 --> 01:26:08,877
All right, now 1 and 2, 2 and 3, 3 and 0, and good.
1879
01:26:08,877 --> 01:26:10,460
So we do have some optimization there.
1880
01:26:10,460 --> 01:26:12,860
We don't need to keep going because those all are sorted.
1881
01:26:12,860 --> 01:26:13,910
1 and 2, you're good.
1882
01:26:13,910 --> 01:26:16,730
2 and 0, all right, done.
1883
01:26:16,730 --> 01:26:20,330
1 and 0-- and big round of applause in closing.
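The second approach acted out above, comparing neighbors and swapping any pair that is out of order, is commonly known as bubble sort. A minimal C sketch follows. The function name, the shrinking `end` bound, and the early exit when a pass makes no swaps are assumptions for illustration rather than the lecture's own code.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical sketch: sort numbers[0..n-1] in ascending order by
// repeatedly swapping adjacent pairs that are out of order.
void bubble_sort(int numbers[], size_t n)
{
    if (n < 2)
    {
        return; // Zero or one element is already sorted.
    }

    // After each pass the largest remaining value has "bubbled up" to
    // position end, so the next pass can stop one position earlier.
    for (size_t end = n - 1; end > 0; end--)
    {
        bool swapped = false;
        for (size_t i = 0; i < end; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swapped = true;
            }
        }
        if (!swapped)
        {
            break; // A pass with no swaps means everything is in order.
        }
    }
}
```

On the order 5, 2, 7, 4, 1, 6, 3, 0, the first pass carries 7 to the far right, and the 0 moves only one position left per pass, as in the demonstration.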
1884
01:26:20,330 --> 01:26:24,890
OK, so thank you all.
1885
01:26:24,890 --> 01:26:27,140
We need the puppets back, but you can keep the shirts.
1886
01:26:27,140 --> 01:26:28,840
Thank you for volunteering here.
1887
01:26:28,840 --> 01:26:31,810
Feel free to make your way to the exits, left or right.
1888
01:26:31,810 --> 01:26:33,760
And let's see if, thanks to our volunteers
1889
01:26:33,760 --> 01:26:40,000
here, we can't now formalize a little bit what we did on both passes here.
1890
01:26:40,000 --> 01:26:44,170
I claim that the first algorithm our volunteers kindly acted out
1891
01:26:44,170 --> 01:26:45,730
is what's called selection sort.
1892
01:26:45,730 --> 01:26:50,980
And as the name implies, we selected the smallest element again and again
1893
01:26:50,980 --> 01:26:53,530
and again, working our way from left to right,
1894
01:26:53,530 --> 01:26:58,100
putting Celeste into the right place, and then continuing with everyone else.
1895
01:26:58,100 --> 01:27:01,060
So selection sort, as it's formally called,
1896
01:27:01,060 --> 01:27:04,000
can be described, for instance, with this pseudo code here--
1897
01:27:04,000 --> 01:27:06,910
for i from 0 to n minus 1.
1898
01:27:06,910 --> 01:27:08,350
And again, why this?
1899
01:27:08,350 --> 01:27:10,420
This is just how we talk about arrays.
1900
01:27:10,420 --> 01:27:14,920
The left end is 0, the right end is n minus 1 where in this case,
1901
01:27:14,920 --> 01:27:16,720
n happened to be eight people.
1902
01:27:16,720 --> 01:27:18,700
So that's 0 through 7.
1903
01:27:18,700 --> 01:27:22,240
So for i from 0 to n minus 1, what did I do?
1904
01:27:22,240 --> 01:27:27,400
I found the smallest number between numbers bracket i and numbers bracket
1905
01:27:27,400 --> 01:27:28,810
n minus 1.
1906
01:27:28,810 --> 01:27:31,030
It's a little cryptic at first glance, but this
1907
01:27:31,030 --> 01:27:34,510
is just a very pseudo code-like way of saying
1908
01:27:34,510 --> 01:27:37,960
find the smallest element among all eight volunteers
1909
01:27:37,960 --> 01:27:43,120
because if i starts at 0 and n minus 1 never changes because there's always
1910
01:27:43,120 --> 01:27:47,230
8 people, so 8 minus 1 is 7, this first
1911
01:27:47,230 --> 01:27:50,320
says find the smallest number between numbers bracket 0
1912
01:27:50,320 --> 01:27:53,350
and numbers bracket 7, if you will.
1913
01:27:53,350 --> 01:27:54,550
Then what do I do?
1914
01:27:54,550 --> 01:27:57,730
Swap the smallest number with numbers bracket i.
1915
01:27:57,730 --> 01:28:01,210
So that's how we got Celeste from over here all the way over there.
1916
01:28:01,210 --> 01:28:03,340
We just swapped those two values.
1917
01:28:03,340 --> 01:28:05,840
What then happens next in this pseudo code?
1918
01:28:05,840 --> 01:28:07,900
i, of course, goes from 0 to 1.
1919
01:28:07,900 --> 01:28:10,060
And that's the technical way of saying now
1920
01:28:10,060 --> 01:28:13,810
find the smallest element among the 7 remaining volunteers,
1921
01:28:13,810 --> 01:28:17,570
ignoring Celeste this time because she was already in the correct location.
1922
01:28:17,570 --> 01:28:19,960
So the problem went from size 8 to size 7.
1923
01:28:19,960 --> 01:28:23,500
And if we repeat, size 6, 5, 4, 3, 2, 1, until boom,
1924
01:28:23,500 --> 01:28:25,820
it's all done at the very end.
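The pseudo code above translates fairly directly into C. The sketch below is one possible rendering, not the course's reference implementation. The name `selection_sort`, the `numbers` and `n` parameters, and the choice to track the index of the smallest value are assumptions made for illustration.

```c
#include <stddef.h>

// Hypothetical sketch: sort numbers[0..n-1] in ascending order.
void selection_sort(int numbers[], size_t n)
{
    // For i from 0 to n - 1 ...
    for (size_t i = 0; i < n; i++)
    {
        // ... find the smallest number between numbers[i] and numbers[n - 1] ...
        size_t smallest = i;
        for (size_t j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // ... and swap the smallest number with numbers[i].
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
}
```

Each trip around the outer loop leaves one more element in its final place, so the unsorted part shrinks from n elements to n - 1, then n - 2, and so on.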
1925
01:28:25,820 --> 01:28:29,200
So this is just one way of expressing in pseudo code what
1926
01:28:29,200 --> 01:28:33,040
we did a little more organically and a formalization of what someone
1927
01:28:33,040 --> 01:28:35,420
volunteered out in the audience.
1928
01:28:35,420 --> 01:28:40,300
So if we consider, then, the efficiency of this algorithm,
1929
01:28:40,300 --> 01:28:42,730
maybe abstracting it away now as a bunch of doors
1930
01:28:42,730 --> 01:28:46,960
where the left most again is always 0, the right most is always n minus 1,
1931
01:28:46,960 --> 01:28:50,350
or equivalently, the second to last is n minus 2, the third to last
1932
01:28:50,350 --> 01:28:54,550
is n minus 3 where n might be 8 or anything else,
1933
01:28:54,550 --> 01:28:59,620
how do we think about or quantify the running time of selection sort?
1934
01:28:59,620 --> 01:29:02,230
Big O of what?
1935
01:29:02,230 --> 01:29:05,230
I mean, that was a lot of steps to be adding up.
1936
01:29:05,230 --> 01:29:09,130
It's probably more than n, right, because I went through the list
1937
01:29:09,130 --> 01:29:10,030
again and again.
1938
01:29:10,030 --> 01:29:14,740
It was like n plus n minus 1 plus n minus 2.
1939
01:29:14,740 --> 01:29:17,080
Any instincts here?
1940
01:29:17,080 --> 01:29:20,620
We got like the whole team in the orchestra now.
1941
01:29:20,620 --> 01:29:25,180
Let me propose we think about it this way with just a bit of formula, say.
1942
01:29:25,180 --> 01:29:29,350
So the first time, I had to look at n different volunteers.
1943
01:29:29,350 --> 01:29:33,130
n was 8 in this case, but generically, I looked at all eight numbers
1944
01:29:33,130 --> 01:29:35,230
in order to decide who was the smallest.
1945
01:29:35,230 --> 01:29:37,270
And sure enough, Celeste was at the very end.
1946
01:29:37,270 --> 01:29:39,103
She happened to be all the way to the right.
1947
01:29:39,103 --> 01:29:43,510
But I only knew that once I looked at all 8 or all n volunteers.
1948
01:29:43,510 --> 01:29:45,880
So that took me n steps first.
1949
01:29:45,880 --> 01:29:49,270
But once Celeste was swapped into the right place, then
1950
01:29:49,270 --> 01:29:53,290
my problem was of size n minus 1, and I had n minus 1 other people
1951
01:29:53,290 --> 01:29:54,320
to look through.
1952
01:29:54,320 --> 01:29:55,870
So that's n minus 1 steps.
1953
01:29:55,870 --> 01:29:59,825
Then after that, it's n minus 2 plus n minus 3 plus n minus 4 plus dot dot
1954
01:29:59,825 --> 01:30:01,270
dot until I had one final step.
1955
01:30:01,270 --> 01:30:04,460
And it's obvious that I only have one human left to consider.
1956
01:30:04,460 --> 01:30:07,180
So we might wave our hands at this with a little ellipsis
1957
01:30:07,180 --> 01:30:10,400
and just say dot dot dot plus 1 for the final step.
1958
01:30:10,400 --> 01:30:11,890
Now what does this actually equal?
1959
01:30:11,890 --> 01:30:13,480
Well, this is where you might think back on, like,
1960
01:30:13,480 --> 01:30:15,400
your high school math or physics textbook that
1961
01:30:15,400 --> 01:30:18,640
has a little cheat sheet at the end that shows these kinds of recurrences.
1962
01:30:18,640 --> 01:30:21,490
That happens to work out mathematically to be
1963
01:30:21,490 --> 01:30:25,120
n times n plus 1 all divided by 2.
1964
01:30:25,120 --> 01:30:28,690
That's just what that recurrence, that series, actually adds up to.
1965
01:30:28,690 --> 01:30:31,300
So if you take on faith that that math is correct, let's
1966
01:30:31,300 --> 01:30:35,530
just now multiply this out mathematically.
1967
01:30:35,530 --> 01:30:41,920
That's n squared plus n divided by 2 or n squared divided by 2 plus n over 2.
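Written out, the count described above looks like this. It follows the lecture's own tally of n looks on the first pass, n - 1 on the second, and so on; the summation index k is just notation introduced here.

```latex
\[
n + (n-1) + (n-2) + \cdots + 1
  \;=\; \sum_{k=1}^{n} k
  \;=\; \frac{n(n+1)}{2}
  \;=\; \frac{n^2}{2} + \frac{n}{2}
\]
% For the n = 8 volunteers: 8 + 7 + ... + 1 = (8 * 9) / 2 = 36 looks in total.
```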
1968
01:30:41,920 --> 01:30:44,890
And here's where we're starting to get annoyingly into the weeds.
1969
01:30:44,890 --> 01:30:50,080
Like, honestly, as n gets really large, like a million doors or integers
1970
01:30:50,080 --> 01:30:54,940
or a billion web pages in Google's search engine, honestly, which of these terms
1971
01:30:54,940 --> 01:30:57,400
is going to matter the most mathematically
1972
01:30:57,400 --> 01:30:59,200
if n is a really big number?
1973
01:30:59,200 --> 01:31:01,840
Is n squared divided by 2 the dominant factor,
1974
01:31:01,840 --> 01:31:04,260
or is n divided by 2 the dominant factor?
1975
01:31:04,260 --> 01:31:05,995
AUDIENCE: n squared.
1976
01:31:05,995 --> 01:31:07,120
DAVID J. MALAN: Yeah, n squared.
1977
01:31:07,120 --> 01:31:09,290
I mean, no matter what n is-- and the bigger it is,
1978
01:31:09,290 --> 01:31:12,580
the bigger raising it to the power of 2 is going to be.
1979
01:31:12,580 --> 01:31:13,330
So you know what?
1980
01:31:13,330 --> 01:31:16,270
Let's just wave our hands at this because at the end of the day,
1981
01:31:16,270 --> 01:31:19,780
as n gets really large, the dominant factor is indeed that first one.
1982
01:31:19,780 --> 01:31:20,530
And you know what?
1983
01:31:20,530 --> 01:31:24,290
Even the divided by 2, as I claimed earlier with our two phone book examples, where
1984
01:31:24,290 --> 01:31:26,960
the two straight lines if you keep zooming out essentially
1985
01:31:26,960 --> 01:31:31,760
looked the same when n is large enough, let's just call this on the order of n
1986
01:31:31,760 --> 01:31:32,490
squared.
1987
01:31:32,490 --> 01:31:37,130
So that is to say a computer scientist would describe selection sort as taking
1988
01:31:37,130 --> 01:31:39,860
on the order of n squared steps.
1989
01:31:39,860 --> 01:31:41,510
That's an oversimplification.
1990
01:31:41,510 --> 01:31:44,600
If we really added it up, it's actually this many steps-- n
1991
01:31:44,600 --> 01:31:46,610
squared divided by 2 plus n over 2.
1992
01:31:46,610 --> 01:31:50,420
But again, if we want to just be able to generally compare two algorithms'
1993
01:31:50,420 --> 01:31:53,810
performance, I think it's going to suffice if we look at that highest
1994
01:31:53,810 --> 01:31:59,720
order term to get a sense of what the algorithm feels like, if you will,
1995
01:31:59,720 --> 01:32:02,540
or what it even looks like graphically.
1996
01:32:02,540 --> 01:32:06,170
All right, so with that said, we might describe bubble sort
1997
01:32:06,170 --> 01:32:07,790
as being in big O--
1998
01:32:07,790 --> 01:32:11,700
sorry, selection sort as being in big O of n squared.
1999
01:32:11,700 --> 01:32:17,030
But what if we consider now the best case scenario-- an opportunity
2000
01:32:17,030 --> 01:32:19,070
to talk about a lower bound?
2001
01:32:19,070 --> 01:32:23,305
In the best case, how many steps does selection sort take?
2002
01:32:23,305 --> 01:32:24,680
Well, here, we need some context.
2003
01:32:24,680 --> 01:32:27,320
Like, what does it mean to be the best case or the worst case
2004
01:32:27,320 --> 01:32:29,090
when it comes to sorting?
2005
01:32:29,090 --> 01:32:32,600
Like, what could you imagine meaning the best possible scenario when you're
2006
01:32:32,600 --> 01:32:35,651
trying to sort a bunch of numbers?
2007
01:32:35,651 --> 01:32:37,100
I got the whole crew here again.
2008
01:32:37,100 --> 01:32:37,400
Yeah.
2009
01:32:37,400 --> 01:32:39,050
AUDIENCE: They would already be sorted.
2010
01:32:39,050 --> 01:32:40,370
DAVID J. MALAN: All right, they're already sorted, right?
2011
01:32:40,370 --> 01:32:44,030
I can't really imagine a better scenario than I have to sort some numbers,
2012
01:32:44,030 --> 01:32:46,040
but they're already sorted for me.
2013
01:32:46,040 --> 01:32:51,170
But does this algorithm leverage that fact in practice?
2014
01:32:51,170 --> 01:32:54,110
Even if all of our humans had lined up from 0 to 7,
2015
01:32:54,110 --> 01:32:56,930
I'm pretty sure I would have pretty naively started here.
2016
01:32:56,930 --> 01:32:58,670
And yes, Celeste happens to be here.
2017
01:32:58,670 --> 01:33:03,375
But I only know she needs to be here once I've looked at all eight people.
2018
01:33:03,375 --> 01:33:06,000
And then I would have realized, well, that was a waste of time.
2019
01:33:06,000 --> 01:33:07,280
I can leave Celeste be.
2020
01:33:07,280 --> 01:33:09,500
But then what would I have done?
2021
01:33:09,500 --> 01:33:12,860
I would have ignored her position because we've solved one problem.
2022
01:33:12,860 --> 01:33:16,260
I would have done the same thing now for seven people, then six people.
2023
01:33:16,260 --> 01:33:19,320
So every time I walk through, I'm not doing much useful work.
2024
01:33:19,320 --> 01:33:21,860
But I am doing those comparisons because I
2025
01:33:21,860 --> 01:33:25,500
don't know until I do the work that the people were in the right order.
2026
01:33:25,500 --> 01:33:30,440
So this would seem to imply that the omega notation, the best case
2027
01:33:30,440 --> 01:33:33,740
scenario, even, a lower bound on the running time would be what, then?
2028
01:33:33,740 --> 01:33:35,840
AUDIENCE: [INAUDIBLE]
2029
01:33:35,840 --> 01:33:37,470
DAVID J. MALAN: A little louder?
2030
01:33:37,470 --> 01:33:38,410
AUDIENCE: N squared.
2031
01:33:38,410 --> 01:33:40,250
DAVID J. MALAN: It's still going to be n squared,
2032
01:33:40,250 --> 01:33:45,820
in fact, because the code I'm giving myself doesn't leverage or benefit
2033
01:33:45,820 --> 01:33:50,260
from any of that scenario because it just mindlessly continues
2034
01:33:50,260 --> 01:33:51,770
to do this again and again.
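A minimal sketch of selection sort in C, assuming an array of ints, may help show why already-sorted input does not save any work. The function name `selection_sort` and the comparison counter are illustrative and not from the lecture. The counter tallies pairwise comparisons only, so it comes to n(n-1)/2 rather than the lecture's n(n+1)/2 step count, but both are on the order of n squared.

```c
// Sorts numbers[0..n-1] in ascending order using selection sort.
// Returns how many comparisons were made (illustrative bookkeeping only).
long selection_sort(int numbers[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++)
    {
        // Find the smallest element in the unsorted part, numbers[i..n-1]
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            comparisons++;
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap it into position i (a no-op if it is already there)
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
    return comparisons;
}
```

Calling this on eight numbers makes 28 comparisons whether or not the numbers start out sorted, because the inner loop never checks whether any work is left to do.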
2035
01:33:51,770 --> 01:33:57,010
So in this case, yes, I would claim that the omega notation for selection sort
2036
01:33:57,010 --> 01:33:58,830
is also n squared.
2037
01:33:58,830 --> 01:34:00,580
So those are the kinds of numbers to beat.
2038
01:34:00,580 --> 01:34:03,760
It seems like the upper bound and lower bound of selection
2039
01:34:03,760 --> 01:34:06,130
sort are indeed n squared.
2040
01:34:06,130 --> 01:34:08,380
And so we can also describe selection sort, therefore,
2041
01:34:08,380 --> 01:34:09,717
as being in theta of n squared.
2042
01:34:09,717 --> 01:34:12,550
That's the first algorithm we've had the chance to describe that in,
2043
01:34:12,550 --> 01:34:14,650
which is to say that it's kind of slow.
2044
01:34:14,650 --> 01:34:16,595
I mean, maybe other algorithms are slower,
2045
01:34:16,595 --> 01:34:18,220
but this isn't the best starting point.
2046
01:34:18,220 --> 01:34:19,370
Can we do better?
2047
01:34:19,370 --> 01:34:22,930
Well, there's a reason that I guided us to doing the second algorithm second.
2048
01:34:22,930 --> 01:34:25,430
Even though you verbally proposed them in a different order,
2049
01:34:25,430 --> 01:34:28,600
this second algorithm we did is generally known as bubble sort.
2050
01:34:28,600 --> 01:34:31,210
And I deliberately used that word a bit ago,
2051
01:34:31,210 --> 01:34:34,930
saying the big values are bubbling their way up to the right
2052
01:34:34,930 --> 01:34:37,990
to kind of capture the fact that, indeed, this algorithm works
2053
01:34:37,990 --> 01:34:38,620
differently.
2054
01:34:38,620 --> 01:34:40,870
But let's consider if it's better or worse.
2055
01:34:40,870 --> 01:34:43,960
So here, we have pseudo code for bubble sort.
2056
01:34:43,960 --> 01:34:45,930
You could write this too in different ways.
2057
01:34:45,930 --> 01:34:48,550
But let's consider what we did on the stage.
2058
01:34:48,550 --> 01:34:51,850
We repeated the following n minus 1 times.
2059
01:34:51,850 --> 01:34:55,390
We initialized at least, even though I didn't verbalize it this way,
2060
01:34:55,390 --> 01:35:00,430
a variable like i from 0 to n minus 2.
2061
01:35:00,430 --> 01:35:01,810
And then I asked this question.
2062
01:35:01,810 --> 01:35:08,470
If numbers bracket i and numbers bracket i plus 1 are out of order,
2063
01:35:08,470 --> 01:35:10,250
then swap them.
2064
01:35:10,250 --> 01:35:12,580
So again, I just did it more intuitively by pointing,
2065
01:35:12,580 --> 01:35:14,950
but this would be a way, with a bit of pseudo code,
2066
01:35:14,950 --> 01:35:16,147
to describe what's going on.
2067
01:35:16,147 --> 01:35:18,730
But notice that I'm doing something a little differently here.
2068
01:35:18,730 --> 01:35:22,730
I'm iterating from i equals 0 to n minus 2.
2069
01:35:22,730 --> 01:35:23,230
Why?
2070
01:35:23,230 --> 01:35:26,740
Well, if I'm comparing two things, left hand and right hand,
2071
01:35:26,740 --> 01:35:28,690
I'd still want to start at 0.
2072
01:35:28,690 --> 01:35:31,450
But I don't want to go all the way to n minus 1
2073
01:35:31,450 --> 01:35:35,048
because then, I'd be going past the boundary of my array, which
2074
01:35:35,048 --> 01:35:35,590
would be bad.
2075
01:35:35,590 --> 01:35:38,080
I want to make sure that my left hand-- i, if you will--
2076
01:35:38,080 --> 01:35:42,820
stops at n minus 2 so that when I add 1 to i in my pseudo code,
2077
01:35:42,820 --> 01:35:45,610
I'm looking at the last two elements, not the last element
2078
01:35:45,610 --> 01:35:46,900
and then past the boundary.
2079
01:35:46,900 --> 01:35:48,733
That's actually a common programming mistake
2080
01:35:48,733 --> 01:35:50,620
that we'll undoubtedly soon make by going
2081
01:35:50,620 --> 01:35:52,880
beyond the boundaries of your array.
2082
01:35:52,880 --> 01:35:59,530
So this pseudo code, then, allows me to say compare every one again and again
2083
01:35:59,530 --> 01:36:01,730
and swap them if they're out of order.
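The pseudo code described here could be sketched in C roughly as follows, assuming an array of ints. The function name `bubble_sort` and the comparison counter are illustrative additions, not part of the lecture's slide. Note that the inner loop stops at n minus 2 so that `numbers[i + 1]` never reaches past the end of the array.

```c
// Sorts numbers[0..n-1] in ascending order using the unoptimized bubble sort.
// Returns how many comparisons were made (illustrative bookkeeping only).
long bubble_sort(int numbers[], int n)
{
    long comparisons = 0;

    // Repeat n - 1 times
    for (int pass = 0; pass < n - 1; pass++)
    {
        // i goes from 0 to n - 2 so that i + 1 is at most n - 1
        for (int i = 0; i <= n - 2; i++)
        {
            comparisons++;
            if (numbers[i] > numbers[i + 1])
            {
                // Out of order, so swap the neighbors
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
            }
        }
    }
    return comparisons;
}
```

For eight numbers this version always makes 49 comparisons, which is n minus 1 passes of n minus 1 comparisons each.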
2084
01:36:01,730 --> 01:36:06,190
Why do I repeat the whole thing n minus 1 times?
2085
01:36:06,190 --> 01:36:11,500
Like, why does it not suffice just to do this loop here?
2086
01:36:11,500 --> 01:36:14,980
Think what happened with Celeste.
2087
01:36:14,980 --> 01:36:19,480
Why do I repeat this whole thing n minus 1 times?
2088
01:36:19,480 --> 01:36:20,230
Yeah, in the back?
2089
01:36:20,230 --> 01:36:27,855
AUDIENCE: [INAUDIBLE]
2090
01:36:27,855 --> 01:36:30,230
DAVID J. MALAN: Indeed, and I think if I can recap accurately,
2091
01:36:30,230 --> 01:36:31,640
think back to Celeste again.
2092
01:36:31,640 --> 01:36:34,130
And I'm sorry to keep calling on you as our number 0.
2093
01:36:34,130 --> 01:36:37,670
Each time through bubble sort, she only moved one step.
2094
01:36:37,670 --> 01:36:41,280
And so in total, if there's n locations, at the end of the day,
2095
01:36:41,280 --> 01:36:45,930
she needs to move n minus 1 steps to get 0 all the way to where it needs to be.
2096
01:36:45,930 --> 01:36:50,330
And so this inner loop, if you will, where we're iterating using i,
2097
01:36:50,330 --> 01:36:52,700
that just fixes some of the problems.
2098
01:36:52,700 --> 01:36:56,150
But it doesn't fix all of the problems until we do that same logic again
2099
01:36:56,150 --> 01:36:57,660
and again and again.
2100
01:36:57,660 --> 01:37:01,410
And so how might we quantify the running time of this algorithm?
2101
01:37:01,410 --> 01:37:04,670
Well, one way to see it is to just literally look at the pseudo code.
2102
01:37:04,670 --> 01:37:09,170
The outer loop repeats n minus 1 times by definition.
2103
01:37:09,170 --> 01:37:10,490
It literally says that.
2104
01:37:10,490 --> 01:37:15,960
The inner loop, the for loop, also iterates n minus 1 times.
2105
01:37:15,960 --> 01:37:16,460
Why?
2106
01:37:16,460 --> 01:37:18,540
Because it's going from 0 to n minus 2.
2107
01:37:18,540 --> 01:37:22,580
And if that's hard to think about, that's the same thing as 1 to n minus 1
2108
01:37:22,580 --> 01:37:25,320
if you just add 1 to both ends of the formula.
2109
01:37:25,320 --> 01:37:29,910
So that means you're doing n minus 1 things n minus 1 times.
2110
01:37:29,910 --> 01:37:32,540
So I literally multiply how many times the outer loop
2111
01:37:32,540 --> 01:37:35,960
is running by how many times the inner loop is running, which gives me
2112
01:37:35,960 --> 01:37:39,320
sort of FOIL method n minus 1 squared.
2113
01:37:39,320 --> 01:37:41,300
And I could multiply that whole thing out.
2114
01:37:41,300 --> 01:37:44,330
Well, let's consider this just a little more methodically here.
2115
01:37:44,330 --> 01:37:48,382
If I have n minus 1 on the outer, n minus 1 on the inner--
2116
01:37:48,382 --> 01:37:49,590
let's go ahead and FOIL this.
2117
01:37:49,590 --> 01:37:53,180
So n squared minus n minus n plus 1, combine
2118
01:37:53,180 --> 01:37:56,570
like terms-- n squared minus 2n plus 1.
2119
01:37:56,570 --> 01:38:01,470
And now which of these terms is clearly going to be dominant, so to speak?
2120
01:38:01,470 --> 01:38:01,970
The--
2121
01:38:01,970 --> 01:38:02,450
AUDIENCE: N squared.
2122
01:38:02,450 --> 01:38:03,720
DAVID J. MALAN: --the n squared.
2123
01:38:03,720 --> 01:38:06,260
So yes, even though minus 2n is a good thing
2124
01:38:06,260 --> 01:38:08,660
because it's subtracting off some of the time required,
2125
01:38:08,660 --> 01:38:11,690
and plus 1 is not that big a thing, those are such drops in the bucket when
2126
01:38:11,690 --> 01:38:14,120
n gets really large, like in the millions or billions,
2127
01:38:14,120 --> 01:38:18,510
certainly, that bubble sort too is on the order of n squared.
2128
01:38:18,510 --> 01:38:21,110
It's not the same exactly as selection sort.
2129
01:38:21,110 --> 01:38:23,030
But as n gets big, honestly, we're barely
2130
01:38:23,030 --> 01:38:25,530
going to be able to notice the difference most likely.
2131
01:38:25,530 --> 01:38:29,190
And so it too might be said to be on the order of n squared.
2132
01:38:29,190 --> 01:38:34,530
And if we consider now the lower bound on bubble sort's running time,
2133
01:38:34,530 --> 01:38:38,720
here's where things get potentially interesting.
2134
01:38:38,720 --> 01:38:44,450
What might you claim is the running time of bubble sort in the best case?
2135
01:38:44,450 --> 01:38:48,230
And the best case, I claim, is when the numbers are already sorted.
2136
01:38:48,230 --> 01:38:50,720
Is our pseudo code going to take that into account?
2137
01:38:50,720 --> 01:38:52,442
AUDIENCE: N
2138
01:38:52,442 --> 01:38:53,150
DAVID J. MALAN: OK, n.
2139
01:38:53,150 --> 01:38:54,350
Why do you propose n?
2140
01:38:54,350 --> 01:38:59,043
AUDIENCE: [INAUDIBLE]
2141
01:38:59,043 --> 01:39:00,710
DAVID J. MALAN: Yes, and that's the key word.
2142
01:39:00,710 --> 01:39:05,150
To summarize, in bubble sort, I do have to minimally make one pass because if I
2143
01:39:05,150 --> 01:39:07,760
don't look at all n elements, then I'm theoretically
2144
01:39:07,760 --> 01:39:09,260
just guessing if it's sorted or not.
2145
01:39:09,260 --> 01:39:11,480
Like, I obviously intuitively have to look
2146
01:39:11,480 --> 01:39:14,390
at every element to decide yay or nay, it's in the right order.
2147
01:39:14,390 --> 01:39:17,180
And my original pseudo code, though, is pretty naive.
2148
01:39:17,180 --> 01:39:22,640
It's just going to blindly go back and forth n minus 1 times again and again,
2149
01:39:22,640 --> 01:39:23,810
and that's going to add up.
2150
01:39:23,810 --> 01:39:25,760
But what if I add a bit of an optimization
2151
01:39:25,760 --> 01:39:28,010
that you might have glimpsed on the slide a moment ago
2152
01:39:28,010 --> 01:39:31,610
where if I compare two people and I don't swap them, compare two people,
2153
01:39:31,610 --> 01:39:34,550
don't swap them, and I go all the way through the list comparing
2154
01:39:34,550 --> 01:39:38,120
every pair of adjacent people, and I make no swaps,
2155
01:39:38,120 --> 01:39:40,820
it would be kind of not just naive but stupid
2156
01:39:40,820 --> 01:39:44,360
to do that same process again because if the humans have not moved,
2157
01:39:44,360 --> 01:39:47,150
I'm not going to make any different decisions.
2158
01:39:47,150 --> 01:39:49,740
I'm going to do nothing again, nothing again.
2159
01:39:49,740 --> 01:39:53,150
So at that point, it would be stupid, very inefficient, to go back and forth
2160
01:39:53,150 --> 01:39:54,210
and back and forth.
2161
01:39:54,210 --> 01:39:58,070
So if I modify our pseudo code with just an additional if condition,
2162
01:39:58,070 --> 01:40:00,050
I bet we can speed this up.
2163
01:40:00,050 --> 01:40:06,200
Inside of that same pseudo code, what if I say, hey, if no swaps, quit?
2164
01:40:06,200 --> 01:40:10,010
Like, quit prematurely before the loops are finished running.
2165
01:40:10,010 --> 01:40:13,380
One of the loops has gone through per the indentation here.
2166
01:40:13,380 --> 01:40:16,190
But if I do a loop from left to right and I
2167
01:40:16,190 --> 01:40:18,680
have made no swaps, which you can think of as just being
2168
01:40:18,680 --> 01:40:22,250
one other variable that's plus plusing as I go, keeping track of how many
2169
01:40:22,250 --> 01:40:22,820
swaps--
2170
01:40:22,820 --> 01:40:24,740
if I've made no swaps from left to right,
2171
01:40:24,740 --> 01:40:27,240
I'm not going to make any swaps the next time around either.
2172
01:40:27,240 --> 01:40:30,030
So let's just quit at that point.
2173
01:40:30,030 --> 01:40:32,630
And that is to say in the best case, if you will,
2174
01:40:32,630 --> 01:40:36,410
when the list is already sorted, the omega notation for bubble sort
2175
01:40:36,410 --> 01:40:41,460
might indeed be omega of n if you add that optimization
2176
01:40:41,460 --> 01:40:44,060
so as to short circuit all of that inefficient
2177
01:40:44,060 --> 01:40:49,830
looping to do it only as many times as is necessary.
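The "if no swaps, quit" optimization might look like this in C. This is a sketch under assumptions: the function name `bubble_sort_early_exit` is hypothetical, and it uses a true/false flag rather than the swap counter mentioned in the lecture, which is enough because we only need to know whether any swap happened. It returns the number of passes made so the best case is easy to observe.

```c
#include <stdbool.h>

// Bubble sort that quits as soon as a full pass makes no swaps.
// Returns the number of passes made over the array (illustrative only).
int bubble_sort_early_exit(int numbers[], int n)
{
    int passes = 0;
    for (int pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;
        passes++;

        // Compare every pair of adjacent elements
        for (int i = 0; i <= n - 2; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swapped = true;
            }
        }

        // No swaps on this pass means the array is sorted, so quit early
        if (!swapped)
        {
            break;
        }
    }
    return passes;
}
```

On eight numbers that are already sorted, this makes a single pass of n minus 1 comparisons and stops. On eight numbers in reverse order, it still needs all seven passes.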
2178
01:40:49,830 --> 01:40:52,280
Let me pause to see if there's any questions here.
2179
01:40:52,280 --> 01:40:53,192
Yeah.
2180
01:40:53,192 --> 01:41:02,038
AUDIENCE: [INAUDIBLE] to optimize the running time for all cases possible?
2181
01:41:02,038 --> 01:41:03,080
DAVID J. MALAN: Good question.
2182
01:41:03,080 --> 01:41:08,690
If the running time of selection sort and bubble sort are both in big O
2183
01:41:08,690 --> 01:41:14,480
of n squared but selection sort is in omega of n squared while bubble sort is
2184
01:41:14,480 --> 01:41:17,300
in omega of n, which sounds better--
2185
01:41:17,300 --> 01:41:20,510
I think if I may, should we just always use bubble sort?
2186
01:41:20,510 --> 01:41:24,680
Yes if we think that we might benefit over time
2187
01:41:24,680 --> 01:41:29,250
from a lot of good case scenarios or best case scenarios.
2188
01:41:29,250 --> 01:41:31,340
However, the goal at hand in just a bit is
2189
01:41:31,340 --> 01:41:33,480
going to be to do even better than both of these.
2190
01:41:33,480 --> 01:41:35,330
So hold that question further for a moment.
2191
01:41:35,330 --> 01:41:35,830
Yeah.
2192
01:41:35,830 --> 01:41:41,357
AUDIENCE: [INAUDIBLE] n minus 1?
2193
01:41:41,357 --> 01:41:41,940
DAVID J. MALAN: No.
2194
01:41:41,940 --> 01:41:43,080
So yes, good question.
2195
01:41:43,080 --> 01:41:46,890
So I say omega of n, but is it technically omega of n minus 1?
2196
01:41:46,890 --> 01:41:50,110
Maybe, but again, we're throwing away lower order terms.
2197
01:41:50,110 --> 01:41:53,670
And that's an advantage because we're not comparing things ever so precisely.
2198
01:41:53,670 --> 01:41:57,180
Just like I plotted with the green and yellow and red chart,
2199
01:41:57,180 --> 01:41:59,980
I just want to get a sense of the shape of these algorithms
2200
01:41:59,980 --> 01:42:03,330
so that when n gets really large, which of these choices
2201
01:42:03,330 --> 01:42:05,185
is going to matter the most?
2202
01:42:05,185 --> 01:42:07,560
At the end of the day, it's actually perfectly reasonable
2203
01:42:07,560 --> 01:42:09,460
to use selection sort or bubble sort if you
2204
01:42:09,460 --> 01:42:12,210
don't have that much data because they're going to be pretty fast.
2205
01:42:12,210 --> 01:42:14,520
My God, our computers nowadays are 1 gigahertz,
2206
01:42:14,520 --> 01:42:18,440
2 gigahertz, 1 billion things per second, 2 billion things per second.
2207
01:42:18,440 --> 01:42:20,940
But if we have large data sets, as we will later in the term
2208
01:42:20,940 --> 01:42:23,470
and as you might in the real world, at the Googles of the world,
2209
01:42:23,470 --> 01:42:25,530
then you're going to want to be more thoughtful.
2210
01:42:25,530 --> 01:42:27,240
And that's where we're going today.
2211
01:42:27,240 --> 01:42:30,090
All right, so let's actually see this visualized a little bit.
2212
01:42:30,090 --> 01:42:32,010
In a moment, I'm going to change screens here
2213
01:42:32,010 --> 01:42:37,350
to open up what is a little visualization tool that will give us
2214
01:42:37,350 --> 01:42:40,920
a sense of how these things actually work and look at a faster rate
2215
01:42:40,920 --> 01:42:42,820
than our humans are able to do here on stage.
2216
01:42:42,820 --> 01:42:47,580
So here is another visualization of a bunch of numbers, an array of numbers.
2217
01:42:47,580 --> 01:42:50,670
Short bars mean small numbers, tall bars mean big numbers.
2218
01:42:50,670 --> 01:42:52,920
So instead of having the numbers on their torsos here,
2219
01:42:52,920 --> 01:42:57,690
we just have bars that are small or tall based on the magnitude of the number.
2220
01:42:57,690 --> 01:43:00,630
Let me go ahead, and I preconfigured this in advance
2221
01:43:00,630 --> 01:43:02,050
to operate somewhat quickly.
2222
01:43:02,050 --> 01:43:05,550
Let's go ahead and do selection sort by clicking this button.
2223
01:43:05,550 --> 01:43:08,220
And you'll see some pink bars flying by.
2224
01:43:08,220 --> 01:43:11,640
And that's like me walking left and right, left and right,
2225
01:43:11,640 --> 01:43:14,140
to select the next smallest number.
2226
01:43:14,140 --> 01:43:17,580
And so what you'll see happening on the left of this array of numbers
2227
01:43:17,580 --> 01:43:20,520
is Celeste, if you will, and all of the other smaller numbers
2228
01:43:20,520 --> 01:43:24,030
are appearing on the left while we continue to solve the remaining
2229
01:43:24,030 --> 01:43:25,810
problems to the right.
2230
01:43:25,810 --> 01:43:29,177
So again, we no longer have to touch the smaller numbers here.
2231
01:43:29,177 --> 01:43:32,010
So that's why the problem is getting smaller and smaller and smaller
2232
01:43:32,010 --> 01:43:32,790
over time.
2233
01:43:32,790 --> 01:43:36,420
But you can notice now visually, look at how many times
2234
01:43:36,420 --> 01:43:37,920
we're retracing our steps.
2235
01:43:37,920 --> 01:43:40,710
This is why things that are n squared tend
2236
01:43:40,710 --> 01:43:44,830
to be frowned upon if avoidable because I'm touching the same elements again
2237
01:43:44,830 --> 01:43:45,330
and again.
2238
01:43:45,330 --> 01:43:47,610
When I was walking through, I kept pointing at the same humans
2239
01:43:47,610 --> 01:43:48,360
again and again.
2240
01:43:48,360 --> 01:43:49,660
And that adds up.
2241
01:43:49,660 --> 01:43:52,750
So let's see if bubble sort looks or feels a little different.
2242
01:43:52,750 --> 01:43:56,188
Let me re-randomize the thing, and let me now click Bubble Sort at the top.
2243
01:43:56,188 --> 01:43:58,980
And as you might infer, there's other sorting algorithms out there,
2244
01:43:58,980 --> 01:44:00,272
not all of which we'll look at.
2245
01:44:00,272 --> 01:44:01,740
But here's bubble sort.
2246
01:44:01,740 --> 01:44:04,570
Same pink coloration, but it's doing something different.
2247
01:44:04,570 --> 01:44:07,860
It's two pink bars going through again and again comparing
2248
01:44:07,860 --> 01:44:09,550
the adjacent numbers.
2249
01:44:09,550 --> 01:44:13,050
And you'll see that the largest numbers are indeed bubbling their way up
2250
01:44:13,050 --> 01:44:18,120
to the right, but the smaller numbers, like our number 0 was,
2251
01:44:18,120 --> 01:44:20,287
are only slowly making their way over.
2252
01:44:20,287 --> 01:44:21,120
Here's a comparable.
2253
01:44:21,120 --> 01:44:22,200
Here's the number one.
2254
01:44:22,200 --> 01:44:24,900
And it's going to take a while to get all the way to the left.
2255
01:44:24,900 --> 01:44:28,560
And here too, notice how many times the same bars
2256
01:44:28,560 --> 01:44:32,590
are becoming pink, how many times the algorithm is retracing and retracing
2257
01:44:32,590 --> 01:44:33,090
its steps.
2258
01:44:33,090 --> 01:44:33,600
Why?
2259
01:44:33,600 --> 01:44:37,140
Because it's only solving one problem at a time on each pass.
2260
01:44:37,140 --> 01:44:40,717
And each time we do that, we're stepping through practically the whole array.
2261
01:44:40,717 --> 01:44:43,800
And now granted, I could speed this up even further if I really wanted to,
2262
01:44:43,800 --> 01:44:48,120
but my God, this is only, what, like 50 or 60 elements, something like that?
2263
01:44:48,120 --> 01:44:49,140
This is slow.
2264
01:44:49,140 --> 01:44:52,320
Like, this is what n squared looks like and feels like.
2265
01:44:52,320 --> 01:44:54,570
And now I'm just trying to come up with words to say
2266
01:44:54,570 --> 01:44:56,280
until we get to the finish line here.
2267
01:44:56,280 --> 01:44:59,250
Like, this would be annoying if this is the speed of sorting,
2268
01:44:59,250 --> 01:45:03,133
and this is why I sort of secretly sorted the numbers for Rave in advance
2269
01:45:03,133 --> 01:45:05,550
because it would have taken us an annoying number of steps
2270
01:45:05,550 --> 01:45:07,210
to get that in place for her.
2271
01:45:07,210 --> 01:45:10,170
So those two algorithms are n squared.
2272
01:45:10,170 --> 01:45:12,265
Can we do, in fact, better?
2273
01:45:12,265 --> 01:45:15,390
Well, to save the best algorithm for last, let's take a shorter five minute
2274
01:45:15,390 --> 01:45:15,890
break here.
2275
01:45:15,890 --> 01:45:20,410
And when we come back, we'll do even better than n squared.
2276
01:45:20,410 --> 01:45:22,480
All right.
2277
01:45:22,480 --> 01:45:27,180
So the challenge at hand is to do better than selection sort
2278
01:45:27,180 --> 01:45:30,270
and better than bubble sort and ideally not just marginally
2279
01:45:30,270 --> 01:45:32,460
better but fundamentally better.
2280
01:45:32,460 --> 01:45:35,970
Just like in week zero, that third and final divide and conquer algorithm
2281
01:45:35,970 --> 01:45:38,860
was sort of fundamentally faster than the other two.
2282
01:45:38,860 --> 01:45:41,950
So can we do better than something on the order of n squared?
2283
01:45:41,950 --> 01:45:44,310
Well, I bet we can if we start to approach
2284
01:45:44,310 --> 01:45:46,080
the problem a little differently.
2285
01:45:46,080 --> 01:45:48,202
The sorts we've done thus far are generally known
2286
01:45:48,202 --> 01:45:50,160
as comparison sorts-- and that kind of captures
2287
01:45:50,160 --> 01:45:54,000
the reality that we were doing a huge number of comparisons again and again.
2288
01:45:54,000 --> 01:45:57,360
And you kind of saw that in the vertical bars that were going pink as everything
2289
01:45:57,360 --> 01:45:58,855
was being compared again and again.
2290
01:45:58,855 --> 01:46:01,230
But there's this programming technique, and it's actually
2291
01:46:01,230 --> 01:46:04,140
a mathematical technique known as recursion
2292
01:46:04,140 --> 01:46:05,950
that we've actually seen before.
2293
01:46:05,950 --> 01:46:09,000
And this is a building block or a mental model
2294
01:46:09,000 --> 01:46:12,300
we can bring to bear on the problem to solve the sorting problem
2295
01:46:12,300 --> 01:46:13,740
sort of fundamentally differently.
2296
01:46:13,740 --> 01:46:16,620
But first, let's look at it in a more familiar context.
2297
01:46:16,620 --> 01:46:23,190
A little bit ago, I proposed this pseudo code for the binary search algorithm.
2298
01:46:23,190 --> 01:46:26,130
And notice that what was interesting about this code,
2299
01:46:26,130 --> 01:46:29,970
even though I didn't call it out at the time, it's kind of cyclically defined.
2300
01:46:29,970 --> 01:46:32,730
Like, I claim this is an algorithm for search,
2301
01:46:32,730 --> 01:46:36,750
and yet it seems a little unfair that I'm using the verb search
2302
01:46:36,750 --> 01:46:38,960
inside of the algorithm for search.
2303
01:46:38,960 --> 01:46:41,720
It's like, in English, sort of defining a word by using the word.
2304
01:46:41,720 --> 01:46:43,850
Normally, you shouldn't really get away with that.
2305
01:46:43,850 --> 01:46:46,430
But there's something interesting about this technique
2306
01:46:46,430 --> 01:46:51,110
here because even though this whole thing is a search algorithm
2307
01:46:51,110 --> 01:46:56,510
and I'm using my own algorithm to search the left half or the right half,
2308
01:46:56,510 --> 01:46:58,520
the key feature here that doesn't normally
2309
01:46:58,520 --> 01:47:01,670
happen in English when you define a word in terms of a word
2310
01:47:01,670 --> 01:47:05,300
is that when I search the left half or search the right half, yes,
2311
01:47:05,300 --> 01:47:06,810
I'm doing the same thing.
2312
01:47:06,810 --> 01:47:08,090
I'm using the same algorithm.
2313
01:47:08,090 --> 01:47:11,060
But the problem is, by definition, half as large.
2314
01:47:11,060 --> 01:47:14,180
So this isn't going to be a cyclical argument in the same way.
2315
01:47:14,180 --> 01:47:17,750
This approach, by using search within search,
2316
01:47:17,750 --> 01:47:21,290
is going to whittle the problem down and down and down until hopefully,
2317
01:47:21,290 --> 01:47:24,180
one door or no doors remains.
2318
01:47:24,180 --> 01:47:26,720
And so recursion is a programming technique
2319
01:47:26,720 --> 01:47:29,930
whereby a function calls itself.
2320
01:47:29,930 --> 01:47:34,220
And we haven't seen this yet in C, and we haven't seen this really in Scratch.
2321
01:47:34,220 --> 01:47:38,122
But in C, you can have a function call itself.
2322
01:47:38,122 --> 01:47:39,830
And the form that takes is like literally
2323
01:47:39,830 --> 01:47:44,180
using the function's name inside of the function's implementation itself.
2324
01:47:44,180 --> 01:47:48,310
We've actually seen an opportunity for this once before too.
2325
01:47:48,310 --> 01:47:49,310
Think back to week zero.
2326
01:47:49,310 --> 01:47:51,560
Here's that same pseudo code for searching for someone
2327
01:47:51,560 --> 01:47:53,180
in an actual, physical phone book.
2328
01:47:53,180 --> 01:47:55,700
And notice these yellow lines here.
2329
01:47:55,700 --> 01:47:59,810
We described those in week zero as inducing a loop, a cycle.
2330
01:47:59,810 --> 01:48:04,310
And this is a very procedural approach, if you will, because lines 8 and 11
2331
01:48:04,310 --> 01:48:06,470
are very mechanically, if you will, telling
2332
01:48:06,470 --> 01:48:10,370
me to go back to line three to do this kind of looping thing.
2333
01:48:10,370 --> 01:48:14,750
But really, what that's doing in the binary search algorithm for the phone
2334
01:48:14,750 --> 01:48:20,120
book is it's just telling me to search the left half or search the right half.
2335
01:48:20,120 --> 01:48:23,900
I'm doing it more mechanically again by sort of telling myself
2336
01:48:23,900 --> 01:48:25,490
what line number to go back to.
2337
01:48:25,490 --> 01:48:28,400
But that's equivalent to just telling myself go search the left half,
2338
01:48:28,400 --> 01:48:30,950
search the right half, the key thing being the left
2339
01:48:30,950 --> 01:48:33,720
half and the right half are smaller than the original problem.
2340
01:48:33,720 --> 01:48:37,630
It would be a bug if I just said search the phone book, search the phone book,
2341
01:48:37,630 --> 01:48:39,380
because obviously, you never get anywhere.
2342
01:48:39,380 --> 01:48:41,540
But if you search the half, the half, the half,
2343
01:48:41,540 --> 01:48:43,470
the problem gets smaller and smaller.
2344
01:48:43,470 --> 01:48:50,180
So let's reformulate week zero's phone book code to be not procedural as here
2345
01:48:50,180 --> 01:48:55,140
but recursive whereby in this search algorithm,
2346
01:48:55,140 --> 01:48:58,580
AKA binary search, formerly called divide and conquer, I'm
2347
01:48:58,580 --> 01:49:01,940
going to literally use also the keyword search here.
2348
01:49:01,940 --> 01:49:03,950
Notice among the benefits of doing this is it
2349
01:49:03,950 --> 01:49:06,798
kind of tightens the code up, makes it a little more succinct,
2350
01:49:06,798 --> 01:49:08,840
even though that's kind of a fringe benefit here.
2351
01:49:08,840 --> 01:49:12,350
But it's an elegant way too of describing
2352
01:49:12,350 --> 01:49:17,210
a problem by just having a function use itself
2353
01:49:17,210 --> 01:49:21,330
to solve a smaller puzzle at hand.
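The recursive reformulation described here can be sketched in C. This is only an illustration under assumptions: a sorted array of ints stands in for the phone book, and the names `search`, `low`, `high`, and `target` are hypothetical, not code from the lecture.

```c
#include <stdbool.h>

// Recursive binary search over numbers[low..high] (inclusive bounds).
// The array is assumed to be sorted in ascending order.
bool search(const int numbers[], int low, int high, int target)
{
    // Base case: the range is empty, so there is nothing left to search
    if (low > high)
    {
        return false;
    }

    int middle = low + (high - low) / 2;

    if (numbers[middle] == target)
    {
        return true;
    }
    else if (target < numbers[middle])
    {
        // "Search left half": a strictly smaller problem
        return search(numbers, low, middle - 1, target);
    }
    else
    {
        // "Search right half": also strictly smaller
        return search(numbers, middle + 1, high, target);
    }
}
```

Each call hands a smaller range to itself, which is why the process eventually reaches the empty-range base case rather than repeating the same search forever.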
2354
01:49:21,330 --> 01:49:24,380
So let's now consider a familiar problem, a smaller
2355
01:49:24,380 --> 01:49:27,132
version than the one you've dabbled with-- this sort of pyramid,
2356
01:49:27,132 --> 01:49:28,340
this half pyramid from Mario.
2357
01:49:28,340 --> 01:49:31,070
And let's throw away the parts that aren't that interesting
2358
01:49:31,070 --> 01:49:35,420
and just consider how we might, up until now, implement this in C code,
2359
01:49:35,420 --> 01:49:37,520
this left aligned pyramid, if you will.
2360
01:49:37,520 --> 01:49:44,630
Let me go over here, and let me create a file called-- how about iteration.c?
2361
01:49:44,630 --> 01:49:48,080
And in this file, I'm going to go ahead and include cs50.h.
2362
01:49:48,080 --> 01:49:51,890
And I'm going to include stdio.h.
2363
01:49:51,890 --> 01:49:57,440
And the goal at hand is to implement in C a little program that just prints out
2364
01:49:57,440 --> 01:49:59,280
this and exactly this pyramid.
2365
01:49:59,280 --> 01:50:02,113
So no get_string or any of that-- we're just going to keep it simple
2366
01:50:02,113 --> 01:50:05,400
and print exactly this pyramid of height 4 here.
2367
01:50:05,400 --> 01:50:06,570
So how might I do this?
2368
01:50:06,570 --> 01:50:10,945
Well, let me go ahead, and in main, let me first ask the user for--
2369
01:50:10,945 --> 01:50:12,570
well, we'll go ahead and generalize it.
2370
01:50:12,570 --> 01:50:14,403
Let's go ahead and ask the user for heights.
2371
01:50:14,403 --> 01:50:16,250
We're using get_int as before.
2372
01:50:16,250 --> 01:50:18,500
And I'll store that in a variable called height.
2373
01:50:18,500 --> 01:50:20,750
And then let me go ahead and simply call the function
2374
01:50:20,750 --> 01:50:22,410
draw passing in that height.
2375
01:50:22,410 --> 01:50:25,040
So for the moment, let me assume that someone somewhere
2376
01:50:25,040 --> 01:50:26,630
has implemented a draw function.
2377
01:50:26,630 --> 01:50:29,810
And this, then, is the entirety of my program.
2378
01:50:29,810 --> 01:50:32,970
All right, unfortunately, C does not come with a draw function.
2379
01:50:32,970 --> 01:50:34,820
So let me go ahead and invent one.
2380
01:50:34,820 --> 01:50:36,300
It doesn't need to return a value.
2381
01:50:36,300 --> 01:50:38,850
It just needs to print something-- a so-called side effect.
2382
01:50:38,850 --> 01:50:43,460
So I'm going to define a function called draw that takes as input an int.
2383
01:50:43,460 --> 01:50:46,280
I'll call it n for number, but I could call it anything I want.
2384
01:50:46,280 --> 01:50:47,850
And inside of this,
2385
01:50:47,850 --> 01:50:53,240
I'm going to go ahead and print out a left aligned pyramid like this from top
2386
01:50:53,240 --> 01:50:53,990
to bottom.
2387
01:50:53,990 --> 01:50:57,710
The salient features here are that this is a pyramid, at least in this example,
2388
01:50:57,710 --> 01:50:58,820
of height four.
2389
01:50:58,820 --> 01:51:02,240
And now in height four, the first row has one brick.
2390
01:51:02,240 --> 01:51:03,470
The second row has two.
2391
01:51:03,470 --> 01:51:04,400
The third has three.
2392
01:51:04,400 --> 01:51:05,270
The fourth has four.
2393
01:51:05,270 --> 01:51:08,100
That's a nice pattern that I can probably represent in code.
2394
01:51:08,100 --> 01:51:09,210
So how might I do this?
2395
01:51:09,210 --> 01:51:11,390
Well, how about for int i gets--
2396
01:51:11,390 --> 01:51:12,950
let me do it the old school way--
2397
01:51:12,950 --> 01:51:13,730
1.
2398
01:51:13,730 --> 01:51:18,530
And then i is less than or equal to n.
2399
01:51:18,530 --> 01:51:20,180
And then i plus plus--
2400
01:51:20,180 --> 01:51:23,810
so I'm going from 1 to 4 just to keep myself sane here.
2401
01:51:23,810 --> 01:51:26,690
And then inside of this loop, what do I want to do?
2402
01:51:26,690 --> 01:51:28,560
Well, let me keep it conventional, in fact.
2403
01:51:28,560 --> 01:51:31,970
Let me just change this to be the more conventional 0
2404
01:51:31,970 --> 01:51:36,170
to n even though it might not be as intuitive because now on row 0,
2405
01:51:36,170 --> 01:51:37,550
I want one brick.
2406
01:51:37,550 --> 01:51:39,965
On row 1, I want two bricks, dot dot dot.
2407
01:51:39,965 --> 01:51:41,300
On row 3, I want four.
2408
01:51:41,300 --> 01:51:42,530
So it's kind of offset now.
2409
01:51:42,530 --> 01:51:44,240
But I'm being more conventional.
2410
01:51:44,240 --> 01:51:48,440
So on each row, how many bricks do I want to print?
2411
01:51:48,440 --> 01:51:49,880
Well, I think I want to do this.
2412
01:51:49,880 --> 01:51:55,700
For int j, for instance, common to use j after if you have a nested loop,
2413
01:51:55,700 --> 01:52:02,930
let's start j at 0 and do this so long as j is less than i plus 1
2414
01:52:02,930 --> 01:52:04,460
and then do j plus plus.
2415
01:52:04,460 --> 01:52:06,500
So why i plus 1?
2416
01:52:06,500 --> 01:52:10,670
Well, again, when i equals 0, that's the first row, and I want one brick.
2417
01:52:10,670 --> 01:52:12,650
When i equals 1, that's the second row.
2418
01:52:12,650 --> 01:52:13,430
I want two bricks.
2419
01:52:13,430 --> 01:52:16,250
And dot dot dot, when i is 3, I want four bricks.
2420
01:52:16,250 --> 01:52:19,460
So again, I have to add 1 to i to get the total number of bricks
2421
01:52:19,460 --> 01:52:21,120
that I want to print to the screen.
2422
01:52:21,120 --> 01:52:25,820
So inside of this nested for loop, I'm going to do printf of a hash
2423
01:52:25,820 --> 01:52:28,970
with no new line.
2424
01:52:28,970 --> 01:52:33,112
I'm going to save the new line for about here instead.
2425
01:52:33,112 --> 01:52:34,820
All right, the last thing I'm going to do
2426
01:52:34,820 --> 01:52:38,550
is copy and paste the prototype at the top of the file
2427
01:52:38,550 --> 01:52:39,620
so that I can call this.
2428
01:52:39,620 --> 01:52:42,728
And again, this is of now week one, week two.
2429
01:52:42,728 --> 01:52:45,020
Wouldn't necessarily come to your mind as quickly as it
2430
01:52:45,020 --> 01:52:48,380
might to mine after all this practice, but this is something reminiscent
2431
01:52:48,380 --> 01:52:51,230
of what you yourself did already for Mario-- printing out
2432
01:52:51,230 --> 01:52:54,330
a pyramid that hopefully in a moment is going to look like this.
2433
01:52:54,330 --> 01:52:56,370
So let me go back to my code.
2434
01:52:56,370 --> 01:53:00,560
Let me run make iteration, and let me do dot slash iteration.
2435
01:53:00,560 --> 01:53:02,540
I'll type in 4, and voila.
2436
01:53:02,540 --> 01:53:05,840
Seems to be correct, and let's assume it's going to work for other inputs
2437
01:53:05,840 --> 01:53:06,500
as well.
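For reference, the iterative draw function dictated above might look roughly like this. It is a sketch, not the file from the lecture: the `out` parameter is an assumption added so the output can go to any stream, whereas the lecture's version takes only `n` and prints with printf.

```c
#include <stdio.h>

// Iterative version: print a left-aligned pyramid of height n to out.
// Row i (counting from 0) gets i + 1 bricks.
void draw(FILE *out, int n)
{
    for (int i = 0; i < n; i++)
    {
        // Inner loop prints the bricks for this row, with no newline
        for (int j = 0; j < i + 1; j++)
        {
            fprintf(out, "#");
        }
        // Newline is saved for the end of each row
        fprintf(out, "\n");
    }
}
```

Calling `draw(stdout, 4)` from main prints the four-row pyramid to the terminal.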
2438
01:53:06,500 --> 01:53:07,310
Oh, thank you.
2439
01:53:07,310 --> 01:53:11,770
2440
01:53:11,770 --> 01:53:15,460
So this is indeed an example of iteration-- doing something
2441
01:53:15,460 --> 01:53:16,780
again and again.
2442
01:53:16,780 --> 01:53:18,160
And it's very procedural.
2443
01:53:18,160 --> 01:53:21,160
Like, I literally have a function called draw that does this thing.
2444
01:53:21,160 --> 01:53:25,570
But I can think about implementing draw in a somewhat different way that's
2445
01:53:25,570 --> 01:53:26,710
kind of clever.
2446
01:53:26,710 --> 01:53:28,720
And it's not strictly necessary for this problem
2447
01:53:28,720 --> 01:53:31,360
because this problem honestly is not that complicated
2448
01:53:31,360 --> 01:53:33,387
to solve once you have practice under your belt.
2449
01:53:33,387 --> 01:53:36,220
Certainly the first time around, probably significantly challenging.
2450
01:53:36,220 --> 01:53:39,250
But now that you kind of associate, OK, row one
2451
01:53:39,250 --> 01:53:42,010
with one brick, row two with two bricks, it kind of comes together
2452
01:53:42,010 --> 01:53:43,340
with these for loops.
2453
01:53:43,340 --> 01:53:45,770
But how else could we think about this problem?
2454
01:53:45,770 --> 01:53:48,940
Well, this physical structure, these bricks, in some sense
2455
01:53:48,940 --> 01:53:54,980
is a recursive structure, a structure that's defined in terms of itself.
2456
01:53:54,980 --> 01:53:56,360
Now what do I mean by that?
2457
01:53:56,360 --> 01:54:01,600
Well, if I were to ask you the question, what does a pyramid of height 4
2458
01:54:01,600 --> 01:54:04,720
look like, you would point, of course, to this picture.
2459
01:54:04,720 --> 01:54:11,170
But you could also kind of cleverly say to me, well,
2460
01:54:11,170 --> 01:54:16,240
it's actually a pyramid of height 3 plus 1 additional row.
2461
01:54:16,240 --> 01:54:18,237
And here's that cyclical argument, right?
2462
01:54:18,237 --> 01:54:21,070
Kind of obnoxious to do typically in English or in a spoken language
2463
01:54:21,070 --> 01:54:23,200
because you're defining one thing in terms of itself.
2464
01:54:23,200 --> 01:54:24,408
What's a pyramid of height 4?
2465
01:54:24,408 --> 01:54:28,180
Well, it's a pyramid of height 3 plus 1 more row.
2466
01:54:28,180 --> 01:54:30,940
But we can kind of leverage this logic in code.
2467
01:54:30,940 --> 01:54:32,590
Well, what's a pyramid of height 3?
2468
01:54:32,590 --> 01:54:34,870
Well, it's a pyramid of height 2 plus 1 more row.
2469
01:54:34,870 --> 01:54:36,940
Fine, what's a pyramid of height 2?
2470
01:54:36,940 --> 01:54:39,370
Well, it's a pyramid of height 1 plus 1 more row.
2471
01:54:39,370 --> 01:54:42,370
And then hopefully, this process ends, and it does because notice,
2472
01:54:42,370 --> 01:54:44,990
the pyramid is getting smaller and smaller.
2473
01:54:44,990 --> 01:54:48,610
So you're not going to have this sort of silly back and forth with me
2474
01:54:48,610 --> 01:54:52,550
infinitely many times because when we finally get to the base case,
2475
01:54:52,550 --> 01:54:54,130
the end of the pyramid, fine.
2476
01:54:54,130 --> 01:54:55,600
What is a pyramid of height 1?
2477
01:54:55,600 --> 01:54:58,630
Well, it's a pyramid of no height plus one more row.
2478
01:54:58,630 --> 01:55:01,270
And at that point, things just get negative--
2479
01:55:01,270 --> 01:55:02,050
no pun intended.
2480
01:55:02,050 --> 01:55:04,035
Things just would otherwise go negative.
2481
01:55:04,035 --> 01:55:05,410
And so you can just kind of stop.
2482
01:55:05,410 --> 01:55:07,930
The base case is when there is no more pyramid.
2483
01:55:07,930 --> 01:55:12,080
So there's a way to draw a line in the sand and say, stop, no more arguments.
2484
01:55:12,080 --> 01:55:16,180
But this idea of defining a physical structure in terms of itself
2485
01:55:16,180 --> 01:55:22,180
or code in terms of itself actually lets us do some interesting new algorithms.
2486
01:55:22,180 --> 01:55:24,160
Let me go back to my code here.
2487
01:55:24,160 --> 01:55:29,800
Let me go ahead and create one final file here called recursion.c
2488
01:55:29,800 --> 01:55:36,100
that leverages this idea of this built-in self-referential nature.
2489
01:55:36,100 --> 01:55:38,080
Let me include cs50.h.
2490
01:55:38,080 --> 01:55:41,740
Let me go ahead and include stdio.h, int main void.
2491
01:55:41,740 --> 01:55:46,000
And then inside of main, I'm going to do the exact same thing-- int
2492
01:55:46,000 --> 01:55:50,470
height equals get int, asking the user for height.
2493
01:55:50,470 --> 01:55:53,870
And then I'm going to go ahead and call draw passing in height.
2494
01:55:53,870 --> 01:55:55,520
So that's going to stay the same.
2495
01:55:55,520 --> 01:56:01,580
I even am going to make my prototype the same-- void draw int n semicolon.
2496
01:56:01,580 --> 01:56:03,820
And now I'm going to implement draw down here
2497
01:56:03,820 --> 01:56:05,590
with that same prototype, of course.
2498
01:56:05,590 --> 01:56:08,470
But the code now is going to be a little different.
2499
01:56:08,470 --> 01:56:10,070
What am I going to do here?
2500
01:56:10,070 --> 01:56:15,610
Well, first of all, if you ask me to draw a pyramid of height n,
2501
01:56:15,610 --> 01:56:18,580
I'm going to be kind of a wise ass here and say, well, just
2502
01:56:18,580 --> 01:56:21,010
draw a pyramid of n minus 1--
2503
01:56:21,010 --> 01:56:21,762
done.
2504
01:56:21,762 --> 01:56:24,220
All right, but there's still a little more work to be done.
2505
01:56:24,220 --> 01:56:28,660
What happens after I print or draw a pyramid of height n minus 1
2506
01:56:28,660 --> 01:56:33,340
according to our structural definition a moment ago?
2507
01:56:33,340 --> 01:56:38,470
What remains after drawing a pyramid of height n minus 1 or 3, specifically?
2508
01:56:38,470 --> 01:56:40,840
AUDIENCE: [INAUDIBLE]
2509
01:56:40,840 --> 01:56:42,400
DAVID J. MALAN: We need one more row of hashes.
2510
01:56:42,400 --> 01:56:43,750
OK, so I can do that, right?
2511
01:56:43,750 --> 01:56:45,213
I'm OK with the single loops.
2512
01:56:45,213 --> 01:56:46,630
There's no nesting necessary here.
2513
01:56:46,630 --> 01:56:51,010
I'm just going to do this-- for int i gets 0, i is less than n,
2514
01:56:51,010 --> 01:56:53,185
which is the height that's passed in, i plus plus.
2515
01:56:53,185 --> 01:56:55,060
And then inside of this loop, I'm very simply
2516
01:56:55,060 --> 01:56:56,740
going to print out a single hash.
2517
01:56:56,740 --> 01:57:00,650
And then down here, I'm going to print out a new line at the very end.
2518
01:57:00,650 --> 01:57:01,600
So that's good, right?
2519
01:57:01,600 --> 01:57:03,720
I might not be as comfortable with nested loops.
2520
01:57:03,720 --> 01:57:04,720
This is nice and simple.
2521
01:57:04,720 --> 01:57:07,660
What does this loop do here on lines 17 through 20?
2522
01:57:07,660 --> 01:57:13,540
It literally prints n hashes by counting from i equals 0 on up to
2523
01:57:13,540 --> 01:57:15,130
but not through n.
2524
01:57:15,130 --> 01:57:17,930
So that's sort of week one style syntax.
2525
01:57:17,930 --> 01:57:20,740
But this is kind of trippy now because I've somehow
2526
01:57:20,740 --> 01:57:25,480
boiled down the implementation of draw into printing a row after just
2527
01:57:25,480 --> 01:57:27,250
drawing the thing above it.
2528
01:57:27,250 --> 01:57:31,250
But this is problematic as is because in this case,
2529
01:57:31,250 --> 01:57:37,900
my draw function, notice, is always going to call the draw function forever
2530
01:57:37,900 --> 01:57:38,890
in some sense.
2531
01:57:38,890 --> 01:57:44,200
But ideally, when do I want this cyclical process to stop?
2532
01:57:44,200 --> 01:57:48,010
When do I want to not call draw anymore?
2533
01:57:48,010 --> 01:57:51,297
Yeah, when n is 1, right?
2534
01:57:51,297 --> 01:57:53,380
When I get to the top of the pyramid, when n is 1,
2535
01:57:53,380 --> 01:57:55,720
or heck, when the pyramid's all gone and n equals 0.
2536
01:57:55,720 --> 01:57:58,060
I can pick any line in the sand, so long as it's
2537
01:57:58,060 --> 01:57:59,690
sort of at the end of the process.
2538
01:57:59,690 --> 01:58:01,480
Then I don't want to call draw anymore.
2539
01:58:01,480 --> 01:58:03,850
So maybe what I should do is this.
2540
01:58:03,850 --> 01:58:10,450
If n equals equals 0, there's really nothing to draw.
2541
01:58:10,450 --> 01:58:14,340
So I'm just going to go ahead and return like this.
2542
01:58:14,340 --> 01:58:16,920
Otherwise, I'm going to go ahead and draw
2543
01:58:16,920 --> 01:58:20,238
n minus 1 rows and then one more row.
2544
01:58:20,238 --> 01:58:21,780
And I could express this differently.
2545
01:58:21,780 --> 01:58:24,540
I could do something like this, which would be equivalent.
2546
01:58:24,540 --> 01:58:29,490
I could say something like if n is greater than or equal to 0,
2547
01:58:29,490 --> 01:58:31,462
then go ahead and draw the row.
2548
01:58:31,462 --> 01:58:32,670
But I like it this way first.
2549
01:58:32,670 --> 01:58:34,587
For now, I'm going to go with the original way
2550
01:58:34,587 --> 01:58:37,740
just to ask a simple question and then just bail out of the function
2551
01:58:37,740 --> 01:58:38,970
if n equals 0.
2552
01:58:38,970 --> 01:58:41,740
And heck, just to be super safe, just in case
2553
01:58:41,740 --> 01:58:44,040
the user types in a negative number, let me also
2554
01:58:44,040 --> 01:58:46,980
just check if n is a negative number, also, just return immediately.
2555
01:58:46,980 --> 01:58:48,210
Don't do anything.
2556
01:58:48,210 --> 01:58:51,490
I'm not returning a value because again, the function is void.
2557
01:58:51,490 --> 01:58:53,680
It doesn't need or have a return value.
2558
01:58:53,680 --> 01:58:55,860
So just saying return suffices.
2559
01:58:55,860 --> 01:59:00,660
But if n equals 1 or 2 or 3 or anything higher,
2560
01:59:00,660 --> 01:59:06,120
it is reasonable to draw a pyramid of slightly shorter height like, instead
2561
01:59:06,120 --> 01:59:10,680
of 4, 3, and then go ahead and print one more row.
2562
01:59:10,680 --> 01:59:16,450
So this is an example now of code that calls itself within itself.
2563
01:59:16,450 --> 01:59:18,150
Draw is calling draw.
2564
01:59:18,150 --> 01:59:22,920
But this so-called base case ensures, this conditional ensures,
2565
01:59:22,920 --> 01:59:24,690
that we're not going to do this forever.
2566
01:59:24,690 --> 01:59:27,300
Otherwise, we literally would do this infinitely many times,
2567
01:59:27,300 --> 01:59:30,330
and something bad is probably going to happen.
2568
01:59:30,330 --> 01:59:34,230
All right, let me go ahead and compile this code-- make recursion.
2569
01:59:34,230 --> 01:59:38,250
OK, no syntax errors-- dot slash recursion, Enter, height of 4,
2570
01:59:38,250 --> 01:59:40,590
and voila.
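A minimal sketch of the recursive draw function with its base case follows. As an assumption for illustration, it takes an `out` stream so the output can be captured, while the lecture's function takes only `n` and prints with printf. The two checks described (n equals 0, and n negative) are folded into a single `n <= 0` test.

```c
#include <stdio.h>

// Recursive version: a pyramid of height n is a pyramid of height n - 1
// plus one more row of n bricks.
void draw(FILE *out, int n)
{
    // Base case: nothing to draw for zero (or negative) heights
    if (n <= 0)
    {
        return;
    }

    // Draw the smaller pyramid above first
    draw(out, n - 1);

    // Then print one more row of n bricks
    for (int i = 0; i < n; i++)
    {
        fprintf(out, "#");
    }
    fprintf(out, "\n");
}
```

With the `n <= 0` guard in place, a call such as `draw(stdout, -100)` returns immediately instead of recursing until the program crashes.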
2571
01:59:40,590 --> 01:59:44,400
If only because some of you have run into this issue accidentally already,
2572
01:59:44,400 --> 01:59:48,420
let me get rid of the base case here, and let me recompile the code.
2573
01:59:48,420 --> 01:59:49,980
Make recursion.
2574
01:59:49,980 --> 01:59:52,150
Oh, and actually, now it's actually catching it.
2575
01:59:52,150 --> 01:59:55,050
So the compiler is smart enough here to realize
2576
01:59:55,050 --> 01:59:58,470
that all paths through this function will call itself.
2577
01:59:58,470 --> 02:00:01,300
AKA, it's going to loop forever.
2578
02:00:01,300 --> 02:00:03,090
So let me do the first thing.
2579
02:00:03,090 --> 02:00:05,550
Suppose I only check for n equaling 0.
2580
02:00:05,550 --> 02:00:09,390
Let me go ahead and recompile this code with make recursion.
2581
02:00:09,390 --> 02:00:12,030
And now let me just be kind of uncooperative.
2582
02:00:12,030 --> 02:00:16,200
When I run this program, still works for 4, still works for 0.
2583
02:00:16,200 --> 02:00:19,110
What if I do like negative 100?
2584
02:00:19,110 --> 02:00:22,920
Have any of you experienced a segmentation fault or core dump?
2585
02:00:22,920 --> 02:00:24,240
OK, so no shame in this.
2586
02:00:24,240 --> 02:00:28,980
Like, this means I have somehow touched memory that I shouldn't have.
2587
02:00:28,980 --> 02:00:32,910
And in short, I actually called this function thousands of times
2588
02:00:32,910 --> 02:00:35,550
accidentally, it would seem now, until the program just
2589
02:00:35,550 --> 02:00:38,413
bailed on me because I eventually touched memory in the computer
2590
02:00:38,413 --> 02:00:39,330
that I shouldn't have.
2591
02:00:39,330 --> 02:00:41,140
That'll make even more sense next week.
2592
02:00:41,140 --> 02:00:42,640
But for now, it's simply a bug.
2593
02:00:42,640 --> 02:00:44,430
And I can avoid that bug in this context,
2594
02:00:44,430 --> 02:00:48,900
probably not your own pset context, by just making sure we don't even
2595
02:00:48,900 --> 02:00:51,340
allow for negative numbers at all.
2596
02:00:51,340 --> 02:00:53,700
So with this building block in place, what
2597
02:00:53,700 --> 02:00:57,420
can we now do in terms of those same numbers to sort?
2598
02:00:57,420 --> 02:01:00,390
Well, it turns out there's a sorting algorithm called merge sort.
2599
02:01:00,390 --> 02:01:01,980
And there's bunches of others too.
2600
02:01:01,980 --> 02:01:06,923
But merge sort is a nice one to discuss because it fundamentally, we hope,
2601
02:01:06,923 --> 02:01:09,090
is going to do better than selection sort and bubble
2602
02:01:09,090 --> 02:01:11,490
sort, that is, better than n squared.
2603
02:01:11,490 --> 02:01:14,238
But the catch is it's a little harder to think about.
2604
02:01:14,238 --> 02:01:17,280
In fact, I'll act it out myself with just these numbers on the shelf here
2605
02:01:17,280 --> 02:01:21,360
rather than humans because recursion in general takes a little bit of effort
2606
02:01:21,360 --> 02:01:23,700
to wrap your mind around, typically a bit of practice.
2607
02:01:23,700 --> 02:01:25,908
But I'll see if we can't walk through it methodically
2608
02:01:25,908 --> 02:01:28,260
enough such that this comes to light.
2609
02:01:28,260 --> 02:01:32,460
So here's the pseudo code I propose for this algorithm called merge sort.
2610
02:01:32,460 --> 02:01:35,640
In the spirit of recursion, this sorting algorithm
2611
02:01:35,640 --> 02:01:41,020
literally calls itself by using the verb sort in its pseudo code.
2612
02:01:41,020 --> 02:01:42,600
So how does merge sort work?
2613
02:01:42,600 --> 02:01:46,020
It sort of obnoxiously says, well, if you want to sort all of these things,
2614
02:01:46,020 --> 02:01:48,720
go sort the left half, then go sort the right half,
2615
02:01:48,720 --> 02:01:50,430
and then merge the two together.
2616
02:01:50,430 --> 02:01:51,682
Now obnoxious in what sense?
2617
02:01:51,682 --> 02:01:54,390
Well, if I just asked you to sort something and you just tell me,
2618
02:01:54,390 --> 02:01:56,070
well, go sort that thing and then go sort
2619
02:01:56,070 --> 02:01:58,737
that thing, what was the point of asking you in the first place?
2620
02:01:58,737 --> 02:02:01,080
But the key is that each of these lines is
2621
02:02:01,080 --> 02:02:03,880
sorting a smaller piece of the problem.
2622
02:02:03,880 --> 02:02:06,450
So eventually, we'll be able to pare this down
2623
02:02:06,450 --> 02:02:10,410
into something that doesn't go on forever because in fact, in merge sort,
2624
02:02:10,410 --> 02:02:12,120
there's a base case too.
2625
02:02:12,120 --> 02:02:14,940
There's a scenario where we just check, wait a minute,
2626
02:02:14,940 --> 02:02:17,520
if there's only one number to sort, that's it.
2627
02:02:17,520 --> 02:02:19,470
Quit then because you're all done.
2628
02:02:19,470 --> 02:02:22,590
So there has to be this base case in any use of recursion
2629
02:02:22,590 --> 02:02:27,090
to make sure that you don't mindlessly call yourself forever.
2630
02:02:27,090 --> 02:02:29,380
You've got to stop at some point.
2631
02:02:29,380 --> 02:02:32,560
So let's focus on the third of these steps.
2632
02:02:32,560 --> 02:02:37,328
What does it mean to merge two lists, two halves of a list,
2633
02:02:37,328 --> 02:02:39,120
just because this is apparently going to be
2634
02:02:39,120 --> 02:02:41,290
a key ingredient-- so here, for instance,
2635
02:02:41,290 --> 02:02:43,890
are two halves of a list of size 8.
2636
02:02:43,890 --> 02:02:47,010
We have the numbers 2-- and I'll call it out if you're at a bad angle--
2637
02:02:47,010 --> 02:02:52,200
2, 4, 5, 7 and 0, 1, 3, 6.
2638
02:02:52,200 --> 02:02:56,710
Notice that the left half at the moment, 2, 4, 5, 7, is already sorted,
2639
02:02:56,710 --> 02:03:01,000
and the right half, 0, 1, 3, 6, is also sorted as well.
2640
02:03:01,000 --> 02:03:03,870
So that's a good thing because it means that theoretically, I've
2641
02:03:03,870 --> 02:03:05,280
sorted the left half already.
2642
02:03:05,280 --> 02:03:07,620
I've sorted the right half already before we began.
2643
02:03:07,620 --> 02:03:09,690
I just need to merge these two halves.
2644
02:03:09,690 --> 02:03:11,800
What does it mean to merge two halves?
2645
02:03:11,800 --> 02:03:13,550
Well, for the sake of discussion, I'm just
2646
02:03:13,550 --> 02:03:19,340
going to turn over most of the numbers except for the first numbers in each
2647
02:03:19,340 --> 02:03:20,360
of these halves.
2648
02:03:20,360 --> 02:03:23,390
There's two halves here, left and right.
2649
02:03:23,390 --> 02:03:25,460
At the moment, I'm only going to consider
2650
02:03:25,460 --> 02:03:29,532
the leftmost element of each half-- that is, the one on the left here
2651
02:03:29,532 --> 02:03:30,740
and the one on the left here.
2652
02:03:30,740 --> 02:03:33,800
How do I merge these two lists together?
2653
02:03:33,800 --> 02:03:38,540
Well, if I look at 2 and I look at 0, which one should presumably come first?
2654
02:03:38,540 --> 02:03:39,502
The smaller one.
2655
02:03:39,502 --> 02:03:41,210
So I'm going to grab the 0, and I'm going
2656
02:03:41,210 --> 02:03:44,150
to put it into its own place on this new shelf here.
2657
02:03:44,150 --> 02:03:50,300
And now I'm going to consider, as part of my iteration,
2658
02:03:50,300 --> 02:03:53,270
the beginning of this list and the new beginning of this list.
2659
02:03:53,270 --> 02:03:55,130
So I'm now comparing 2 and 1.
2660
02:03:55,130 --> 02:03:56,210
Which one's smaller?
2661
02:03:56,210 --> 02:03:58,400
I'm going to go ahead and grab the 1.
2662
02:03:58,400 --> 02:04:01,130
Now I'm going to compare the beginning of the left list
2663
02:04:01,130 --> 02:04:03,440
and the new beginning of the right list, 2 and 3.
2664
02:04:03,440 --> 02:04:05,018
Of course, it's 2.
2665
02:04:05,018 --> 02:04:07,310
Now I'm going to compare the beginning of the left list
2666
02:04:07,310 --> 02:04:09,290
and the beginning of the right list, 4 and 3.
2667
02:04:09,290 --> 02:04:11,510
It's of course 3.
2668
02:04:11,510 --> 02:04:14,250
Now I'm going to compare the 4 against the beginning and end,
2669
02:04:14,250 --> 02:04:15,830
it turns out, of the second list--
2670
02:04:15,830 --> 02:04:16,988
4, of course.
2671
02:04:16,988 --> 02:04:19,280
Now I'm going to compare the beginning of the left list
2672
02:04:19,280 --> 02:04:20,822
and the beginning of the right list--
2673
02:04:20,822 --> 02:04:22,430
5, of course.
2674
02:04:22,430 --> 02:04:25,010
I'm realizing this is not going to end well because I left
2675
02:04:25,010 --> 02:04:26,420
too much distance between the numbers.
2676
02:04:26,420 --> 02:04:28,337
But that has nothing to do with the algorithm.
2677
02:04:28,337 --> 02:04:29,840
7 is the beginning of the left list.
2678
02:04:29,840 --> 02:04:31,460
6 is the beginning of the right list.
2679
02:04:31,460 --> 02:04:32,960
It's, of course, 6.
2680
02:04:32,960 --> 02:04:36,410
And at the risk of knocking all of these over,
2681
02:04:36,410 --> 02:04:43,310
if I now make room for this element, we have hopefully
2682
02:04:43,310 --> 02:04:50,400
sorted the whole thing by having merged together the two halves of the list.
2683
02:04:50,400 --> 02:04:52,040
So in short-- thank you.
2684
02:04:52,040 --> 02:04:56,448
2685
02:04:56,448 --> 02:04:58,740
I'm a little worried that's just getting sarcastic now,
2686
02:04:58,740 --> 02:05:03,870
but we now have merged two half lists.
2687
02:05:03,870 --> 02:05:07,430
We haven't done the guts of the algorithm yet-- sort the left half
2688
02:05:07,430 --> 02:05:08,430
and sort the right half.
2689
02:05:08,430 --> 02:05:11,160
But I claim that that is how mechanically you
2690
02:05:11,160 --> 02:05:12,960
merge two sorted halves.
2691
02:05:12,960 --> 02:05:15,060
You keep looking at the beginning of each list,
2692
02:05:15,060 --> 02:05:17,640
and you just kind of weave them together based
2693
02:05:17,640 --> 02:05:21,070
on which one belongs first based on its size.
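The merging step just acted out can be sketched in C. The function name `merge` and its parameters are hypothetical; the `out` array plays the role of the new shelf and is assumed to have room for every element of both halves.

```c
#include <stddef.h>

// Merge two already-sorted arrays into out.
// out must have room for left_len + right_len ints.
void merge(const int left[], size_t left_len,
           const int right[], size_t right_len, int out[])
{
    size_t i = 0; // beginning of what remains of the left list
    size_t j = 0; // beginning of what remains of the right list
    size_t k = 0; // next free spot on the "new shelf"

    // Compare the fronts of both lists and take the smaller one
    while (i < left_len && j < right_len)
    {
        if (left[i] <= right[j])
        {
            out[k++] = left[i++];
        }
        else
        {
            out[k++] = right[j++];
        }
    }

    // One list is exhausted; copy over whatever remains of the other
    while (i < left_len)
    {
        out[k++] = left[i++];
    }
    while (j < right_len)
    {
        out[k++] = right[j++];
    }
}
```

Each element is looked at once as it moves to the new shelf, so merging two halves of total size n takes on the order of n steps.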
2694
02:05:21,070 --> 02:05:23,460
So if you agree that that was a reasonable way
2695
02:05:23,460 --> 02:05:28,020
to merge two lists together, let's go ahead and focus lastly
2696
02:05:28,020 --> 02:05:31,260
on what it means to actually sort the left half
2697
02:05:31,260 --> 02:05:33,437
and sort the right half of a whole bunch of numbers.
2698
02:05:33,437 --> 02:05:35,520
And for this, I'm going to go ahead and order them
2699
02:05:35,520 --> 02:05:37,320
in this seemingly random order.
2700
02:05:37,320 --> 02:05:40,200
And I just have a little cheat sheet above so that I don't mess up.
2701
02:05:40,200 --> 02:05:42,330
And I'm going to start at the very top this time.
2702
02:05:42,330 --> 02:05:45,390
And hopefully, these will not fall down at any point.
2703
02:05:45,390 --> 02:05:51,300
But I'm just deliberately putting them in this random order, 5, 2, 7, 4.
2704
02:05:51,300 --> 02:05:53,703
And then we have 1, 6, 3, 0--
2705
02:05:53,703 --> 02:05:57,000
1, 6, 3, 0.
2706
02:05:57,000 --> 02:05:59,340
Hopefully this won't fall over.
2707
02:05:59,340 --> 02:06:03,990
Here is now an array of size 8 with eight integers.
2708
02:06:03,990 --> 02:06:05,290
And I want to sort this.
2709
02:06:05,290 --> 02:06:08,332
I could use selection sort and just go back and forth and back and forth.
2710
02:06:08,332 --> 02:06:11,100
I could use bubble sort and just compare pairs, pairs, pairs.
2711
02:06:11,100 --> 02:06:13,830
But those are going to be on the order of big O of n squared.
2712
02:06:13,830 --> 02:06:16,000
My hope is to do fundamentally better here.
2713
02:06:16,000 --> 02:06:17,760
So let's see if we can do better.
2714
02:06:17,760 --> 02:06:19,800
All right, so let me look now at my code.
2715
02:06:19,800 --> 02:06:21,490
I'll keep it on the screen.
2716
02:06:21,490 --> 02:06:23,310
How do I implement merge sort?
2717
02:06:23,310 --> 02:06:25,303
Well, if there's only one number, I quit.
2718
02:06:25,303 --> 02:06:26,220
There's obviously not.
2719
02:06:26,220 --> 02:06:28,270
There's eight numbers, so that's not applicable.
2720
02:06:28,270 --> 02:06:30,603
I'm going to go ahead and sort the left half of numbers.
2721
02:06:30,603 --> 02:06:32,460
All right, here's the left half--
2722
02:06:32,460 --> 02:06:34,470
5, 2, 7, 4.
2723
02:06:34,470 --> 02:06:37,230
How do I sort an array of size 4?
2724
02:06:37,230 --> 02:06:40,170
Well, here's where the recursion kicks in.
2725
02:06:40,170 --> 02:06:42,150
How do you sort a list of size 4?
2726
02:06:42,150 --> 02:06:44,310
Well, there's the pseudo code on the board.
2727
02:06:44,310 --> 02:06:47,490
I sort the left half of the list of size 4.
2728
02:06:47,490 --> 02:06:49,200
So here we go.
2729
02:06:49,200 --> 02:06:50,550
I have a list of size 4.
2730
02:06:50,550 --> 02:06:51,390
How do I sort it?
2731
02:06:51,390 --> 02:06:52,680
I sort the left half.
2732
02:06:52,680 --> 02:06:54,480
All right, now I have a list of size 2.
2733
02:06:54,480 --> 02:06:56,680
How do I sort this?
2734
02:06:56,680 --> 02:06:58,600
Well, sort the left half.
2735
02:06:58,600 --> 02:06:59,520
So here we go.
2736
02:06:59,520 --> 02:07:01,020
Here's a list of size 1.
2737
02:07:01,020 --> 02:07:03,996
How do I sort this?
2738
02:07:03,996 --> 02:07:05,640
I think it's done, right?
2739
02:07:05,640 --> 02:07:06,480
That's quit, right?
2740
02:07:06,480 --> 02:07:08,010
If only one number, I'm done.
2741
02:07:08,010 --> 02:07:09,233
The 5 is sorted.
2742
02:07:09,233 --> 02:07:10,650
All right, what was the next step?
2743
02:07:10,650 --> 02:07:12,180
You have to now rewind in time.
2744
02:07:12,180 --> 02:07:16,510
I just sorted the left half of the left half of the left half.
2745
02:07:16,510 --> 02:07:17,970
What do I now sort?
2746
02:07:17,970 --> 02:07:20,850
The right half, which is 2.
2747
02:07:20,850 --> 02:07:22,030
This is one element.
2748
02:07:22,030 --> 02:07:22,740
So I'm done.
2749
02:07:22,740 --> 02:07:27,060
So now at this point in the story, I have sorted, sort of idiotically--
2750
02:07:27,060 --> 02:07:29,850
the 5 is sorted, and the 2 is sorted.
2751
02:07:29,850 --> 02:07:34,410
But what's the third and final step of this phase of the algorithm?
2752
02:07:34,410 --> 02:07:35,650
Merge the two together.
2753
02:07:35,650 --> 02:07:37,600
So here's the left, here's the right list.
2754
02:07:37,600 --> 02:07:38,850
How do I merge these together?
2755
02:07:38,850 --> 02:07:41,530
I compare the lists, and I put the 2 there.
2756
02:07:41,530 --> 02:07:43,590
I only have the 5 left, and I do that.
2757
02:07:43,590 --> 02:07:45,990
So now we see some visible progress.
2758
02:07:45,990 --> 02:07:47,190
But again, let's rewind.
2759
02:07:47,190 --> 02:07:48,220
How did we get here?
2760
02:07:48,220 --> 02:07:52,860
We started to sort the left half of the left half of the left half, then
2761
02:07:52,860 --> 02:07:53,700
the right half.
2762
02:07:53,700 --> 02:07:54,900
And now where are we?
2763
02:07:54,900 --> 02:07:58,090
We've just sorted the left half of the left half.
2764
02:07:58,090 --> 02:08:01,140
So what comes after sorting the left half of anything?
2765
02:08:01,140 --> 02:08:01,692
Right half.
2766
02:08:01,692 --> 02:08:03,900
All right, here's the sort of same nonsensical thing.
2767
02:08:03,900 --> 02:08:05,460
Here's a list of size 2.
2768
02:08:05,460 --> 02:08:06,690
Let's sort the left half.
2769
02:08:06,690 --> 02:08:07,347
Done.
2770
02:08:07,347 --> 02:08:08,430
Let's sort the right half.
2771
02:08:08,430 --> 02:08:09,120
Done.
2772
02:08:09,120 --> 02:08:10,170
What's the third step?
2773
02:08:10,170 --> 02:08:11,590
Merge them together.
2774
02:08:11,590 --> 02:08:14,880
So that's the 4, and that's the 7.
2775
02:08:14,880 --> 02:08:16,110
What have I now done?
2776
02:08:16,110 --> 02:08:21,070
In total, I've now sorted the left half of the original thing.
2777
02:08:21,070 --> 02:08:23,092
So what happens next?
2778
02:08:23,092 --> 02:08:24,300
Wait a minute, wait a minute.
2779
02:08:24,300 --> 02:08:25,560
I have not done that.
2780
02:08:25,560 --> 02:08:26,550
What have I done?
2781
02:08:26,550 --> 02:08:29,220
I have sorted the left half of the left half,
2782
02:08:29,220 --> 02:08:32,350
and I've sorted the right half of the left half.
2783
02:08:32,350 --> 02:08:34,710
What do I now need to do lastly?
2784
02:08:34,710 --> 02:08:36,640
Merge those two lists together.
2785
02:08:36,640 --> 02:08:38,562
So again, I put my finger on the beginning
2786
02:08:38,562 --> 02:08:40,270
of this list, the beginning of this list.
2787
02:08:40,270 --> 02:08:42,360
And if you want, I'll do the same thing when I merged last time
2788
02:08:42,360 --> 02:08:43,830
to be clear what I'm comparing.
2789
02:08:43,830 --> 02:08:46,440
2 and 4-- the 2 obviously comes first.
2790
02:08:46,440 --> 02:08:48,220
What comes next?
2791
02:08:48,220 --> 02:08:50,410
Well, the 4 comes next.
2792
02:08:50,410 --> 02:08:51,670
What comes next?
2793
02:08:51,670 --> 02:08:55,920
The 5 comes next and then lastly, of course, the 7.
2794
02:08:55,920 --> 02:09:00,010
Notice that the 2, 4, 5, 7 are now sorted.
2795
02:09:00,010 --> 02:09:02,802
So the original left half is sorted.
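[Editor's sketch, not part of the lecture: the merge step just demonstrated, with one "finger" on the front of each sorted list, might look like this in C. The function name merge and its parameters are assumptions chosen for illustration.]

```c
#include <assert.h>
#include <stddef.h>

// Merge two already-sorted arrays into out, which must have room for
// left_len + right_len ints. One "finger" (index) walks each input list.
void merge(const int left[], size_t left_len,
           const int right[], size_t right_len,
           int out[])
{
    assert(left != NULL && right != NULL && out != NULL);

    size_t i = 0; // finger on the left list
    size_t j = 0; // finger on the right list
    size_t k = 0; // next free slot in the output

    // While both lists still have numbers, take the smaller front number.
    while (i < left_len && j < right_len)
    {
        if (left[i] <= right[j])
        {
            out[k++] = left[i++];
        }
        else
        {
            out[k++] = right[j++];
        }
    }

    // One list has run out, so copy whatever remains of the other.
    while (i < left_len)
    {
        out[k++] = left[i++];
    }
    while (j < right_len)
    {
        out[k++] = right[j++];
    }
}
```

[Calling merge on the lists 2, 5 and 4, 7 from the demonstration fills out with 2, 4, 5, 7. Each element is copied exactly once, which is why a single merge takes a number of steps proportional to the number of elements.]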
2796
02:09:02,802 --> 02:09:05,010
And I'll do the rest a little faster because, my God,
2797
02:09:05,010 --> 02:09:06,385
this feels like it takes forever.
2798
02:09:06,385 --> 02:09:08,430
But I bet we're on to something here.
2799
02:09:08,430 --> 02:09:10,320
What step remains next?
2800
02:09:10,320 --> 02:09:12,570
I've just sorted the left half of the original.
2801
02:09:12,570 --> 02:09:14,280
Sort the right half of the original.
2802
02:09:14,280 --> 02:09:15,300
How do I sort this?
2803
02:09:15,300 --> 02:09:17,880
I sort the left half of the right half.
2804
02:09:17,880 --> 02:09:19,050
How do I sort this?
2805
02:09:19,050 --> 02:09:21,360
I sort the left half of the left half.
2806
02:09:21,360 --> 02:09:22,320
Done.
2807
02:09:22,320 --> 02:09:24,360
I sort the right half of the left half.
2808
02:09:24,360 --> 02:09:25,260
Done.
2809
02:09:25,260 --> 02:09:27,390
Now I merge the two together.
2810
02:09:27,390 --> 02:09:30,190
The 1 comes first, the 6 comes next.
2811
02:09:30,190 --> 02:09:34,380
Now I sort the right half of the right half.
2812
02:09:34,380 --> 02:09:35,190
What do I do?
2813
02:09:35,190 --> 02:09:36,600
Sort the left half.
2814
02:09:36,600 --> 02:09:37,200
Done.
2815
02:09:37,200 --> 02:09:38,200
Sort the right half.
2816
02:09:38,200 --> 02:09:38,700
Done.
2817
02:09:38,700 --> 02:09:39,420
What do I do?
2818
02:09:39,420 --> 02:09:41,230
Merge them together.
2819
02:09:41,230 --> 02:09:43,080
So that's the third step of that phase.
2820
02:09:43,080 --> 02:09:47,760
Now where are we in the stor-- oh my God, where are we in the story?
2821
02:09:47,760 --> 02:09:52,230
We have sorted the left half of the right half
2822
02:09:52,230 --> 02:09:54,180
and the right half of the right half.
2823
02:09:54,180 --> 02:09:55,760
What comes next?
2824
02:09:55,760 --> 02:09:56,263
Merge.
2825
02:09:56,263 --> 02:09:58,430
So I'm going to compare, and I'm going to move those
2826
02:09:58,430 --> 02:10:00,590
down just to make clear what I'm comparing,
2827
02:10:00,590 --> 02:10:02,300
the beginning of both sublists.
2828
02:10:02,300 --> 02:10:03,110
What comes first?
2829
02:10:03,110 --> 02:10:04,610
Of course, the 0.
2830
02:10:04,610 --> 02:10:07,110
What comes next?
2831
02:10:07,110 --> 02:10:08,550
What comes next?
2832
02:10:08,550 --> 02:10:10,340
The 1.
2833
02:10:10,340 --> 02:10:11,730
What comes next?
2834
02:10:11,730 --> 02:10:13,010
The 3.
2835
02:10:13,010 --> 02:10:14,930
And then lastly comes the 6.
2836
02:10:14,930 --> 02:10:16,880
All right, where are we in the story?
2837
02:10:16,880 --> 02:10:19,220
We've now sorted the left half of the original
2838
02:10:19,220 --> 02:10:20,780
and the right half of the original.
2839
02:10:20,780 --> 02:10:22,430
What step remains?
2840
02:10:22,430 --> 02:10:23,030
Merge.
2841
02:10:23,030 --> 02:10:25,070
All right, so I'm going to make the same point.
2842
02:10:25,070 --> 02:10:27,690
And this is actually literally what we did earlier
2843
02:10:27,690 --> 02:10:31,760
because I deliberately demoed those original numbers in this order, 2
2844
02:10:31,760 --> 02:10:32,630
and a 0.
2845
02:10:32,630 --> 02:10:34,010
This comes out first.
2846
02:10:34,010 --> 02:10:35,300
What comes next?
2847
02:10:35,300 --> 02:10:36,230
2 and 1.
2848
02:10:36,230 --> 02:10:37,890
The 1 comes out next.
2849
02:10:37,890 --> 02:10:39,450
What comes next?
2850
02:10:39,450 --> 02:10:40,920
The 2 comes next.
2851
02:10:40,920 --> 02:10:42,030
What comes next?
2852
02:10:42,030 --> 02:10:43,410
The 3 comes next.
2853
02:10:43,410 --> 02:10:44,460
What comes next?
2854
02:10:44,460 --> 02:10:47,150
The 4.
2855
02:10:47,150 --> 02:10:48,590
What comes after that?
2856
02:10:48,590 --> 02:10:50,720
The 5.
2857
02:10:50,720 --> 02:10:51,920
What comes after that?
2858
02:10:51,920 --> 02:10:52,700
The 6.
2859
02:10:52,700 --> 02:10:56,760
And lastly-- this is when we run out of memory--
2860
02:10:56,760 --> 02:11:01,010
the 7 over there is actually in place.
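[Editor's sketch, not part of the lecture: the three-step pseudocode walked through above (quit if only one number, sort the left half, sort the right half, merge) could be written in C roughly as follows. The name merge_sort, the half-open range start..end, and the scratch array standing in for the spare shelf are all assumptions made for illustration.]

```c
#include <assert.h>
#include <string.h>

// Sort numbers[start..end) in place, using scratch[] as the spare "shelf".
// Both arrays must hold at least end ints.
void merge_sort(int numbers[], int scratch[], int start, int end)
{
    assert(start <= end);

    // If only one number (or none), quit.
    if (end - start < 2)
    {
        return;
    }

    int middle = start + (end - start) / 2;

    merge_sort(numbers, scratch, start, middle); // sort the left half
    merge_sort(numbers, scratch, middle, end);   // sort the right half

    // Merge the two sorted halves into the scratch array.
    int i = start;  // finger on the left half
    int j = middle; // finger on the right half
    int k = start;  // next free slot in scratch
    while (i < middle && j < end)
    {
        if (numbers[i] <= numbers[j])
        {
            scratch[k++] = numbers[i++];
        }
        else
        {
            scratch[k++] = numbers[j++];
        }
    }
    while (i < middle)
    {
        scratch[k++] = numbers[i++];
    }
    while (j < end)
    {
        scratch[k++] = numbers[j++];
    }

    // Copy the merged run back to where it came from.
    memcpy(&numbers[start], &scratch[start],
           (size_t) (end - start) * sizeof(int));
}
```

[With numbers holding 5, 2, 7, 4, 1, 6, 3, 0 and an eight-element scratch array, calling merge_sort(numbers, scratch, 0, 8) follows the same order as the demonstration: the 5 and 2 are merged first, then the 7 and 4, then those two pairs, and only then does the right half get touched.]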
2861
02:11:01,010 --> 02:11:03,390
OK.
2862
02:11:03,390 --> 02:11:05,867
OK, so admittedly, a little harder to explain,
2863
02:11:05,867 --> 02:11:07,950
and honestly, it gets a little trippy because it's
2864
02:11:07,950 --> 02:11:10,980
so easy to forget about where you are in the story
2865
02:11:10,980 --> 02:11:13,680
because we're constantly diving into the algorithm
2866
02:11:13,680 --> 02:11:15,250
and then backing back out of it.
2867
02:11:15,250 --> 02:11:17,970
But in code, we could express this pretty correctly
2868
02:11:17,970 --> 02:11:21,000
and, it turns out, pretty efficiently because what
2869
02:11:21,000 --> 02:11:24,660
I was doing, even though it's longer when I do it verbally,
2870
02:11:24,660 --> 02:11:28,260
I was touching these elements a minimal amount of times, right?
2871
02:11:28,260 --> 02:11:31,530
I wasn't going back and forth, back and forth in front of the shelf
2872
02:11:31,530 --> 02:11:32,490
again and again.
2873
02:11:32,490 --> 02:11:37,240
I was deliberately only ever merging the smallest elements in each list.
2874
02:11:37,240 --> 02:11:40,200
So every time we merge, even though I was doing it quickly,
2875
02:11:40,200 --> 02:11:44,370
my fingers were only touching each of the elements once.
2876
02:11:44,370 --> 02:11:50,152
And how many times did we divide, divide, divide in half the list?
2877
02:11:50,152 --> 02:11:52,110
Well, we started with all of the elements here,
2878
02:11:52,110 --> 02:11:53,318
and there were eight of them.
2879
02:11:53,318 --> 02:11:56,860
And then we moved them 1, 2, 3 positions.
2880
02:11:56,860 --> 02:12:03,420
So the height of this visualization, if you will, is actually log n, right?
2881
02:12:03,420 --> 02:12:06,280
If I started with 8, turns out if you do the arithmetic,
2882
02:12:06,280 --> 02:12:10,380
this is log n height because 2 to the 3 is 8.
2883
02:12:10,380 --> 02:12:12,750
But for now, just trust that this is a log n height.
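[Editor's sketch, not part of the lecture: one way to check the "log n height" claim is simply to count how many times a list of size n can be cut in half before only one element is left. The function name halvings is hypothetical.]

```c
#include <assert.h>

// Count how many times n can be halved (integer division) before reaching 1.
// For powers of 2 this is exactly log base 2 of n.
int halvings(int n)
{
    assert(n >= 1);

    int count = 0;
    while (n > 1)
    {
        n /= 2;
        count++;
    }
    return count;
}
```

[For the eight numbers on the shelf, halvings(8) is 3, matching 2 to the 3 being 8. Doubling the list to 16 adds only one more level.]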
2884
02:12:12,750 --> 02:12:14,280
And how wide is the shelf?
2885
02:12:14,280 --> 02:12:18,270
Well, it's of width n because there's n elements any time
2886
02:12:18,270 --> 02:12:19,470
they were on the shelf.
2887
02:12:19,470 --> 02:12:23,737
So technically, I was kind of cheating this algorithm because this
2888
02:12:23,737 --> 02:12:25,320
is the first time I've needed shelves.
2889
02:12:25,320 --> 02:12:29,010
With the human examples, we just had the humans, and that's it, and only eight
2890
02:12:29,010 --> 02:12:29,520
of them.
2891
02:12:29,520 --> 02:12:32,200
Here, I was sort of using more and more memory.
2892
02:12:32,200 --> 02:12:34,770
In fact, I was using like four times as much memory
2893
02:12:34,770 --> 02:12:36,930
even though that was just for visualization's sake.
2894
02:12:36,930 --> 02:12:41,040
Merge sort actually requires that you have some spare space, an empty array
2895
02:12:41,040 --> 02:12:44,045
to move the elements into when you're merging them together.
2896
02:12:44,045 --> 02:12:46,920
But if I really wanted and if I didn't have this shelf or this shelf,
2897
02:12:46,920 --> 02:12:49,590
honestly, I could have just gone back and forth between the two shelves.
2898
02:12:49,590 --> 02:12:51,250
That would have been sufficient.
2899
02:12:51,250 --> 02:12:56,190
So merge sort uses more memory for this merging process,
2900
02:12:56,190 --> 02:12:59,100
but the advantage of using more memory is
2901
02:12:59,100 --> 02:13:04,680
that the total running time, if you can perhaps infer from that math, is what?
2902
02:13:04,680 --> 02:13:07,560
The big O notation for merge sort, it turns out,
2903
02:13:07,560 --> 02:13:10,530
is actually going to be n times log n.
2904
02:13:10,530 --> 02:13:13,050
And even if you're a little rusty still on your logarithms,
2905
02:13:13,050 --> 02:13:18,390
we saw in week zero and again today that log n is smaller than n.
2906
02:13:18,390 --> 02:13:19,650
That's a good thing.
2907
02:13:19,650 --> 02:13:20,880
Binary search was log n.
2908
02:13:20,880 --> 02:13:23,250
That's faster than linear search, which was n.
2909
02:13:23,250 --> 02:13:28,830
So n times log n is, of course, smaller than n times n or n squared.
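[Editor's sketch, not part of the lecture: to get a feel for how far apart n log n and n squared are, the two quantities can be computed side by side. These are rough step counts that ignore constant factors, not measured running times, and the function names are assumptions.]

```c
#include <assert.h>

// Rough step count for an n log n algorithm: n elements touched once per
// level, times the number of levels (how often n can be halved).
long n_log_n(long n)
{
    assert(n >= 1);

    long levels = 0;
    for (long size = n; size > 1; size /= 2)
    {
        levels++;
    }
    return n * levels;
}

// Rough step count for an n squared algorithm.
long n_squared(long n)
{
    assert(n >= 0);
    return n * n;
}
```

[For n of 8 this gives 24 versus 64. For n of 1,024 it gives 10,240 versus 1,048,576, so the gap widens quickly as the list grows.]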
2910
02:13:28,830 --> 02:13:31,830
So it's sort of lower on this little cheat sheet that I've been drawing,
2911
02:13:31,830 --> 02:13:35,350
which is to suggest that its running time is indeed better or faster.
2912
02:13:35,350 --> 02:13:38,340
And in fact, if we consider the best case running time,
2913
02:13:38,340 --> 02:13:42,645
turns out it's not quite as good as bubble sort with omega of n,
2914
02:13:42,645 --> 02:13:45,270
where you can just sort of abort if you realize, wait a minute,
2915
02:13:45,270 --> 02:13:46,320
I've done no work.
2916
02:13:46,320 --> 02:13:50,850
Merge sort, you actually have to do that work to get to the finish line anyway.
2917
02:13:50,850 --> 02:13:56,170
So it's actually in omega and ultimately theta of n log n as well.
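[Editor's sketch, not part of the lecture: the best case for bubble sort comes from noticing that a full pass made no swaps and stopping early. A minimal version in C, assuming a hypothetical bubble_sort that also reports how many passes it made, might be:]

```c
#include <assert.h>
#include <stdbool.h>

// Bubble sort that stops as soon as a full pass makes no swaps.
// Returns the number of passes made over the array.
int bubble_sort(int numbers[], int n)
{
    assert(n >= 0);

    int passes = 0;
    bool swapped = true;
    while (swapped)
    {
        swapped = false;
        passes++;
        for (int i = 0; i < n - 1; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                // Neighbors are out of order, so swap them.
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swapped = true;
            }
        }
    }
    return passes;
}
```

[On an already-sorted array of eight numbers this makes a single pass and quits, which is the omega of n behavior. Merge sort has no such shortcut: it still divides and merges every level even when the input is already in order.]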
2918
02:13:56,170 --> 02:13:58,230
So again, a trade off there because if you
2919
02:13:58,230 --> 02:14:00,540
happen to have a data set that is very often sorted,
2920
02:14:00,540 --> 02:14:02,665
honestly, you might want to stick with bubble sort.
2921
02:14:02,665 --> 02:14:05,490
But in the general case, where the data is unsorted,
2922
02:14:05,490 --> 02:14:08,820
n log n is sounding better than n squared.
2923
02:14:08,820 --> 02:14:10,830
Well, what does it actually look or feel like?
2924
02:14:10,830 --> 02:14:14,230
Give me a moment to just change over to our visualization here.
2925
02:14:14,230 --> 02:14:18,210
And we'll see with this example what merge sort looks like
2926
02:14:18,210 --> 02:14:20,170
depicted with now these vertical bars.
2927
02:14:20,170 --> 02:14:22,860
So same algorithm, but instead of my numbers on shelves,
2928
02:14:22,860 --> 02:14:27,970
here is a random array of numbers being sorted.
2929
02:14:27,970 --> 02:14:30,120
And you can see it being done half at a time.
2930
02:14:30,120 --> 02:14:33,720
And you see sort of remnants of the previous bars.
2931
02:14:33,720 --> 02:14:35,520
Actually, that was unfair.
2932
02:14:35,520 --> 02:14:37,500
Let me zoom out here.
2933
02:14:37,500 --> 02:14:42,030
Let me zoom out so you can actually see the height here.
2934
02:14:42,030 --> 02:14:44,730
Let me go ahead and randomize this again and run merge sort.
2935
02:14:44,730 --> 02:14:45,310
There we go.
2936
02:14:45,310 --> 02:14:50,100
Now you can see the second array and where the values are going temporarily.
2937
02:14:50,100 --> 02:14:54,130
And even though this one looks way more cryptic visualization-wise,
2938
02:14:54,130 --> 02:14:56,170
it does seem to be moving faster.
2939
02:14:56,170 --> 02:14:59,670
And it seems to be merging halves together, and boom, it's done.
2940
02:14:59,670 --> 02:15:03,990
So let's actually see, in conclusion, what these algorithms compare to
2941
02:15:03,990 --> 02:15:07,080
and consider that moving forward as we write more and more code,
2942
02:15:07,080 --> 02:15:10,350
the goal is, again, not just to be correct but to be well-designed.
2943
02:15:10,350 --> 02:15:13,820
And one measure of design is going to indeed be efficiency.
2944
02:15:13,820 --> 02:15:18,120
So here we have, in final, a visualization of three algorithms--
2945
02:15:18,120 --> 02:15:21,000
selection sort, bubble sort, and merge sort--
2946
02:15:21,000 --> 02:15:22,450
from top to bottom.
2947
02:15:22,450 --> 02:15:25,620
And let's see what these algorithms might look or sound like here.
2948
02:15:25,620 --> 02:15:27,855
Oh, if we can dim the lights for dramatic effect--
2950
02:15:32,160 --> 02:15:35,970
selection's on top, bubble on bottom, merge in the middle.
2951
02:15:35,970 --> 02:16:33,320
[MUSIC PLAYING]
2952
02:16:33,320 --> 02:17:11,000
[MUSIC PLAYING]