All language subtitles for lecture3(1337)-720p-en

0 00:00:00,000 --> 00:01:17,345 [MUSIC PLAYING] 1 00:01:17,345 --> 00:01:22,170 DAVID J. MALAN: This is CS50, and this is already week three. 2 00:01:22,170 --> 00:01:25,695 And even as we've gotten much more into the minutiae of programming 3 00:01:25,695 --> 00:01:27,570 and some of the C stuff that we've been doing 4 00:01:27,570 --> 00:01:30,840 is all the more cryptic looking, recall that at the end of the day, 5 00:01:30,840 --> 00:01:34,647 like, everything we've been doing ultimately fits into this model. 6 00:01:34,647 --> 00:01:36,480 So keep that in mind, particularly as things 7 00:01:36,480 --> 00:01:39,230 seem like they're getting more complicated and more sophisticated. 8 00:01:39,230 --> 00:01:41,850 It's just a process of learning a new language that ultimately 9 00:01:41,850 --> 00:01:44,430 lets us express this process. 10 00:01:44,430 --> 00:01:47,490 And of course, last week we really went into the weeds of like how 11 00:01:47,490 --> 00:01:49,240 inputs and outputs are represented. 12 00:01:49,240 --> 00:01:53,580 And this thing here, a photograph thereof, is called what? 13 00:01:53,580 --> 00:01:54,742 This is what? 14 00:01:54,742 --> 00:01:55,628 AUDIENCE: RAM. 15 00:01:55,628 --> 00:01:56,670 DAVID J. MALAN: RAM, I heard-- 16 00:01:56,670 --> 00:01:59,670 Random Access Memory or just generally known as memory. 17 00:01:59,670 --> 00:02:02,280 And recall that we looked at one of these little black chips 18 00:02:02,280 --> 00:02:04,500 that contains all of the bytes-- 19 00:02:04,500 --> 00:02:05,790 all of the bits, ultimately. 20 00:02:05,790 --> 00:02:08,400 It's just kind of a grid, sort of an artist's grid, that 21 00:02:08,400 --> 00:02:11,520 allows us to think about every one of these memory locations 22 00:02:11,520 --> 00:02:14,040 as just having a number or an address, so to speak. 
23 00:02:14,040 --> 00:02:16,770 Like, this might be byte number 0 and then 1 and then 2 24 00:02:16,770 --> 00:02:19,680 and then, maybe way down here again, something like 2 billion 25 00:02:19,680 --> 00:02:22,080 if you have 2 gigabytes of memory. 26 00:02:22,080 --> 00:02:26,070 And so as we did that, we started to explore how we could use this canvas 27 00:02:26,070 --> 00:02:29,760 to create kind of our own information, our own inputs and outputs, not 28 00:02:29,760 --> 00:02:32,380 just the basics like ints and floats and so forth. 29 00:02:32,380 --> 00:02:34,470 But we also talked about strings. 30 00:02:34,470 --> 00:02:37,080 And what is a string as you now know it? 31 00:02:37,080 --> 00:02:40,200 How would you describe in layperson's terms a string? 32 00:02:40,200 --> 00:02:41,010 Yeah, over there. 33 00:02:41,010 --> 00:02:41,990 AUDIENCE: I was gonna say-- 34 00:02:41,990 --> 00:02:42,480 [AUDIO OUT] 35 00:02:42,480 --> 00:02:43,897 DAVID J. MALAN: An array of characters. 36 00:02:43,897 --> 00:02:45,690 And an array, meanwhile-- let's go there. 37 00:02:45,690 --> 00:02:50,580 How might someone else define an array in more familiar now terms? 38 00:02:50,580 --> 00:02:53,160 What would be an array? 39 00:02:53,160 --> 00:02:53,670 Yeah. 40 00:02:53,670 --> 00:02:57,070 AUDIENCE: Kind of like an indexed set of things. 41 00:02:57,070 --> 00:02:59,060 DAVID J. MALAN: An indexed set of things-- not bad. 42 00:02:59,060 --> 00:03:02,210 And I think a key characteristic to keep in mind with an array is that it 43 00:03:02,210 --> 00:03:03,890 does actually pertain to memory. 44 00:03:03,890 --> 00:03:05,810 And it's contiguous memory. 45 00:03:05,810 --> 00:03:09,200 Byte after byte after byte is what constitutes an array. 
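The point about contiguous memory can be sketched in C. This is only an illustration under stated assumptions: the function names `char_offset` and `int_offset` are hypothetical and do not come from the lecture. The sketch measures how many bytes separate element i from element 0, which for a contiguous array is always i times the size of one element.

```c
#include <stddef.h>

// Hypothetical helper: distance in bytes from the start of a char array
// to the element at index i. Because chars are one byte each and the
// array is contiguous, the result is always i.
// Assumes i is a valid index into s.
size_t char_offset(const char *s, size_t i)
{
    return (size_t) (&s[i] - &s[0]);
}

// Hypothetical helper: the same measurement for an int array, in bytes.
// Each int occupies sizeof(int) bytes, so the result is i * sizeof(int).
// Assumes i is a valid index into a.
size_t int_offset(const int *a, size_t i)
{
    return (size_t) ((const char *) &a[i] - (const char *) &a[0]);
}
```

For a string such as "HI!", `char_offset` at index 2 gives 2, since each character sits exactly one byte after the previous one.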
46 00:03:09,200 --> 00:03:11,810 And we'll see in a couple of weeks time that there's actually 47 00:03:11,810 --> 00:03:15,770 more interesting ways to use this same primitive Canvas to stitch together 48 00:03:15,770 --> 00:03:19,240 things that are sort of two directional even that have some kind of shape 49 00:03:19,240 --> 00:03:19,740 to them. 50 00:03:19,740 --> 00:03:22,970 But for now, all we've talked about is arrays and just using these things 51 00:03:22,970 --> 00:03:27,240 from left to right, top to bottom, contiguously to represent information. 52 00:03:27,240 --> 00:03:29,900 So today, we'll consider still an array. 53 00:03:29,900 --> 00:03:33,020 But we won't focus so much on representation 54 00:03:33,020 --> 00:03:34,460 of strings or other data types. 55 00:03:34,460 --> 00:03:36,918 We'll actually now focus on the other part of that process, 56 00:03:36,918 --> 00:03:39,950 of inputs becoming outputs, namely the thing in the middle-- 57 00:03:39,950 --> 00:03:40,940 algorithms. 58 00:03:40,940 --> 00:03:45,200 But we have to keep in mind, even though every time we've looked at an array 59 00:03:45,200 --> 00:03:48,610 thus far, certainly on the board like this, you as a human certainly 60 00:03:48,610 --> 00:03:50,360 have the luxury of just kind of eyeballing 61 00:03:50,360 --> 00:03:53,820 the whole thing with a bird's eye view and seeing where all of those numbers 62 00:03:53,820 --> 00:03:54,320 are. 63 00:03:54,320 --> 00:03:56,153 If I asked you where a particular number is, 64 00:03:56,153 --> 00:03:59,150 like zero, odds are your eyes would go right to where it is, 65 00:03:59,150 --> 00:04:02,150 and boom, problem solved in sort of one step. 66 00:04:02,150 --> 00:04:07,260 But the catch is, with a computer that has this memory, even though you, 67 00:04:07,260 --> 00:04:10,730 the human, can [INAUDIBLE] see everything at once, a computer cannot. 
68 00:04:10,730 --> 00:04:14,090 It's better to think of your computer's memory, your phone's memory, 69 00:04:14,090 --> 00:04:17,000 or more specifically an array of memory like this 70 00:04:17,000 --> 00:04:21,500 as really being a set of closed doors, not unlike lockers in a school. 71 00:04:21,500 --> 00:04:24,260 And only by opening each of those doors can 72 00:04:24,260 --> 00:04:26,090 the computer actually see what's in there, 73 00:04:26,090 --> 00:04:28,610 which is to say that the computer, unlike you, doesn't 74 00:04:28,610 --> 00:04:32,480 have this bird's eye view of all of the data in all these locations. 75 00:04:32,480 --> 00:04:35,400 It has to much more methodically look here, 76 00:04:35,400 --> 00:04:39,590 maybe look here, maybe look here, and so forth in order to find something. 77 00:04:39,590 --> 00:04:43,670 Now fortunately, we already have some building blocks-- loops, conditions, 78 00:04:43,670 --> 00:04:45,210 Boolean expressions, and the like-- 79 00:04:45,210 --> 00:04:47,120 where you could imagine writing some code 80 00:04:47,120 --> 00:04:51,680 that very methodically goes from left to right or right to left or something 81 00:04:51,680 --> 00:04:55,490 more sophisticated that actually finds something you're looking for. 82 00:04:55,490 --> 00:04:58,850 And just remember that the conventions we've had since last week 83 00:04:58,850 --> 00:05:03,320 now is that these arrays are zero indexed, so to speak. 84 00:05:03,320 --> 00:05:08,150 To be zero indexed just means that the data type starts counting from zero. 85 00:05:08,150 --> 00:05:11,930 So this is location 0, 1, 2, 3, 4, 5, 6. 86 00:05:11,930 --> 00:05:15,320 And notice even though there are seven total doors here, 87 00:05:15,320 --> 00:05:17,930 the right-most one, of course, is called 6 88 00:05:17,930 --> 00:05:19,940 just because we've started counting at 0. 
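The zero-indexing convention can be sketched in C as follows. The names `last_index` and `last_value` are hypothetical, chosen only for illustration: with seven doors, the valid indices run from 0 through 6, so the right-most door is at index n minus 1.

```c
#include <stddef.h>

// Hypothetical helper: index of the right-most element in an array of
// n elements. Assumes n > 0, since an empty array has no last element.
size_t last_index(size_t n)
{
    return n - 1;
}

// Hypothetical helper: the value behind the right-most door.
// Assumes doors points to at least n ints and n > 0.
int last_value(const int doors[], size_t n)
{
    return doors[last_index(n)];
}
```

Indexing at n itself would reach one past the end of the array, which is why loops over an array conventionally run while the index is strictly less than n.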
89 00:05:19,940 --> 00:05:24,680 So in the general case, if you had n doors or n bytes of memory, 90 00:05:24,680 --> 00:05:29,480 0 would always be at the left, and n minus 1 would always be at the right. 91 00:05:29,480 --> 00:05:33,860 That's sort of a generalization of just thinking about this kind of convention. 92 00:05:33,860 --> 00:05:37,790 All right, so let's revisit the problem that we started the whole term off 93 00:05:37,790 --> 00:05:40,523 with in week zero, which was this notion of searching. 94 00:05:40,523 --> 00:05:42,440 And what does it mean to search for something? 95 00:05:42,440 --> 00:05:45,218 Well, to find information-- and this, of course, is omnipresent. 96 00:05:45,218 --> 00:05:48,260 Anytime you take out your phone, you're searching for a friend's contact. 97 00:05:48,260 --> 00:05:51,140 Any time you pull up a browser, you're googling for this or that. 98 00:05:51,140 --> 00:05:55,310 So search is kind of one of the most omnipresent topics and features 99 00:05:55,310 --> 00:05:56,877 of any device these days. 100 00:05:56,877 --> 00:05:59,960 So let's consider how the Googles, the Apples, the Microsofts of the world 101 00:05:59,960 --> 00:06:03,660 are implementing something as seemingly familiar as this. 102 00:06:03,660 --> 00:06:06,020 So here might be the problem statement. 103 00:06:06,020 --> 00:06:08,450 We want some input to become some output. 104 00:06:08,450 --> 00:06:09,800 What's that input going to be? 105 00:06:09,800 --> 00:06:13,520 Maybe it's a bunch of closed doors like this out of which we want 106 00:06:13,520 --> 00:06:16,250 to get back an answer, true or false. 107 00:06:16,250 --> 00:06:18,785 Is something we're looking for there or not? 108 00:06:18,785 --> 00:06:21,410 You can imagine taking this one step further and trying to find 109 00:06:21,410 --> 00:06:23,520 where is the thing you're looking for. 110 00:06:23,520 --> 00:06:25,940 But for now, let's just take one bite out of the problem. 
111 00:06:25,940 --> 00:06:30,560 Can we tell ourselves, true or false, is some number behind one 112 00:06:30,560 --> 00:06:33,390 of these doors or lockers in memory? 113 00:06:33,390 --> 00:06:37,760 But before we go there and start talking about ways to do that-- that is, 114 00:06:37,760 --> 00:06:38,580 algorithms. 115 00:06:38,580 --> 00:06:42,680 Let's consider how we might lay the foundation of, like, 116 00:06:42,680 --> 00:06:46,108 comparing whether one algorithm is better than another. 117 00:06:46,108 --> 00:06:47,900 We talked about correctness, and it sort of 118 00:06:47,900 --> 00:06:51,440 goes without saying that any code you write, any algorithm you implement, 119 00:06:51,440 --> 00:06:52,460 had better be correct. 120 00:06:52,460 --> 00:06:55,340 Otherwise, what's the point if it doesn't give you the right answers? 121 00:06:55,340 --> 00:06:57,260 But we also talked about design. 122 00:06:57,260 --> 00:07:01,280 And in your own words, what do we mean when we say a program is better 123 00:07:01,280 --> 00:07:04,470 designed at this stage than another? 124 00:07:04,470 --> 00:07:07,880 How do you think about this notion of design now? 125 00:07:07,880 --> 00:07:09,050 Yeah, in the middle? 126 00:07:09,050 --> 00:07:11,300 AUDIENCE: Easier to understand or easier to institute. 127 00:07:11,300 --> 00:07:12,990 DAVID J. MALAN: OK, so easier to understand. 128 00:07:12,990 --> 00:07:13,825 I like that. 129 00:07:13,825 --> 00:07:14,450 Other thoughts? 130 00:07:14,450 --> 00:07:14,950 Yeah. 131 00:07:14,950 --> 00:07:15,980 AUDIENCE: Efficiency. 132 00:07:15,980 --> 00:07:18,813 DAVID J. MALAN: Efficiency, and what do you mean by efficiency precisely? 133 00:07:18,813 --> 00:07:22,163 AUDIENCE: [INAUDIBLE] 134 00:07:22,163 --> 00:07:22,830 DAVID J. MALAN: Nice. 135 00:07:22,830 --> 00:07:25,390 It doesn't use up too much memory, and it isn't redundant. 
136 00:07:25,390 --> 00:07:27,120 So you can think about design along a few 137 00:07:27,120 --> 00:07:29,078 of these axes-- sort of the quality of the code 138 00:07:29,078 --> 00:07:31,270 but also the quality of the performance. 139 00:07:31,270 --> 00:07:35,818 And as our programs get bigger and more sophisticated and just longer, 140 00:07:35,818 --> 00:07:37,860 those kinds of things are really going to matter. 141 00:07:37,860 --> 00:07:39,860 And in the real world, if you start writing code 142 00:07:39,860 --> 00:07:42,630 not just by yourself but with someone else, getting the design 143 00:07:42,630 --> 00:07:46,140 right is just going to make it easier to collaborate and ultimately produce, 144 00:07:46,140 --> 00:07:48,670 write code, with just higher probability. 145 00:07:48,670 --> 00:07:52,290 So let's consider how we might focus on exactly the second characteristic, 146 00:07:52,290 --> 00:07:54,390 the efficiency, of an algorithm. 147 00:07:54,390 --> 00:07:58,230 And the way we might talk about the efficiency of algorithms, just how fast 148 00:07:58,230 --> 00:08:01,350 or how slow they are, is in terms of their running time. 149 00:08:01,350 --> 00:08:04,900 That is to say, when they're running, how much time do they take? 150 00:08:04,900 --> 00:08:08,010 And we might measure this in seconds or milliseconds or minutes 151 00:08:08,010 --> 00:08:10,350 or just some number of steps in the general case 152 00:08:10,350 --> 00:08:14,560 because presumably fewer steps, to your point, is better than more steps. 153 00:08:14,560 --> 00:08:16,410 So how might we think about running times? 154 00:08:16,410 --> 00:08:19,470 Well, there's one general notation we should define today. 155 00:08:19,470 --> 00:08:23,610 So computer scientists tend to describe the running time of an algorithm 156 00:08:23,610 --> 00:08:27,840 or a piece of code, for that matter, in terms of what's called big O notation. 
157 00:08:27,840 --> 00:08:30,630 This is literally a capitalized O, a big O. 158 00:08:30,630 --> 00:08:34,110 And this generally means that the running time of some algorithm 159 00:08:34,110 --> 00:08:37,559 is on the order of such and such, where such and such, we'll see, 160 00:08:37,559 --> 00:08:40,169 is just going to be a very simple mathematical formula. 161 00:08:40,169 --> 00:08:42,450 It's kind of a way of waving your hands mathematically 162 00:08:42,450 --> 00:08:46,770 to convey the idea of just how fast or how slow some algorithm or code is 163 00:08:46,770 --> 00:08:48,600 without getting into the weeds of like, it 164 00:08:48,600 --> 00:08:52,330 took this many milliseconds or this many specific number of steps. 165 00:08:52,330 --> 00:08:56,280 So you might recall then from week zero, I even introduced this picture 166 00:08:56,280 --> 00:08:57,360 but without much context. 167 00:08:57,360 --> 00:09:00,660 At the time, we just used this to compare those phone book algorithms. 168 00:09:00,660 --> 00:09:03,600 Recall that this red straight line was the first algorithm, 169 00:09:03,600 --> 00:09:04,920 one page at a time. 170 00:09:04,920 --> 00:09:09,870 The yellow line that's still straight differed how, if you recall? 171 00:09:09,870 --> 00:09:14,370 That line represented what alternative algorithm? 172 00:09:14,370 --> 00:09:15,370 Looking in back. 173 00:09:15,370 --> 00:09:16,620 What is that second algorithm? 174 00:09:16,620 --> 00:09:17,430 Yeah, over there. 175 00:09:17,430 --> 00:09:18,930 AUDIENCE: Like, two pages at a time. 176 00:09:18,930 --> 00:09:22,080 DAVID J. MALAN: Two pages at a time, which was almost correct so long as we 177 00:09:22,080 --> 00:09:25,340 potentially double back a page if maybe we go a little too far in the phone 178 00:09:25,340 --> 00:09:25,840 book. 179 00:09:25,840 --> 00:09:28,372 So it had a potential bug but arguably solvable. 
180 00:09:28,372 --> 00:09:31,080 This last algorithm, though, was the so-called divide and conquer 181 00:09:31,080 --> 00:09:34,590 strategy where I sort of unnecessarily tore the phone book in half 182 00:09:34,590 --> 00:09:36,600 and then in half and then in half, which, 183 00:09:36,600 --> 00:09:40,890 as dramatic as that was unnecessarily, it actually took significantly bigger 184 00:09:40,890 --> 00:09:43,420 bites out of the problem-- like 500 pages 185 00:09:43,420 --> 00:09:49,360 the first time, another 250, another 125 versus just 1 or 2 bites at a time. 186 00:09:49,360 --> 00:09:52,583 And so we described its running time as this picture 187 00:09:52,583 --> 00:09:55,500 there, though I didn't use that expression at the time, running times. 188 00:09:55,500 --> 00:09:58,230 But indeed, time to solve might be measured just 189 00:09:58,230 --> 00:10:00,150 abstractly in some unit of measure-- 190 00:10:00,150 --> 00:10:03,420 seconds, milliseconds, minutes, pages-- 191 00:10:03,420 --> 00:10:05,110 via this y-axis here. 192 00:10:05,110 --> 00:10:07,830 So let's now slap some numbers on this. 193 00:10:07,830 --> 00:10:11,250 If we had n pages in that phone book, n just representing 194 00:10:11,250 --> 00:10:13,620 a generic number, the first algorithm here 195 00:10:13,620 --> 00:10:15,450 we might describe as taking n steps. 196 00:10:15,450 --> 00:10:18,870 Second algorithm we might describe as taking n divided by 2 steps, 197 00:10:18,870 --> 00:10:21,510 maybe give or take one if we have to double back but generally 198 00:10:21,510 --> 00:10:22,482 n divided by 2. 199 00:10:22,482 --> 00:10:24,690 And then this thing, if you remember your logarithms, 200 00:10:24,690 --> 00:10:26,648 was sort of a fundamentally different formula-- 201 00:10:26,648 --> 00:10:30,100 log base 2 of n or just log of n for short. 202 00:10:30,100 --> 00:10:32,790 So this is a fundamentally different formula. 
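The three step counts just described can be sketched in C to see how differently they grow. This is a rough illustration, not code from the lecture: the function names are hypothetical, the two-pages count uses integer division and ignores the occasional double-back, and the halving count simply divides n by 2 until one page remains.

```c
// Hypothetical step counts for the three phone book strategies,
// given a book of n pages.

// One page at a time: up to n steps.
unsigned long steps_one_page(unsigned long n)
{
    return n;
}

// Two pages at a time: roughly n / 2 steps
// (integer division; the possible double-back is ignored).
unsigned long steps_two_pages(unsigned long n)
{
    return n / 2;
}

// Divide and conquer: count how many times n can be halved
// before only one page is left, roughly log base 2 of n.
unsigned long steps_halving(unsigned long n)
{
    unsigned long steps = 0;
    while (n > 1)
    {
        n /= 2;
        steps++;
    }
    return steps;
}
```

Under these assumptions, a 1,024-page book takes 1,024 steps, 512 steps, and 10 steps respectively, which is why the third curve looks so different from the first two.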
203 00:10:32,790 --> 00:10:36,430 But what's noteworthy is that these first two algorithms, 204 00:10:36,430 --> 00:10:40,050 even though, yes, the second algorithm was hands down faster-- 205 00:10:40,050 --> 00:10:41,910 I mean, literally twice as fast-- 206 00:10:41,910 --> 00:10:46,560 when you start to zoom out and if I increase my y-axis and x-axis, 207 00:10:46,560 --> 00:10:52,560 these first two start to look awfully similar to one another. 208 00:10:52,560 --> 00:10:54,810 And if we keep zooming out and zooming out and zooming 209 00:10:54,810 --> 00:10:57,090 out as n gets really large-- 210 00:10:57,090 --> 00:10:59,220 that is, the x-axis gets really long-- 211 00:10:59,220 --> 00:11:03,250 these first two algorithms start to become essentially the same. 212 00:11:03,250 --> 00:11:06,390 And so this is where computer scientists use big O notation. 213 00:11:06,390 --> 00:11:10,420 Instead of saying specifically, this algorithm takes n steps 214 00:11:10,420 --> 00:11:13,620 and this one n divided by 2, a computer scientist would say, 215 00:11:13,620 --> 00:11:17,010 eh, each of those algorithms takes on the order of n steps 216 00:11:17,010 --> 00:11:19,020 or on the order of n over 2. 217 00:11:19,020 --> 00:11:19,770 But you know what? 218 00:11:19,770 --> 00:11:23,490 On the order of n over 2 is pretty much the same 219 00:11:23,490 --> 00:11:29,620 when n gets really large as being equivalent to big O of n itself. 220 00:11:29,620 --> 00:11:34,480 So yes, in practice, it's obviously fewer steps to move twice as fast. 221 00:11:34,480 --> 00:11:37,950 But in the big picture, when n becomes a million, a billion, 222 00:11:37,950 --> 00:11:40,620 the numbers are already so darn big at that point 223 00:11:40,620 --> 00:11:43,680 that these are, as the shapes of these curves imply, 224 00:11:43,680 --> 00:11:45,930 pretty much functionally equivalent. 
225 00:11:45,930 --> 00:11:49,080 But this one still looks better and better 226 00:11:49,080 --> 00:11:52,470 as n gets large because it's rising so much less quickly. 227 00:11:52,470 --> 00:11:54,660 And so here, a computer scientist would say 228 00:11:54,660 --> 00:11:59,318 that that third algorithm was on the order of-- that is, big O of-- log n. 229 00:11:59,318 --> 00:12:01,110 And they don't have to bother with the base 230 00:12:01,110 --> 00:12:04,980 because it's a smaller mathematical detail that is also just in some sense 231 00:12:04,980 --> 00:12:07,890 a constant multiplicative factor. 232 00:12:07,890 --> 00:12:09,840 So in short, what are the takeaways here? 233 00:12:09,840 --> 00:12:12,270 This is just a new vocabulary that we'll start 234 00:12:12,270 --> 00:12:16,020 to use when we just want to describe the running time of an algorithm. 235 00:12:16,020 --> 00:12:18,600 To make this more real, if any of you have implemented 236 00:12:18,600 --> 00:12:24,290 a for loop at this point in any of your code and that for loop iterated n times 237 00:12:24,290 --> 00:12:27,590 where maybe n was the height of your pyramid or maybe n 238 00:12:27,590 --> 00:12:31,550 was something else that you wanted to do n times, you wrote code 239 00:12:31,550 --> 00:12:36,440 or you implemented an algorithm that operated in big O of n time, 240 00:12:36,440 --> 00:12:37,200 if you will. 241 00:12:37,200 --> 00:12:39,350 So this is just a way now to retroactively start 242 00:12:39,350 --> 00:12:43,490 describing with somewhat mathematical notation what we've 243 00:12:43,490 --> 00:12:45,780 been doing in practice for a while now. 244 00:12:45,780 --> 00:12:51,500 So here's a list of commonly seen running times in the real world. 245 00:12:51,500 --> 00:12:55,010 This is not a thorough list because you could come up 246 00:12:55,010 --> 00:12:57,710 with an infinite number of mathematical formulas, certainly. 
247 00:12:57,710 --> 00:13:01,400 But the common ones we'll discuss and you will see in your own code 248 00:13:01,400 --> 00:13:04,070 probably reduce to this list here. 249 00:13:04,070 --> 00:13:06,320 And if you were to study more computer science theory, 250 00:13:06,320 --> 00:13:07,903 this list would get longer and longer. 251 00:13:07,903 --> 00:13:11,930 But for now, these are sort of the most familiar ones that we'll soon see. 252 00:13:11,930 --> 00:13:14,700 All right, two other pieces of vocabulary, if you will, 253 00:13:14,700 --> 00:13:16,160 before we start to use this stuff-- 254 00:13:16,160 --> 00:13:19,430 so this, a big omega, capital omega symbol, 255 00:13:19,430 --> 00:13:25,170 is used now to describe a lower bound on the running time of an algorithm. 256 00:13:25,170 --> 00:13:28,610 So to be clear, big O is on the order of-- that 257 00:13:28,610 --> 00:13:31,910 is, an upper bound-- on how many steps an algorithm might 258 00:13:31,910 --> 00:13:34,700 take, on the order of so many steps. 259 00:13:34,700 --> 00:13:37,580 If you want to talk, though, from the other perspective, well, 260 00:13:37,580 --> 00:13:39,710 how few steps might my algorithm take? 261 00:13:39,710 --> 00:13:42,320 Maybe in the so-called best case, it'd be nice 262 00:13:42,320 --> 00:13:45,110 if we had a notation to just describe what a lower 263 00:13:45,110 --> 00:13:47,840 bound is because some algorithms might be super fast 264 00:13:47,840 --> 00:13:49,890 in these so-called best cases. 265 00:13:49,890 --> 00:13:53,660 So the symbology is almost the same, but we replace the big O 266 00:13:53,660 --> 00:13:54,780 with the big omega. 267 00:13:54,780 --> 00:13:58,580 So to be clear, big O describes an upper bound and omega 268 00:13:58,580 --> 00:14:00,110 describes a lower bound. 269 00:14:00,110 --> 00:14:02,540 And we'll see examples of this before long. 
270 00:14:02,540 --> 00:14:08,420 And then lastly, last one here, big theta, is used by a computer scientist 271 00:14:08,420 --> 00:14:12,860 when you have a case where the upper bound on an algorithm's 272 00:14:12,860 --> 00:14:15,890 running time is the same as the lower bound. 273 00:14:15,890 --> 00:14:19,580 You can then describe it in one breath as being in theta of such and such 274 00:14:19,580 --> 00:14:23,660 instead of saying it's in big O and in omega of something else. 275 00:14:23,660 --> 00:14:27,920 All right, so out of context, sort of just seemingly cryptic symbols, 276 00:14:27,920 --> 00:14:31,463 but all they refer to is upper bounds, lower bounds, or when 277 00:14:31,463 --> 00:14:32,880 they happen to be one and the same. 278 00:14:32,880 --> 00:14:36,440 And we'll now introduce over time examples of how we might actually 279 00:14:36,440 --> 00:14:38,990 apply these to concrete problems. 280 00:14:38,990 --> 00:14:43,220 But first, let me pause to see if there's any questions. 281 00:14:43,220 --> 00:14:46,120 Any questions here? 282 00:14:46,120 --> 00:14:48,430 Any questions? 283 00:14:48,430 --> 00:14:50,880 I see pointing somewhere. 284 00:14:50,880 --> 00:14:53,100 Where are you pointing to? 285 00:14:53,100 --> 00:14:54,640 Over here-- there we go. 286 00:14:54,640 --> 00:14:55,980 OK, sorry-- very bright. 287 00:14:55,980 --> 00:14:59,560 AUDIENCE: So, um, smaller-- 288 00:14:59,560 --> 00:15:01,520 DAVID J. MALAN: Smaller n functions move faster. 289 00:15:01,520 --> 00:15:06,260 So yes, if you have something like n, that takes only n steps. 290 00:15:06,260 --> 00:15:09,265 If you have a formula like n squared, just by nature of the math, 291 00:15:09,265 --> 00:15:11,900 that'll take more steps and therefore be slower. 292 00:15:11,900 --> 00:15:13,810 So the larger the mathematical expression, 293 00:15:13,810 --> 00:15:18,280 the slower your algorithm is because the more time or more steps that it takes. 
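The difference between n steps and n squared steps can be sketched in C by counting how often a loop body runs. The function names `linear_steps` and `quadratic_steps` are hypothetical and only illustrate the idea: a single loop over n items runs its body n times, while a loop nested inside another runs its body n times n times.

```c
// Hypothetical helper: a single loop over n items.
// The body runs n times, so this is on the order of n.
unsigned long linear_steps(unsigned long n)
{
    unsigned long steps = 0;
    for (unsigned long i = 0; i < n; i++)
    {
        steps++;
    }
    return steps;
}

// Hypothetical helper: one loop nested inside another.
// The inner body runs n times for each of the n outer iterations,
// so this is on the order of n squared.
unsigned long quadratic_steps(unsigned long n)
{
    unsigned long steps = 0;
    for (unsigned long i = 0; i < n; i++)
    {
        for (unsigned long j = 0; j < n; j++)
        {
            steps++;
        }
    }
    return steps;
}
```

With n equal to 10 the counts are 10 and 100; with n equal to 1,000 they are 1,000 and 1,000,000, so the gap widens quickly as n grows.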
294 00:15:18,280 --> 00:15:20,800 295 00:15:20,800 --> 00:15:23,170 AUDIENCE: So you want your n function to be small? 296 00:15:23,170 --> 00:15:26,410 DAVID J. MALAN: You want your n function, so to speak, to be small, yes. 297 00:15:26,410 --> 00:15:28,220 And in fact, the Holy Grail, so to speak, 298 00:15:28,220 --> 00:15:31,690 would be this last one here either in big O notation or even theta, 299 00:15:31,690 --> 00:15:34,900 when an algorithm is on the order of a single step. 300 00:15:34,900 --> 00:15:39,370 That means it literally takes constant time, one step, or maybe 10 steps, 301 00:15:39,370 --> 00:15:42,490 100 steps, but a fixed, constant number of steps. 302 00:15:42,490 --> 00:15:46,120 That's the best because even as the phone book gets bigger, 303 00:15:46,120 --> 00:15:50,350 even as the data set you're searching gets larger and larger, 304 00:15:50,350 --> 00:15:53,500 if something only takes a finite number of steps constantly, 305 00:15:53,500 --> 00:15:57,640 then it doesn't matter how big the data set actually gets. 306 00:15:57,640 --> 00:16:01,840 Questions as well on these notations-- yep, thank you for the pointing. 307 00:16:01,840 --> 00:16:03,310 This is actually very helpful. 308 00:16:03,310 --> 00:16:05,170 I'm seeing pointing this way? 309 00:16:05,170 --> 00:16:08,227 AUDIENCE: [INAUDIBLE] 310 00:16:08,227 --> 00:16:10,560 DAVID J. MALAN: What is the input to each of these functions? 311 00:16:10,560 --> 00:16:13,710 It is an expression of how many steps an algorithm takes. 312 00:16:13,710 --> 00:16:16,080 So in fact, let me go ahead and make this 313 00:16:16,080 --> 00:16:19,220 more concrete with an actual example here if we could. 314 00:16:19,220 --> 00:16:22,350 So on stage here, we have seven lockers which represent, 315 00:16:22,350 --> 00:16:24,300 if you will, an array of memory. 
316 00:16:24,300 --> 00:16:26,400 And this array of memory is maybe storing 317 00:16:26,400 --> 00:16:30,150 seven integers, seven integers that we might actually want to search for. 318 00:16:30,150 --> 00:16:33,090 And if we want to search for these values, how might 319 00:16:33,090 --> 00:16:34,048 we go about doing this? 320 00:16:34,048 --> 00:16:36,257 Well, for this, why don't we make things interesting? 321 00:16:36,257 --> 00:16:37,810 Would a volunteer like to come on up? 322 00:16:37,810 --> 00:16:40,920 Have to be masked and on the internet if you are comfortable. 323 00:16:40,920 --> 00:16:44,190 Both of-- oh, there's someone putting their friend's hand up and back? 324 00:16:44,190 --> 00:16:44,940 Yes, OK. 325 00:16:44,940 --> 00:16:45,660 Come on down. 326 00:16:45,660 --> 00:16:50,380 327 00:16:50,380 --> 00:16:53,080 And in just a moment, our brave volunteer 328 00:16:53,080 --> 00:16:57,070 is going to help me find a specific number in the data set 329 00:16:57,070 --> 00:16:58,520 that we have here on the screen. 330 00:16:58,520 --> 00:17:02,230 So come on down, and I'll get things ready for you in advance here. 331 00:17:02,230 --> 00:17:08,723 Come on down nice to meet. 332 00:17:08,723 --> 00:17:09,640 And what is your name? 333 00:17:09,640 --> 00:17:10,390 AUDIENCE: [? Nomira. ?] 334 00:17:10,390 --> 00:17:10,645 DAVID J. MALAN: Minera? 335 00:17:10,645 --> 00:17:11,530 AUDIENCE: [? Nomira. ?] 336 00:17:11,530 --> 00:17:13,113 DAVID J. MALAN: [? Nomira. ?] Nice to meet. 337 00:17:13,113 --> 00:17:13,700 Come on over. 338 00:17:13,700 --> 00:17:17,650 So here we have for Nomira seven lockers or an array of memory. 339 00:17:17,650 --> 00:17:19,540 And behind each of these doors is a number. 340 00:17:19,540 --> 00:17:22,569 And the goal, quite simply, is, given this array of memory 341 00:17:22,569 --> 00:17:27,680 as input, to return, true or false, is the number I care about actually there? 
342 00:17:27,680 --> 00:17:30,250 So suppose I care about the number 0. 343 00:17:30,250 --> 00:17:33,730 What would be the simplest, most correct algorithm you could 344 00:17:33,730 --> 00:17:38,200 apply in order to find us the number 0? 345 00:17:38,200 --> 00:17:41,608 OK, try opening the first one. 346 00:17:41,608 --> 00:17:44,150 All right, and maybe just step aside so the audience can see. 347 00:17:44,150 --> 00:17:46,160 I think you have not found 0 yet. 348 00:17:46,160 --> 00:17:47,480 OK, so keep the door open. 349 00:17:47,480 --> 00:17:50,120 Let's move on to your next choice. 350 00:17:50,120 --> 00:17:51,080 Second door, sure. 351 00:17:51,080 --> 00:17:51,998 AUDIENCE: [INAUDIBLE] 352 00:17:51,998 --> 00:17:53,540 DAVID J. MALAN: Oh, go ahead, second door. 353 00:17:53,540 --> 00:17:54,415 Let's keep it simple. 354 00:17:54,415 --> 00:17:57,557 Let's just move from left to right, sort of searching our way. 355 00:17:57,557 --> 00:17:58,640 And what do you see there? 356 00:17:58,640 --> 00:17:59,770 Oh, 6, not 0. 357 00:17:59,770 --> 00:18:00,770 How about the next door? 358 00:18:00,770 --> 00:18:04,200 359 00:18:04,200 --> 00:18:07,230 All right, also not working out so well yet, but that's OK. 360 00:18:07,230 --> 00:18:11,610 If you want to go on to the next, we're still looking for 0. 361 00:18:11,610 --> 00:18:12,930 All right, I see a 2. 362 00:18:12,930 --> 00:18:14,502 All right, it's not so good yet. 363 00:18:14,502 --> 00:18:15,210 Let's keep going. 364 00:18:15,210 --> 00:18:16,860 Next door. 365 00:18:16,860 --> 00:18:18,540 2, 7-- no. 366 00:18:18,540 --> 00:18:20,100 OK, next door. 367 00:18:20,100 --> 00:18:25,870 No, that's a-- all right, very well done. 368 00:18:25,870 --> 00:18:26,370 Oh. 
369 00:18:26,370 --> 00:18:29,030 370 00:18:29,030 --> 00:18:32,330 All right, so I kind of set you up for a fairly slow algorithm, 371 00:18:32,330 --> 00:18:34,550 but let me just ask you to describe what is it 372 00:18:34,550 --> 00:18:37,534 you did by following the steps I gave you. 373 00:18:37,534 --> 00:18:39,660 AUDIENCE: I just went one by one to each character. 374 00:18:39,660 --> 00:18:41,660 DAVID J. MALAN: You went one by one to each character 375 00:18:41,660 --> 00:18:43,380 if you want to talk into here. 376 00:18:43,380 --> 00:18:45,140 So you went one by one by each character. 377 00:18:45,140 --> 00:18:48,620 And would you say that algorithm left or right is correct? 378 00:18:48,620 --> 00:18:49,790 AUDIENCE: No. 379 00:18:49,790 --> 00:18:50,960 DAVID J. MALAN: No? 380 00:18:50,960 --> 00:18:53,000 AUDIENCE: Or, yes, in the scenario. 381 00:18:53,000 --> 00:18:54,500 DAVID J. MALAN: OK, yes in this scenario. 382 00:18:54,500 --> 00:18:55,520 Why are you hesitating? 383 00:18:55,520 --> 00:18:56,060 What's going through your mind? 384 00:18:56,060 --> 00:18:58,100 AUDIENCE: Because it's not the most efficient way to do it. 385 00:18:58,100 --> 00:18:58,980 DAVID J. MALAN: OK, good. 386 00:18:58,980 --> 00:19:01,610 So we see a contrast here between correctness and design. 387 00:19:01,610 --> 00:19:04,360 I mean, I do think it was correct because even though it was slow, 388 00:19:04,360 --> 00:19:05,960 you eventually found zero. 389 00:19:05,960 --> 00:19:08,490 But it took some number of steps. 390 00:19:08,490 --> 00:19:10,280 So in fact, this would be an algorithm. 391 00:19:10,280 --> 00:19:12,350 It has a name, called linear search. 392 00:19:12,350 --> 00:19:14,840 And, [? Nomira, ?] as you did, you kind of walked along 393 00:19:14,840 --> 00:19:16,735 a line going from left to right. 394 00:19:16,735 --> 00:19:17,360 Now let me ask. 
395 00:19:17,360 --> 00:19:19,880 If you had gone from right to left, would the algorithm 396 00:19:19,880 --> 00:19:23,030 have been fundamentally better? 397 00:19:23,030 --> 00:19:23,780 AUDIENCE: Yes. 398 00:19:23,780 --> 00:19:24,885 DAVID J. MALAN: OK, and why? 399 00:19:24,885 --> 00:19:27,260 AUDIENCE: Because the zero is here in the first scenario. 400 00:19:27,260 --> 00:19:30,860 But if it was like, the zero is in the middle, it wouldn't have been. 401 00:19:30,860 --> 00:19:35,020 DAVID J. MALAN: Yeah, and so here is where the right way to do things 402 00:19:35,020 --> 00:19:36,270 becomes a little less obvious. 403 00:19:36,270 --> 00:19:38,707 You would absolutely have given yourself a better result 404 00:19:38,707 --> 00:19:40,790 if you had just happened to start from the right 405 00:19:40,790 --> 00:19:42,780 or if I had pointed you to start over there. 406 00:19:42,780 --> 00:19:45,907 But the catch is if I asked her to find another number, like the number 8, 407 00:19:45,907 --> 00:19:47,240 well, that would have backfired. 408 00:19:47,240 --> 00:19:48,948 And this time, it would have taken longer 409 00:19:48,948 --> 00:19:51,450 to find that number because it's way over here instead. 410 00:19:51,450 --> 00:19:55,970 And so in the general case, going left to right or, heck, right to left 411 00:19:55,970 --> 00:19:59,540 is probably as correct as you can get because if you know nothing 412 00:19:59,540 --> 00:20:03,517 about the order of these numbers-- and indeed, they seem to be fairly random. 413 00:20:03,517 --> 00:20:05,600 Some of them are smaller, some of them are bigger. 414 00:20:05,600 --> 00:20:07,308 There doesn't seem to be rhyme or reason. 415 00:20:07,308 --> 00:20:11,330 Linear search is about as good as you can do when you don't know anything 416 00:20:11,330 --> 00:20:13,053 a priori about the numbers. 417 00:20:13,053 --> 00:20:15,720 So I have a little thank you gift here, a little CS stress ball.
418 00:20:15,720 --> 00:20:18,680 Round of applause for our first volunteer. 419 00:20:18,680 --> 00:20:19,550 Thank you so much. 420 00:20:19,550 --> 00:20:23,670 421 00:20:23,670 --> 00:20:27,690 Let's try to formalize what I just described as linear search 422 00:20:27,690 --> 00:20:30,938 because indeed, no matter which end [? Nomira ?] had started on, 423 00:20:30,938 --> 00:20:32,730 I could have kind of changed up the problem 424 00:20:32,730 --> 00:20:35,070 to make sure that it appears to be running slow. 425 00:20:35,070 --> 00:20:36,090 But it is correct. 426 00:20:36,090 --> 00:20:39,270 If zero were among those doors, she absolutely would have found it 427 00:20:39,270 --> 00:20:40,260 and indeed did. 428 00:20:40,260 --> 00:20:45,870 So let's now try to translate what we did into what we might call again 429 00:20:45,870 --> 00:20:48,210 pseudo code as from week zero. 430 00:20:48,210 --> 00:20:50,730 So with pseudo code, we just need a terse English 431 00:20:50,730 --> 00:20:53,770 like, or any language, syntax to describe what we did. 432 00:20:53,770 --> 00:20:56,280 So here might be one formulation of what [? Nomira ?] did. 433 00:20:56,280 --> 00:21:00,690 For each door, from left to right, if the number is behind the door, 434 00:21:00,690 --> 00:21:02,280 return true. 435 00:21:02,280 --> 00:21:07,630 Else, at the very end of the program, you would return false by default. 436 00:21:07,630 --> 00:21:08,800 And now you got lucky. 437 00:21:08,800 --> 00:21:10,800 And by the seventh door, [? Nomira ?] had indeed 438 00:21:10,800 --> 00:21:14,430 returned true by saying, well, there is the zero. 439 00:21:14,430 --> 00:21:17,070 But let's consider if this pseudo code is now correct, 440 00:21:17,070 --> 00:21:18,150 an accurate translation. 441 00:21:18,150 --> 00:21:22,360 First of all, normally, when we've seen ifs, we might see an if else. 442 00:21:22,360 --> 00:21:26,100 And yet down here, return false is aligned with the for. 
443 00:21:26,100 --> 00:21:29,700 Why did I not indent the return false, or put another way, 444 00:21:29,700 --> 00:21:36,780 why did I not do if number is behind door, return true, else return false? 445 00:21:36,780 --> 00:21:39,855 Why would that version of this code have been problematic? 446 00:21:39,855 --> 00:21:41,318 Way in back. 447 00:21:41,318 --> 00:21:50,320 AUDIENCE: [INAUDIBLE] 448 00:21:50,320 --> 00:21:52,690 DAVID J. MALAN: OK, I'm not sure it's because of redundancy. 449 00:21:52,690 --> 00:21:55,000 Let me go ahead and just make this explicit. 450 00:21:55,000 --> 00:21:58,510 If I had instead done else return false, I 451 00:21:58,510 --> 00:22:02,942 don't think it's so much redundancy that I'd be worried about. 452 00:22:02,942 --> 00:22:04,150 Let me bounce somewhere else. 453 00:22:04,150 --> 00:22:04,810 Yeah, in front? 454 00:22:04,810 --> 00:22:08,350 AUDIENCE: Um, maybe [INAUDIBLE] for the entire list 455 00:22:08,350 --> 00:22:09,730 after just checking one number. 456 00:22:09,730 --> 00:22:11,920 DAVID J. MALAN: Yeah, it would be returning false for-- 457 00:22:11,920 --> 00:22:13,795 even though I'd only looked at-- [? Nomira ?] 458 00:22:13,795 --> 00:22:15,100 had only looked at one element. 459 00:22:15,100 --> 00:22:18,142 And it would have been as though if all of these doors were still closed, 460 00:22:18,142 --> 00:22:21,610 she opens this up and says, nope, this is not zero, return false. 461 00:22:21,610 --> 00:22:24,708 That would give me an incorrect result because obviously, 462 00:22:24,708 --> 00:22:27,250 at that stage in the algorithm, she wouldn't have even looked 463 00:22:27,250 --> 00:22:28,720 through any of the other doors. 464 00:22:28,720 --> 00:22:32,510 So just the original indentation of this, if you will, 465 00:22:32,510 --> 00:22:35,200 without the [? else, ?]
is correct because only 466 00:22:35,200 --> 00:22:38,950 if I get to the bottom of this algorithm or the pseudo code does 467 00:22:38,950 --> 00:22:41,740 it make sense to conclude at that point, once she's 468 00:22:41,740 --> 00:22:45,220 gone through all of the doors, that nope, there's in fact-- 469 00:22:45,220 --> 00:22:48,550 the number I'm looking for is, in fact, not actually there. 470 00:22:48,550 --> 00:22:52,790 So how might we consider now the running time of this algorithm? 471 00:22:52,790 --> 00:22:55,930 We have a few different types of vocabulary now. 472 00:22:55,930 --> 00:22:59,080 And if we consider now how we might think about this, 473 00:22:59,080 --> 00:23:02,620 let's start to translate it from sort of higher level pseudo code 474 00:23:02,620 --> 00:23:04,420 to something a little lower level. 475 00:23:04,420 --> 00:23:07,820 We've been writing code using n and loops and the like. 476 00:23:07,820 --> 00:23:12,340 So let's take this higher level pseudo code and now just kind of 477 00:23:12,340 --> 00:23:14,890 get a middle ground between English and C. 478 00:23:14,890 --> 00:23:18,910 Let me propose that we think about this version of the same algorithm 479 00:23:18,910 --> 00:23:20,680 as being a little more pedantic. 480 00:23:20,680 --> 00:23:28,820 For i from 0 to n minus 1, if number behind doors bracket i return true. 481 00:23:28,820 --> 00:23:31,520 Otherwise, at the end of the program, return false. 482 00:23:31,520 --> 00:23:33,520 Now I'm kind of mixing English and C here, 483 00:23:33,520 --> 00:23:35,950 but that's reasonable if the reader is familiar with C 484 00:23:35,950 --> 00:23:37,540 or some similar language. 485 00:23:37,540 --> 00:23:39,530 And notice this pattern here. 486 00:23:39,530 --> 00:23:44,650 This is a way of just saying in pseudo code, give myself a variable called i. 487 00:23:44,650 --> 00:23:49,360 Start at 0 and then just count up to n minus 1. 
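The linear search pseudo code discussed here might be sketched in C roughly as follows. This is a minimal sketch, not the lecture's own code: the function names, the `int` element type, and the parameter `n` for the number of doors are assumptions made for illustration. The second function shows the "else return false" variant that the discussion rules out.

```c
#include <stdbool.h>

// Linear search: look behind each door from left to right.
// `doors` and `n` are illustrative names; n is the number of doors.
bool linear_search(const int doors[], int n, int number)
{
    for (int i = 0; i < n; i++) // i runs from 0 to n - 1
    {
        if (doors[i] == number)
        {
            return true; // found it, so stop early
        }
    }
    // Only after every door has been checked can we say it is absent.
    return false;
}

// The flawed variant: the else branch gives up after the first door.
bool linear_search_early_false(const int doors[], int n, int number)
{
    for (int i = 0; i < n; i++)
    {
        if (doors[i] == number)
        {
            return true;
        }
        else
        {
            return false; // wrong: no other door has been checked yet
        }
    }
    return false;
}
```

With a hypothetical array such as `{5, 6, 4, 2, 7, 8, 0}`, the first function finds 0 at the last index, while the second reports false after looking only at the 5.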
488 00:23:49,360 --> 00:23:53,410 And recall n minus 1 is not one shy of the end of the array. 489 00:23:53,410 --> 00:23:56,170 N minus 1 is the end of the array because again, we 490 00:23:56,170 --> 00:23:57,580 started counting at 0. 491 00:23:57,580 --> 00:24:00,880 So this is a very common way of expressing this kind of loop 492 00:24:00,880 --> 00:24:03,910 from the left all the way to the right of an array. 493 00:24:03,910 --> 00:24:07,210 Doors I'm kind of implicitly treating as the name of this array, 494 00:24:07,210 --> 00:24:10,000 like it's a variable from last week that I defined as being 495 00:24:10,000 --> 00:24:11,780 an array of integers in this case. 496 00:24:11,780 --> 00:24:16,690 So doors bracket i means that when i is 0, it's this location. 497 00:24:16,690 --> 00:24:18,220 When i is 1, it's this. 498 00:24:18,220 --> 00:24:21,790 When i is 7 or, more generally n minus-- 499 00:24:21,790 --> 00:24:26,270 sorry, 6 or, more generally, n minus 1, that's this location here. 500 00:24:26,270 --> 00:24:28,700 So same idea but a translation of it. 501 00:24:28,700 --> 00:24:32,920 So now let's consider what the running time of this algorithm is. 502 00:24:32,920 --> 00:24:36,010 If we have this menu of possible answers to this question, 503 00:24:36,010 --> 00:24:38,930 how efficient or inefficient is this algorithm, 504 00:24:38,930 --> 00:24:41,650 let's take a look in the context of this pseudo code. 505 00:24:41,650 --> 00:24:44,500 We don't even have to bother going all the way to C. 506 00:24:44,500 --> 00:24:47,720 How do we go about analyzing each of these steps? 507 00:24:47,720 --> 00:24:49,600 Well, let's consider this. 508 00:24:49,600 --> 00:24:55,540 This outermost loop here for i from 0 to n minus 1, that line of code 509 00:24:55,540 --> 00:24:57,790 is going to execute how many times? 510 00:24:57,790 --> 00:25:01,420 How many times will that loop execute? 
511 00:25:01,420 --> 00:25:04,540 Let me give folks this moment to think on it. 512 00:25:04,540 --> 00:25:07,330 How many times is that going to loop here? 513 00:25:07,330 --> 00:25:08,584 Yeah, over there. 514 00:25:08,584 --> 00:25:10,360 AUDIENCE: [INAUDIBLE] 515 00:25:10,360 --> 00:25:11,530 DAVID J. MALAN: n times, right? 516 00:25:11,530 --> 00:25:13,720 Because it's from 0 to n minus 1. 517 00:25:13,720 --> 00:25:16,330 And if it's a little weird to think in from 0 to n minus 1, 518 00:25:16,330 --> 00:25:19,832 this is essentially the same mathematically as from 1 to n. 519 00:25:19,832 --> 00:25:21,790 And that's perhaps a little more obviously more 520 00:25:21,790 --> 00:25:23,690 intuitively n total steps. 521 00:25:23,690 --> 00:25:28,180 So I might just make a note to myself this loop is going to operate n times. 522 00:25:28,180 --> 00:25:29,770 What about these inner steps? 523 00:25:29,770 --> 00:25:33,160 Well, how many steps or seconds does it take to ask a question? 524 00:25:33,160 --> 00:25:35,290 If the number behind-- 525 00:25:35,290 --> 00:25:38,740 if the number you're looking for is behind doors bracket i, 526 00:25:38,740 --> 00:25:41,570 well, as [? Nomira ?] did, that's kind of like one step. 527 00:25:41,570 --> 00:25:42,820 So you open the door and boom. 528 00:25:42,820 --> 00:25:46,100 All right, maybe it's two steps, but it's a constant number of steps. 529 00:25:46,100 --> 00:25:48,320 So this is some constant number of steps. 530 00:25:48,320 --> 00:25:50,080 Let's just call it one for simplicity. 531 00:25:50,080 --> 00:25:53,500 How many steps or seconds does it take to return true? 532 00:25:53,500 --> 00:25:55,863 I don't know exactly in the computer's memory 533 00:25:55,863 --> 00:25:57,280 but that feels like a single step. 534 00:25:57,280 --> 00:25:58,940 Just return true. 
535 00:25:58,940 --> 00:26:01,960 So if this takes one step, this takes one step 536 00:26:01,960 --> 00:26:04,960 but only if the condition is true, it looks 537 00:26:04,960 --> 00:26:09,370 like you're doing a constant number of things n times. 538 00:26:09,370 --> 00:26:12,530 Or maybe you're doing one additional step. 539 00:26:12,530 --> 00:26:15,010 So in short, the only thing that really matters here 540 00:26:15,010 --> 00:26:18,220 in terms of the efficiency or inefficiency of the algorithm 541 00:26:18,220 --> 00:26:21,495 is what are you doing again and again and again because that's obviously 542 00:26:21,495 --> 00:26:22,870 the thing that's going to add up. 543 00:26:22,870 --> 00:26:26,080 Doing one thing or two things a constant number of times? 544 00:26:26,080 --> 00:26:27,010 Not a big deal. 545 00:26:27,010 --> 00:26:32,050 But looping, that's going to add up over time because the more doors there are, 546 00:26:32,050 --> 00:26:35,420 the bigger n is going to be and the more steps that's going to take, 547 00:26:35,420 --> 00:26:38,320 which is all to say if you were to describe roughly 548 00:26:38,320 --> 00:26:43,120 how many steps does this algorithm take in big O notation, 549 00:26:43,120 --> 00:26:46,060 what might your instincts say? 550 00:26:46,060 --> 00:26:51,350 How many steps is this algorithm on the order of given n doors or n integers? 551 00:26:51,350 --> 00:26:51,850 Yeah? 552 00:26:51,850 --> 00:26:52,805 AUDIENCE: [INAUDIBLE] 553 00:26:52,805 --> 00:26:53,680 DAVID J. MALAN: Say again? 554 00:26:53,680 --> 00:26:54,670 AUDIENCE: O n. 555 00:26:54,670 --> 00:26:56,110 DAVID J. MALAN: Big O of n. 556 00:26:56,110 --> 00:26:58,220 And indeed, that's going to be the case here. 557 00:26:58,220 --> 00:26:58,450 Why? 558 00:26:58,450 --> 00:27:00,533 Because you're essentially, at the end of the day, 559 00:27:00,533 --> 00:27:03,992 doing n things as an upper bound on running time. 
560 00:27:03,992 --> 00:27:05,950 And that's, in fact, exactly what happened 561 00:27:05,950 --> 00:27:08,500 with [? Nomira. ?] She had to look at all n lockers 562 00:27:08,500 --> 00:27:11,450 before finally getting to the right answer. 563 00:27:11,450 --> 00:27:14,470 But what if she got lucky and the number we 564 00:27:14,470 --> 00:27:17,380 were looking for was not at the end of the array 565 00:27:17,380 --> 00:27:20,080 but was at the beginning of the array? 566 00:27:20,080 --> 00:27:21,650 How might we think about that? 567 00:27:21,650 --> 00:27:25,120 Well, we have a nomenclature for this too, of course-- omega notation. 568 00:27:25,120 --> 00:27:27,730 Remember, omega notation is a lower bound. 569 00:27:27,730 --> 00:27:34,240 So given this menu of possible running times for lower bounds on an algorithm, 570 00:27:34,240 --> 00:27:38,897 what might the omega notation be for [? Nomira's ?] linear search? 571 00:27:38,897 --> 00:27:40,270 AUDIENCE: Omega 1. 572 00:27:40,270 --> 00:27:42,250 DAVID J. MALAN: Omega of 1, and why that? 573 00:27:42,250 --> 00:27:43,973 AUDIENCE: [INAUDIBLE] 574 00:27:43,973 --> 00:27:46,390 DAVID J. MALAN: Right, because if just by chance she gets lucky 575 00:27:46,390 --> 00:27:49,300 and the number she's looking for is right there where 576 00:27:49,300 --> 00:27:51,490 she begins the algorithm, that's it. 577 00:27:51,490 --> 00:27:52,540 It's one step. 578 00:27:52,540 --> 00:27:55,210 Maybe it's two steps if you have to unlock the door and open it, 579 00:27:55,210 --> 00:27:56,740 but it's a constant number of steps. 580 00:27:56,740 --> 00:27:58,810 And the way we describe a constant number of steps 581 00:27:58,810 --> 00:28:01,030 is just with a single number like 1. 582 00:28:01,030 --> 00:28:04,990 So the omega notation for linear search might be omega of 1 583 00:28:04,990 --> 00:28:08,650 because in the best case, she might just get the number right from the get go.
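To make the best case and worst case of linear search concrete, one could count how many doors get opened. The sketch below is an illustration under assumptions: the helper name `doors_opened` and the sample values are hypothetical and do not come from the lecture.

```c
// Count how many doors a left-to-right linear search opens
// before it either finds `number` or runs out of doors.
// `doors_opened` is a hypothetical helper name used for illustration.
int doors_opened(const int doors[], int n, int number)
{
    int opened = 0;
    for (int i = 0; i < n; i++)
    {
        opened++; // opening one door counts as one step
        if (doors[i] == number)
        {
            break; // found it, so no further doors are opened
        }
    }
    return opened;
}
```

For a hypothetical array `{5, 6, 4, 2, 7, 8, 0}`, searching for 5 opens 1 door (the lucky case behind omega of 1), while searching for 0, or for a value that is not there at all, opens all 7 doors (the case behind big O of n).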
584 00:28:08,650 --> 00:28:11,860 But in the worst case, we need to talk about the upper bound, which 585 00:28:11,860 --> 00:28:13,990 might indeed be big O of n. 586 00:28:13,990 --> 00:28:16,690 So again there's this way now of talking symbolically 587 00:28:16,690 --> 00:28:22,150 about best cases and worst cases or lower bounds and upper bounds. 588 00:28:22,150 --> 00:28:24,880 Theta notation, just as a little trivia now, 589 00:28:24,880 --> 00:28:27,965 is it applicable based on the definition I gave earlier? 590 00:28:27,965 --> 00:28:28,840 AUDIENCE: [INAUDIBLE] 591 00:28:28,840 --> 00:28:31,630 DAVID J. MALAN: OK, no, because you only take out the theta notation 592 00:28:31,630 --> 00:28:34,120 when those two bounds, upper and lower, happen 593 00:28:34,120 --> 00:28:36,740 to be the same for shorthand notation, if you will. 594 00:28:36,740 --> 00:28:41,530 So it suffices here to talk about just big O and omega notation. 595 00:28:41,530 --> 00:28:43,880 Well, what if we are a little smarter about this? 596 00:28:43,880 --> 00:28:47,350 Let me go ahead and sort of semi-secretly here 597 00:28:47,350 --> 00:28:48,380 rearrange these numbers. 598 00:28:48,380 --> 00:28:50,605 But first, how about one other volunteer? 599 00:28:50,605 --> 00:28:53,230 One other volunteer-- you have to be comfortable with your mask 600 00:28:53,230 --> 00:28:55,510 and your being on the internet. 601 00:28:55,510 --> 00:28:58,180 How about over here? 602 00:28:58,180 --> 00:28:59,880 Yes, you want to come on down? 603 00:28:59,880 --> 00:29:00,880 All right, come on down. 604 00:29:00,880 --> 00:29:03,640 And don't look at what I'm doing because I'm going to-- 605 00:29:03,640 --> 00:29:08,340 606 00:29:08,340 --> 00:29:10,830 take your time and don't look up this way 607 00:29:10,830 --> 00:29:14,550 because I need a moment to rearrange all of the numbers. 
608 00:29:14,550 --> 00:29:17,430 And actually, if you could stay right there before coming up, 609 00:29:17,430 --> 00:29:20,910 just an awkward few seconds while I finish hiding the numbers 610 00:29:20,910 --> 00:29:22,590 behind these doors for you. 611 00:29:22,590 --> 00:29:23,890 AUDIENCE: [INAUDIBLE] 612 00:29:23,890 --> 00:29:26,490 DAVID J. MALAN: I will be right with you. 613 00:29:26,490 --> 00:29:31,110 Actually, if-- do you want to warm up the crowd for a moment 614 00:29:31,110 --> 00:29:32,283 and I'll be right back? 615 00:29:32,283 --> 00:29:33,700 So you want to introduce yourself? 616 00:29:33,700 --> 00:29:34,770 AUDIENCE: Yeah, hi, guys. 617 00:29:34,770 --> 00:29:35,340 I'm Rave. 618 00:29:35,340 --> 00:29:37,900 619 00:29:37,900 --> 00:29:38,400 Yeah! 620 00:29:38,400 --> 00:29:43,500 621 00:29:43,500 --> 00:29:46,200 DAVID J. MALAN: All right, I think I am ready. 622 00:29:46,200 --> 00:29:47,700 Thank you for stalling there. 623 00:29:47,700 --> 00:29:48,970 AUDIENCE: Of course. 624 00:29:48,970 --> 00:29:49,710 DAVID J. MALAN: And I didn't catch your name. 625 00:29:49,710 --> 00:29:50,250 What was your name? 626 00:29:50,250 --> 00:29:51,080 AUDIENCE: I'm Rave. 627 00:29:51,080 --> 00:29:51,450 DAVID J. MALAN: I'm sorry? 628 00:29:51,450 --> 00:29:52,440 AUDIENCE: Rave, like a party. 629 00:29:52,440 --> 00:29:53,130 DAVID J. MALAN: Rave, OK. 630 00:29:53,130 --> 00:29:53,672 Nice to meet. 631 00:29:53,672 --> 00:29:54,930 Come on over. 632 00:29:54,930 --> 00:29:56,760 So Rave has kindly volunteered now. 633 00:29:56,760 --> 00:29:58,725 And I'm going to give you an additional advantage this time. 634 00:29:58,725 --> 00:29:59,400 AUDIENCE: OK. 635 00:29:59,400 --> 00:30:03,180 DAVID J. MALAN: Unbeknownst to you, I now took numbers behind the doors, 636 00:30:03,180 --> 00:30:04,530 but I sorted them for you. 637 00:30:04,530 --> 00:30:06,120 So they're not in the same random order like they 638 00:30:06,120 --> 00:30:08,162 were for [? 
Nomira. ?] You now have the advantage 639 00:30:08,162 --> 00:30:10,678 to know that the numbers are sorted from small to big. 640 00:30:10,678 --> 00:30:11,220 AUDIENCE: OK. 641 00:30:11,220 --> 00:30:15,180 DAVID J. MALAN: Given that, and given perhaps what we talked about in week zero 642 00:30:15,180 --> 00:30:19,140 with the phone book, where might you propose we begin the story this time? 643 00:30:19,140 --> 00:30:20,850 With which locker? 644 00:30:20,850 --> 00:30:21,808 AUDIENCE: To find zero? 645 00:30:21,808 --> 00:30:23,600 DAVID J. MALAN: Let's find number six this time. 646 00:30:23,600 --> 00:30:24,940 Let's make things interesting. 647 00:30:24,940 --> 00:30:26,200 AUDIENCE: OK. 648 00:30:26,200 --> 00:30:27,300 I'll start in the middle. 649 00:30:27,300 --> 00:30:28,050 DAVID J. MALAN: OK, so the middle. 650 00:30:28,050 --> 00:30:28,990 There's seven total. 651 00:30:28,990 --> 00:30:29,490 So-- 652 00:30:29,490 --> 00:30:29,670 AUDIENCE: OK. 653 00:30:29,670 --> 00:30:30,420 DAVID J. MALAN: --that would be right here. 654 00:30:30,420 --> 00:30:30,920 Go ahead. 655 00:30:30,920 --> 00:30:32,460 Open that up. 656 00:30:32,460 --> 00:30:34,320 And you find, sadly, the number five. 657 00:30:34,320 --> 00:30:36,150 So what do you know now? 658 00:30:36,150 --> 00:30:37,257 AUDIENCE: I know to go up. 659 00:30:37,257 --> 00:30:37,840 DAVID J. MALAN: OK. 660 00:30:37,840 --> 00:30:38,382 AUDIENCE: OK. 661 00:30:38,382 --> 00:30:40,465 DAVID J. MALAN: All right, and just to keep it uniform, 662 00:30:40,465 --> 00:30:43,080 just like I did, I opened to the right half of the phone book. 663 00:30:43,080 --> 00:30:43,350 AUDIENCE: Yes. 664 00:30:43,350 --> 00:30:44,880 DAVID J. MALAN: Let's keep it similar. 665 00:30:44,880 --> 00:30:45,390 Yeah. 666 00:30:45,390 --> 00:30:46,223 AUDIENCE: All right. 667 00:30:46,223 --> 00:30:48,182 DAVID J. 
MALAN: All right, and, uh, a little too far 668 00:30:48,182 --> 00:30:50,070 even though I know you wanted to go one over. 669 00:30:50,070 --> 00:30:51,080 AUDIENCE: All good, all good. 670 00:30:51,080 --> 00:30:52,800 DAVID J. MALAN: And now we're going to go which direction? 671 00:30:52,800 --> 00:30:54,217 AUDIENCE: Over here in the middle. 672 00:30:54,217 --> 00:30:56,320 DAVID J. MALAN: Right, and voila, the number six. 673 00:30:56,320 --> 00:30:57,840 All right, so very nicely done. 674 00:30:57,840 --> 00:31:00,620 675 00:31:00,620 --> 00:31:02,320 A little stressful for you as well. 676 00:31:02,320 --> 00:31:03,210 Thank you again. 677 00:31:03,210 --> 00:31:05,790 So here we see by nature of the locker door 678 00:31:05,790 --> 00:31:10,350 still being open sort of an artifact of the greater efficiency, 679 00:31:10,350 --> 00:31:13,560 it would seem, of this algorithm because now that Rave 680 00:31:13,560 --> 00:31:16,470 was given the assumption that these numbers are sorted from small 681 00:31:16,470 --> 00:31:20,310 on the left to large on the right, she was able to apply that same divide 682 00:31:20,310 --> 00:31:23,610 and conquer algorithm from week zero, which we're now going to give a name-- 683 00:31:23,610 --> 00:31:25,200 binary search. 684 00:31:25,200 --> 00:31:28,650 And simply by starting in the middle and realizing, 685 00:31:28,650 --> 00:31:32,670 OK, too small, then by going to the right half and realizing, oh, 686 00:31:32,670 --> 00:31:35,820 went a little too far, then by going to the left half, 687 00:31:35,820 --> 00:31:39,660 Rave was able to find in just three steps instead of seven 688 00:31:39,660 --> 00:31:43,720 the number six in this case that we were actually searching for. 689 00:31:43,720 --> 00:31:47,890 So you can see that this would seem to be more efficient. 690 00:31:47,890 --> 00:31:50,700 Let's consider for just a moment is it correct.
691 00:31:50,700 --> 00:31:56,250 If I had used different numbers but still sorted them from left to right, 692 00:31:56,250 --> 00:31:59,378 would it still have worked this algorithm? 693 00:31:59,378 --> 00:32:00,420 You're nodding your head. 694 00:32:00,420 --> 00:32:01,170 Can I call on you? 695 00:32:01,170 --> 00:32:03,920 Like, why would it still have worked, do you think? 696 00:32:03,920 --> 00:32:06,700 AUDIENCE: [INAUDIBLE] 697 00:32:06,700 --> 00:32:08,450 DAVID J. MALAN: Yeah, so so long as the numbers 698 00:32:08,450 --> 00:32:10,760 are always in the same order from left to right 699 00:32:10,760 --> 00:32:13,970 or, heck, they could even be in reverse order, so long as it's consistent, 700 00:32:13,970 --> 00:32:18,410 the decisions that Rave was making-- if greater than, else, if less than-- 701 00:32:18,410 --> 00:32:20,820 would guide us to the solution no matter what. 702 00:32:20,820 --> 00:32:23,460 And it would seem to take fewer steps. 703 00:32:23,460 --> 00:32:26,220 So if we consider now the pseudo code for this algorithm, 704 00:32:26,220 --> 00:32:28,530 let's take a look how we might describe binary search. 705 00:32:28,530 --> 00:32:31,400 So binary search we might describe with something like this. 706 00:32:31,400 --> 00:32:34,640 If the number is behind the middle door, which is where Rave began, 707 00:32:34,640 --> 00:32:36,710 then we can just return true. 708 00:32:36,710 --> 00:32:40,290 Else if the number is less than the middle door, 709 00:32:40,290 --> 00:32:42,800 so if six is less than whatever is behind the middle door, 710 00:32:42,800 --> 00:32:45,140 then Rave would have searched the left half. 711 00:32:45,140 --> 00:32:47,690 Else if the number is greater than the middle door, 712 00:32:47,690 --> 00:32:49,700 Rave would have searched the right half. 
713 00:32:49,700 --> 00:32:53,840 Else, if there are no doors-- and we'll see in a moment why I put 714 00:32:53,840 --> 00:32:55,710 this up top just to keep things clean. 715 00:32:55,710 --> 00:32:59,390 If there's no doors, what should Rave have presumably returned immediately 716 00:32:59,390 --> 00:33:02,870 if I gave her no lockers to work with? 717 00:33:02,870 --> 00:33:03,920 Just returned false. 718 00:33:03,920 --> 00:33:06,020 But this is an important case to consider 719 00:33:06,020 --> 00:33:09,980 because if in the process of searching by locker by locker, 720 00:33:09,980 --> 00:33:14,630 we might have whittled down the problem from seven doors to three doors 721 00:33:14,630 --> 00:33:17,370 to one door to zero doors-- and at that point, 722 00:33:17,370 --> 00:33:19,230 we might have had no doors left to search. 723 00:33:19,230 --> 00:33:22,040 So we have to naturally have a scenario for just considering 724 00:33:22,040 --> 00:33:23,120 if there were no doors. 725 00:33:23,120 --> 00:33:26,910 So it's not to say that maybe I don't give Rave any doors to begin with. 726 00:33:26,910 --> 00:33:28,910 But as she divides and divides and divides, 727 00:33:28,910 --> 00:33:32,720 if she runs out of lockers to ask those questions of-- or a few weeks ago, 728 00:33:32,720 --> 00:33:35,660 if I ran out of phone book pages to tear in half, 729 00:33:35,660 --> 00:33:39,210 I too might have had to return false as in this case. 730 00:33:39,210 --> 00:33:42,500 So how can we now describe this a little more like C 731 00:33:42,500 --> 00:33:45,710 just to give ourselves a variable to start thinking and talking about? 732 00:33:45,710 --> 00:33:48,930 Well, I might talk about doors as being an array. 733 00:33:48,930 --> 00:33:52,490 And so if I want to express the middle door, I could just, in pseudo code, 734 00:33:52,490 --> 00:33:54,470 say doors bracket middle. 
735 00:33:54,470 --> 00:33:56,270 I'm assuming that someone has done the math 736 00:33:56,270 --> 00:33:59,450 to figure out what the middle door is, but that's easy enough to do. 737 00:33:59,450 --> 00:34:01,940 And then, if the number we're looking for 738 00:34:01,940 --> 00:34:05,470 is less than doors bracket middle, then search doors 739 00:34:05,470 --> 00:34:09,230 zero through doors middle minus 1. 740 00:34:09,230 --> 00:34:13,610 So again, this is a more pedantic way of taking what's a pretty intuitive idea-- 741 00:34:13,610 --> 00:34:16,159 search the left half, search the right half-- 742 00:34:16,159 --> 00:34:22,790 but start to now describe it in terms of actual indices or indexes 743 00:34:22,790 --> 00:34:24,593 like we did with our array notation. 744 00:34:24,593 --> 00:34:26,510 The last scenario, of course, is if the number 745 00:34:26,510 --> 00:34:28,760 is greater than doors bracket middle, 746 00:34:28,760 --> 00:34:32,060 then Rave would have wanted to search doors middle plus 1-- 747 00:34:32,060 --> 00:34:37,250 so 1 over-- through doors n minus 1-- 748 00:34:37,250 --> 00:34:38,330 through n minus 1. 749 00:34:38,330 --> 00:34:41,389 So again, just a way of sort of describing a little more syntactically 750 00:34:41,389 --> 00:34:42,989 what it is that's going on. 751 00:34:42,989 --> 00:34:46,820 So how might we translate this now into big O notation? 752 00:34:46,820 --> 00:34:53,870 Well, in the worst case, how many steps total might Rave's binary search 753 00:34:53,870 --> 00:34:55,070 algorithm have taken? 754 00:34:55,070 --> 00:34:59,030 Given seven doors or given more generically n doors, 755 00:34:59,030 --> 00:35:03,620 how many times could she go left or go right before finding herself with one 756 00:35:03,620 --> 00:35:06,110 or no doors left? 757 00:35:06,110 --> 00:35:08,787 What's the way to think about that? 758 00:35:08,787 --> 00:35:09,620 Yeah, in the middle?
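The index-based binary search pseudo code above might translate into C along these lines. This is a sketch under assumptions rather than the lecture's implementation: the names `binary_search_range`, `low`, and `high` are hypothetical, the array is assumed to be sorted from small to large, and the "no doors left" check comes first, as in the pseudo code.

```c
#include <stdbool.h>

// Search doors[low] through doors[high] inclusive.
// Assumes the values are sorted from small to large.
bool binary_search_range(const int doors[], int low, int high, int number)
{
    if (low > high) // no doors left to search
    {
        return false;
    }
    int middle = low + (high - low) / 2;
    if (doors[middle] == number)
    {
        return true;
    }
    else if (number < doors[middle])
    {
        // search the left half: doors[low] through doors[middle - 1]
        return binary_search_range(doors, low, middle - 1, number);
    }
    else
    {
        // search the right half: doors[middle + 1] through doors[high]
        return binary_search_range(doors, middle + 1, high, number);
    }
}

// Convenience wrapper: search all n doors, doors[0] through doors[n - 1].
bool binary_search(const int doors[], int n, int number)
{
    return binary_search_range(doors, 0, n - 1, number);
}
```

With a hypothetical sorted array `{0, 2, 4, 5, 6, 7, 8}`, a search for 6 looks at index 3 (the 5), then index 5 (the 7), then index 4 (the 6), which mirrors the three doors opened in the demonstration.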
759 00:35:09,620 --> 00:35:11,150 AUDIENCE: Log n. 760 00:35:11,150 --> 00:35:11,990 DAVID J. MALAN: Log n. 761 00:35:11,990 --> 00:35:13,280 So there's log n again. 762 00:35:13,280 --> 00:35:16,155 And even if you're not feeling wholly comfortable with your logarithms 763 00:35:16,155 --> 00:35:19,250 still, pretty much in programming and in computer science more generally, 764 00:35:19,250 --> 00:35:22,430 any time we talk about some algorithm that's dividing and conquering 765 00:35:22,430 --> 00:35:26,180 in half, in half, in half, or any other multiple, 766 00:35:26,180 --> 00:35:28,580 it's probably involving logarithms in some sense. 767 00:35:28,580 --> 00:35:31,400 And log base 2 of n essentially refers to the number 768 00:35:31,400 --> 00:35:37,160 of times you can divide n by 2 until you bottom out at just a single door 769 00:35:37,160 --> 00:35:39,410 or equivalently zero doors left. 770 00:35:39,410 --> 00:35:40,370 So log n. 771 00:35:40,370 --> 00:35:44,270 So we might say that indeed, binary search is in big O of log n 772 00:35:44,270 --> 00:35:48,440 because the door that Rave opened last, this one, 773 00:35:48,440 --> 00:35:50,360 happened to be three doors away. 774 00:35:50,360 --> 00:35:52,790 And actually, if you do the math here, that roughly 775 00:35:52,790 --> 00:35:54,600 works out to be exactly that case. 776 00:35:54,600 --> 00:35:58,640 If we add one, that's sort of out of seven doors or roughly eight, 777 00:35:58,640 --> 00:36:01,940 we were able to search it in just three total steps. 778 00:36:01,940 --> 00:36:03,980 What about omega notation, though? 779 00:36:03,980 --> 00:36:07,220 Like, in the best case, Rave might have gotten lucky. 780 00:36:07,220 --> 00:36:09,170 She opened the door, and there it is. 781 00:36:09,170 --> 00:36:14,970 So how might we describe a lower bound on the running time of binary search? 782 00:36:14,970 --> 00:36:15,470 Yeah. 783 00:36:15,470 --> 00:36:16,125 AUDIENCE: 1.
784 00:36:16,125 --> 00:36:17,000 DAVID J. MALAN: Say again? 785 00:36:17,000 --> 00:36:17,840 AUDIENCE: 1. 786 00:36:17,840 --> 00:36:19,220 DAVID J. MALAN: Omega of 1. 787 00:36:19,220 --> 00:36:23,810 So here too, we see that in some cases binary search and linear search, eh, 788 00:36:23,810 --> 00:36:25,700 like, they're pretty equivalent. 789 00:36:25,700 --> 00:36:30,830 And so this is why it's sometimes compelling to consider both the best 790 00:36:30,830 --> 00:36:33,290 case and the worst case because honestly, in general, 791 00:36:33,290 --> 00:36:35,600 who really cares if you just get lucky once in a while 792 00:36:35,600 --> 00:36:37,280 and your algorithm is super fast? 793 00:36:37,280 --> 00:36:40,250 What you probably care about is what's the worst case. 794 00:36:40,250 --> 00:36:41,390 How long are my users-- 795 00:36:41,390 --> 00:36:45,170 how long am I going to be sitting there watching some spinning hourglass 796 00:36:45,170 --> 00:36:50,807 or beach ball trying to give myself an answer to a pretty big problem? 797 00:36:50,807 --> 00:36:53,640 Well, odds are, you're going to generally care about big O notation. 798 00:36:53,640 --> 00:36:55,430 So indeed, moving forward, we'll generally 799 00:36:55,430 --> 00:36:58,730 talk about the running time of algorithms often in terms of big O, 800 00:36:58,730 --> 00:37:00,780 a little less so in terms of omega. 801 00:37:00,780 --> 00:37:03,140 But understanding the range can be important 802 00:37:03,140 --> 00:37:08,700 depending on the nature of the data that you're going to actually be given here. 803 00:37:08,700 --> 00:37:11,510 All right, let me pause and see if there are any questions. 804 00:37:11,510 --> 00:37:14,200 805 00:37:14,200 --> 00:37:16,420 Any questions here? 806 00:37:16,420 --> 00:37:18,850 Yes, thank you.
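The idea that log n counts how many times n can be divided by 2 can be checked with a small helper. The following is a rough sketch under assumptions: the name `halvings` is hypothetical, and integer division stands in for "tearing the problem in half".

```c
// Count how many times n can be halved (integer division)
// before only a single door, or none, remains.
// `halvings` is a hypothetical helper name used for illustration.
int halvings(int n)
{
    int steps = 0;
    while (n > 1)
    {
        n /= 2; // discard half of the remaining doors
        steps++;
    }
    return steps;
}
```

For example, `halvings(8)` returns 3 (8, then 4, then 2, then 1), which matches the "roughly eight doors, three steps" arithmetic from the demonstration, and `halvings(1024)` returns 10, which hints at why doubling the input adds only one more step.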
807 00:37:18,850 --> 00:37:21,430 AUDIENCE: So this method is clearly more efficient, 808 00:37:21,430 --> 00:37:26,440 but it requires that the information is all compiled in a certain order. 809 00:37:26,440 --> 00:37:29,770 How do you ensure that you can compile information 810 00:37:29,770 --> 00:37:31,265 in a particular order at scale? 811 00:37:31,265 --> 00:37:33,140 DAVID J. MALAN: Yeah, it's a really good question. 812 00:37:33,140 --> 00:37:36,015 And if I can generalize it, how do you guarantee that you can do this 813 00:37:36,015 --> 00:37:38,560 at scale, which algorithm is better? 814 00:37:38,560 --> 00:37:41,440 I've sort of led us down this road of implying 815 00:37:41,440 --> 00:37:43,540 that Rave's second algorithm, binary search, 816 00:37:43,540 --> 00:37:45,580 is better because it's so much faster. 817 00:37:45,580 --> 00:37:49,600 It's log of n in the worst case instead of big O of n. 818 00:37:49,600 --> 00:37:53,230 But Rave was given an advantage when she came up here in that the doors were 819 00:37:53,230 --> 00:37:54,222 already sorted. 820 00:37:54,222 --> 00:37:55,930 And so that sort of invites the question, 821 00:37:55,930 --> 00:37:57,670 well, given a whole bunch of random data, 822 00:37:57,670 --> 00:38:00,710 either a small data set or, heck, something Google sized with millions, 823 00:38:00,710 --> 00:38:04,540 billions of pieces of data, should you sort it first 824 00:38:04,540 --> 00:38:07,150 from smallest to largest and then search? 825 00:38:07,150 --> 00:38:11,920 Or should you just dive right in and search it linearly? 826 00:38:11,920 --> 00:38:13,638 Like, how might you think about that? 827 00:38:13,638 --> 00:38:15,430 If you are Google, for instance, and you've 828 00:38:15,430 --> 00:38:19,090 got millions, billions of web pages, should they just go with linear search 829 00:38:19,090 --> 00:38:21,850 because it's always going to work even though it might be slow? 
830 00:38:21,850 --> 00:38:24,820 Or should they invest the time in sorting all of that data-- 831 00:38:24,820 --> 00:38:26,630 we'll see how in a bit-- 832 00:38:26,630 --> 00:38:28,900 and then search it more efficiently? 833 00:38:28,900 --> 00:38:31,438 Like, how do you decide between those options? 834 00:38:31,438 --> 00:38:33,730 AUDIENCE: If you're sorting the data, then wouldn't you 835 00:38:33,730 --> 00:38:36,573 have to go through all of the data? 836 00:38:36,573 --> 00:38:38,740 DAVID J. MALAN: Yeah, if you had to sort the data first-- 837 00:38:38,740 --> 00:38:40,700 and we don't yet formally know how to do this. 838 00:38:40,700 --> 00:38:43,117 But obviously, as humans, we could probably figure it out. 839 00:38:43,117 --> 00:38:45,280 You do have to look at all of the data anyway. 840 00:38:45,280 --> 00:38:48,760 And so you're sort of wasting your time if you're sorting it only 841 00:38:48,760 --> 00:38:50,680 then to go and search it. 842 00:38:50,680 --> 00:38:52,898 But maybe it depends a bit more. 843 00:38:52,898 --> 00:38:54,940 Like, that's absolutely right, and if you're just 844 00:38:54,940 --> 00:38:58,060 searching for one thing in life, then that's probably a waste of time 845 00:38:58,060 --> 00:39:01,720 to sort it and then search it because you're just adding to the process. 846 00:39:01,720 --> 00:39:03,880 But what's another scenario in which you might not 847 00:39:03,880 --> 00:39:09,410 worry about that whereby it might make sense to sort it and then search? 848 00:39:09,410 --> 00:39:09,910 Yeah. 849 00:39:09,910 --> 00:39:16,580 AUDIENCE: [INAUDIBLE] you can go and use the other values as a way 850 00:39:16,580 --> 00:39:17,810 to find out what's happening. 851 00:39:17,810 --> 00:39:18,852 DAVID J. MALAN: Yeah, exactly. 
852 00:39:18,852 --> 00:39:21,392 So if your problem is a Google-like problem where 853 00:39:21,392 --> 00:39:24,350 you have more than just one user who's searching for more than just one 854 00:39:24,350 --> 00:39:26,780 website page, probably you should incur the cost up front 855 00:39:26,780 --> 00:39:30,620 and sort the whole thing because every subsequent request thereafter 856 00:39:30,620 --> 00:39:32,810 is going to be faster, faster, faster because it's 857 00:39:32,810 --> 00:39:36,440 going to [INAUDIBLE] algorithm of binary search, binary search, binary search 858 00:39:36,440 --> 00:39:39,320 that's going to add up to be way fewer steps 859 00:39:39,320 --> 00:39:41,610 than doing linear search multiple times. 860 00:39:41,610 --> 00:39:43,490 So again, kind of depends on the use case 861 00:39:43,490 --> 00:39:45,350 and kind of depends on how important it is. 862 00:39:45,350 --> 00:39:48,050 And this happens even in real world contexts. 863 00:39:48,050 --> 00:39:50,900 I think back always to graduate school, when I was writing some code 864 00:39:50,900 --> 00:39:52,610 to analyze some large data set. 865 00:39:52,610 --> 00:39:55,400 And honestly, it was actually easier at the time for me 866 00:39:55,400 --> 00:39:58,310 to write pretty inefficient but hopefully correct 867 00:39:58,310 --> 00:39:59,537 code because you know what? 868 00:39:59,537 --> 00:40:02,870 I could just go to sleep for eight hours and let it analyze this really big data 869 00:40:02,870 --> 00:40:03,370 set. 870 00:40:03,370 --> 00:40:06,335 I didn't have to bother writing more complex code to sort it just 871 00:40:06,335 --> 00:40:07,460 to run it more efficiently. 872 00:40:07,460 --> 00:40:07,960 Why? 873 00:40:07,960 --> 00:40:11,520 Because I was the only user, and I only needed to run these queries once. 
874 00:40:11,520 --> 00:40:13,700 And so this was kind of a reasonable approach, 875 00:40:13,700 --> 00:40:17,550 reasonable until I woke up eight hours later and my code was incorrect. 876 00:40:17,550 --> 00:40:20,960 And now I had to spend another eight hours rerunning it after fixing it. 877 00:40:20,960 --> 00:40:22,910 But even there, you see an example where, 878 00:40:22,910 --> 00:40:24,890 what is your most precious resource? 879 00:40:24,890 --> 00:40:26,720 Is it time to run the code? 880 00:40:26,720 --> 00:40:28,970 Is it time to write the code? 881 00:40:28,970 --> 00:40:31,098 Is it the amount of memory the computer is using? 882 00:40:31,098 --> 00:40:33,890 These are all resources we'll start to talk about because it really 883 00:40:33,890 --> 00:40:36,080 depends on what your goals are. 884 00:40:36,080 --> 00:40:39,050 Any questions, then, on upper bounds, lower bounds, 885 00:40:39,050 --> 00:40:42,260 or each of these two searches, linear or binary? 886 00:40:42,260 --> 00:40:42,991 Yeah. 887 00:40:42,991 --> 00:40:45,580 AUDIENCE: So just, when you're calculating running time, 888 00:40:45,580 --> 00:40:50,317 does the sorting step count for that time? 889 00:40:50,317 --> 00:40:53,150 DAVID J. MALAN: When analyzing running time, does the sorting step count? 890 00:40:53,150 --> 00:40:55,310 If you want it to if you actually do it. 891 00:40:55,310 --> 00:40:56,900 At the moment, it did not apply. 892 00:40:56,900 --> 00:41:01,100 I just gave Rave the luxury of knowing that the data was sorted. 893 00:41:01,100 --> 00:41:04,520 But if I really wanted to charge her for the amount of time 894 00:41:04,520 --> 00:41:07,730 it took to find that number six, I should have added the time 895 00:41:07,730 --> 00:41:09,712 to sort plus the time to search. 896 00:41:09,712 --> 00:41:11,420 And in fact, that's a road we'll go down. 897 00:41:11,420 --> 00:41:13,170 Why don't we go ahead and pace ourselves as before? 
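The "time to sort plus time to search" accounting can be written out as a rough comparison. This is only a sketch under stated assumptions: S(n) stands for whatever the sorting step costs (the lecture has not yet said how to sort, so it is left unspecified), q is the number of searches performed on the same data, and constant factors are ignored.

```latex
\begin{align*}
T_{\text{linear}}(n, q) &\approx q \cdot n \\
T_{\text{sort, then binary}}(n, q) &\approx S(n) + q \cdot \log_2 n \\
\text{sorting first pays off when}\quad S(n) + q \cdot \log_2 n &< q \cdot n
\end{align*}
```

For a single search (q = 1), sorting must at least look at every element, so S(n) is at least n and sorting first cannot win. As q grows, the one-time cost S(n) is spread over more and more cheap searches, which is the Google-like case described above.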
898 00:41:13,170 --> 00:41:14,587 Let's take a 10 minute break here. 899 00:41:14,587 --> 00:41:17,130 And when we come back, we'll write some actual code. 900 00:41:17,130 --> 00:41:21,038 So we've seen a couple of searches-- linear search and binary search, which, 901 00:41:21,038 --> 00:41:22,580 to be fair, we saw back in week zero. 902 00:41:22,580 --> 00:41:25,790 But let's actually translate at least one of those now to some code 903 00:41:25,790 --> 00:41:28,970 using this building block from last week where we can actually 904 00:41:28,970 --> 00:41:32,820 define an array if we want, like an array of integers called numbers. 905 00:41:32,820 --> 00:41:34,550 So let me switch over to VS Code here. 906 00:41:34,550 --> 00:41:37,910 Let me go ahead and start a program called numbers.c. 907 00:41:37,910 --> 00:41:40,940 And in numbers.c, let me go ahead here. 908 00:41:40,940 --> 00:41:44,840 And how about let's include our familiar header files? 909 00:41:44,840 --> 00:41:46,670 So cs50.h. 910 00:41:46,670 --> 00:41:51,330 I'll include stdio.h so that we can get input and print output if we want. 911 00:41:51,330 --> 00:41:54,410 And now I'm going to go ahead and give myself int main void. 912 00:41:54,410 --> 00:41:56,100 No command line arguments today. 913 00:41:56,100 --> 00:41:57,232 So I'll leave that as void. 914 00:41:57,232 --> 00:41:58,940 And I'm going to go ahead and give myself 915 00:41:58,940 --> 00:42:01,410 an array of how about seven numbers? 916 00:42:01,410 --> 00:42:04,220 So I'll call it int numbers bracket 7. 917 00:42:04,220 --> 00:42:06,260 And then I can fill this array with numbers. 918 00:42:06,260 --> 00:42:10,100 Like, numbers bracket 0 can be the number 4, and numbers bracket 1 919 00:42:10,100 --> 00:42:14,308 could be the number 6, and numbers bracket 2 can be the number 8. 920 00:42:14,308 --> 00:42:16,850 And this is the same list that we saw with [? Nomira ?] 
a bit 921 00:42:16,850 --> 00:42:19,170 ago where it was 4, then 6, then 8. 922 00:42:19,170 --> 00:42:19,920 But you know what? 923 00:42:19,920 --> 00:42:22,340 There's actually another syntax I can show you here. 924 00:42:22,340 --> 00:42:25,220 If you know in advance in a C program that you 925 00:42:25,220 --> 00:42:30,390 want an array of certain values and you know therefore how many of those values 926 00:42:30,390 --> 00:42:33,410 you want, you can actually do this little trick using curly braces. 927 00:42:33,410 --> 00:42:36,560 You can say, don't worry about how big this is. 928 00:42:36,560 --> 00:42:39,620 It's going to be implicit by way of these curly braces. 929 00:42:39,620 --> 00:42:44,610 Here, I can do 4, 6, 8, 2, 7, 5, 0, close curly brace. 930 00:42:44,610 --> 00:42:46,730 So it's a somewhat new use of curly braces. 931 00:42:46,730 --> 00:42:50,780 But this has the effect of giving me an array called numbers inside 932 00:42:50,780 --> 00:42:52,440 of which are a whole bunch of integers. 933 00:42:52,440 --> 00:42:53,030 How many? 934 00:42:53,030 --> 00:42:56,960 The compiler can infer it from whatever's inside these curly braces. 935 00:42:56,960 --> 00:43:00,140 And it seems to be of size 1, 2, 3, 4, 5, 6, 7. 936 00:43:00,140 --> 00:43:05,510 And all seven elements will be initialized with 4, 6, 8, 2, 7, 5, 0 937 00:43:05,510 --> 00:43:06,330 respectively. 938 00:43:06,330 --> 00:43:08,780 So just a minor optimization code wise to tighten up 939 00:43:08,780 --> 00:43:12,090 what would have otherwise been like eight separate lines of code. 940 00:43:12,090 --> 00:43:15,080 Now let's go ahead and implement linear search, as we called it. 941 00:43:15,080 --> 00:43:18,122 And you can do this in a bunch of ways, but I'm going to do it like this. 942 00:43:18,122 --> 00:43:24,830 For int i gets 0, i is less than 7, i plus plus. 
943 00:43:24,830 --> 00:43:27,800 Then inside of my loop, I'm going to ask the question, well, 944 00:43:27,800 --> 00:43:33,020 if the numbers at location i equals equals, as we asked of 945 00:43:33,020 --> 00:43:36,680 [? Nomira, ?] the number 0, then I'm going to go ahead and do something 946 00:43:36,680 --> 00:43:41,450 like printf found backslash n. 947 00:43:41,450 --> 00:43:43,280 And then I'm going to return 0. 948 00:43:43,280 --> 00:43:46,040 Just because of last week's discussion of returning 949 00:43:46,040 --> 00:43:50,150 a value for main when all is well, I'm going to return 0 by convention 950 00:43:50,150 --> 00:43:53,060 just to signal that indeed, I found what I'm looking for. 951 00:43:53,060 --> 00:44:00,560 Otherwise, on what line do I want to go and add a printf, like, not found 952 00:44:00,560 --> 00:44:02,600 and return something other than 0? 953 00:44:02,600 --> 00:44:06,860 Right, I don't think I want an else here per our pseudo code earlier. 954 00:44:06,860 --> 00:44:11,030 So on what line would you prefer I sort of insert a default scenario 955 00:44:11,030 --> 00:44:14,490 of not found and I'll return an error? 956 00:44:14,490 --> 00:44:15,795 Yeah, over here? 957 00:44:15,795 --> 00:44:19,343 [INTERPOSING VOICES] 958 00:44:19,343 --> 00:44:20,010 DAVID J. MALAN: Nice. 959 00:44:20,010 --> 00:44:21,718 So at the end of the for loop because you 960 00:44:21,718 --> 00:44:23,550 want to give the program or our volunteer 961 00:44:23,550 --> 00:44:26,980 earlier a chance to go through all of the doors, all of the numbers. 962 00:44:26,980 --> 00:44:29,700 But if you go through the whole thing, through the whole loop, 963 00:44:29,700 --> 00:44:33,630 at the very end, you probably just want to conclude not found backslash n 964 00:44:33,630 --> 00:44:36,060 and then return something like positive 1 965 00:44:36,060 --> 00:44:38,040 just to signify that an error happened. 
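The linear search just dictated can be sketched as follows. The lecture's version lives directly in main, prints found or not found, and returns 0 or 1; here, as an assumption made purely for illustration, the loop is pulled into a hypothetical helper named contains so that it can be called with different targets, and the count is computed with sizeof rather than hardcoded as 7.

```c
#include <stdbool.h>

// The seven values from the lecture; the compiler infers the size
// from the initializer inside the curly braces.
static const int NUMBERS[] = {4, 6, 8, 2, 7, 5, 0};

// Element count derived from the array itself (an addition, not in the lecture).
static const int NUMBERS_COUNT = sizeof(NUMBERS) / sizeof(NUMBERS[0]);

// Linear search: look at each element from left to right.
bool contains(const int values[], int count, int target)
{
    for (int i = 0; i < count; i++)
    {
        if (values[i] == target)
        {
            // Found it, so stop looking immediately
            return true;
        }
    }

    // Only after checking every element can we conclude it is absent.
    // This is why the "not found" case belongs after the loop, not in an else.
    return false;
}
```

A main function in the lecture's style could then print found and return 0 when contains(NUMBERS, NUMBERS_COUNT, 0) is true, and otherwise print not found and return 1.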
966 00:44:38,040 --> 00:44:40,170 And again, this was a minor detail last week. 967 00:44:40,170 --> 00:44:44,370 Any time main is successful, the programming convention is to return 0. 968 00:44:44,370 --> 00:44:45,850 That means all is well. 969 00:44:45,850 --> 00:44:49,020 And if something goes wrong, like you didn't find what you're looking for, 970 00:44:49,020 --> 00:44:52,950 you might return something other than 0, like positive 1, maybe positive 2, 971 00:44:52,950 --> 00:44:55,202 or even negative numbers if you want. 972 00:44:55,202 --> 00:44:57,160 All right, well, let me go ahead and save this. 973 00:44:57,160 --> 00:44:58,800 Let me do make numbers. 974 00:44:58,800 --> 00:45:01,230 Hopefully no syntax errors. 975 00:45:01,230 --> 00:45:02,520 All good so far. 976 00:45:02,520 --> 00:45:05,040 dot slash numbers, enter. 977 00:45:05,040 --> 00:45:07,500 All right, and it's found, as I would hope it would be. 978 00:45:07,500 --> 00:45:10,710 And just as a little check, let's search for something that's definitely 979 00:45:10,710 --> 00:45:14,790 not there, like the number negative 1. 980 00:45:14,790 --> 00:45:17,640 Let me go ahead and recompile the code with make numbers. 981 00:45:17,640 --> 00:45:19,890 Let me rerun the code with dot slash numbers 982 00:45:19,890 --> 00:45:21,900 and hopefully-- whew, OK, not found. 983 00:45:21,900 --> 00:45:24,473 So proof by example seems to be working correctly. 984 00:45:24,473 --> 00:45:26,640 But let's make things a little more interesting now. 985 00:45:26,640 --> 00:45:29,700 Right now, I'm using just an array of integers. 986 00:45:29,700 --> 00:45:34,090 Let me go ahead and introduce maybe an array of strings instead. 987 00:45:34,090 --> 00:45:37,360 And maybe this time, I'll store a bunch of names and not just integers 988 00:45:37,360 --> 00:45:39,100 but actual strings of names. 989 00:45:39,100 --> 00:45:40,330 So how might I do this? 
990 00:45:40,330 --> 00:45:42,130 Well, let me go back to my code here. 991 00:45:42,130 --> 00:45:46,470 I'm going to switch us over to maybe a file called names.c. 992 00:45:46,470 --> 00:45:50,130 And in here, I'll go ahead and include cs50.h. 993 00:45:50,130 --> 00:45:53,550 I'll include stdio.h. 994 00:45:53,550 --> 00:45:57,030 And I'm going to go ahead and for now include a new friend 995 00:45:57,030 --> 00:46:00,478 from last week, string.h, which gives me some string-related functionality. 996 00:46:00,478 --> 00:46:03,270 Int main void because I'm not going to bother with any command line 997 00:46:03,270 --> 00:46:04,480 arguments for now. 998 00:46:04,480 --> 00:46:09,330 And now if I want an array of strings, I could do something like this-- 999 00:46:09,330 --> 00:46:12,120 string names bracket 7. 1000 00:46:12,120 --> 00:46:14,100 And then I could start doing like before. 1001 00:46:14,100 --> 00:46:17,580 Names bracket 0 could be someone like Bill, and names bracket 1 1002 00:46:17,580 --> 00:46:20,740 could be someone like Charlie and so forth. 1003 00:46:20,740 --> 00:46:24,352 But there's this new improvement I can make. 1004 00:46:24,352 --> 00:46:27,060 Let me just let the compiler figure out how many names there are. 1005 00:46:27,060 --> 00:46:32,550 And using curly braces, I'll do Bill and then Charlie and then Fred and then 1006 00:46:32,550 --> 00:46:39,690 George and then Ginny and then Percy and then Ron if there's the pattern there. 1007 00:46:39,690 --> 00:46:42,930 All right, so now I have these seven names as strings. 1008 00:46:42,930 --> 00:46:44,230 Let's do something similar. 1009 00:46:44,230 --> 00:46:47,730 So for int i gets 0. 1010 00:46:47,730 --> 00:46:51,330 i is less than 7 as before, i plus plus as before. 1011 00:46:51,330 --> 00:46:54,900 And inside of the loop, let's this time check for the string in question, 1012 00:46:54,900 --> 00:46:57,630 and suppose we're searching for Ron arbitrarily. 
1013 00:46:57,630 --> 00:47:00,090 He is there, so we should eventually find him. 1014 00:47:00,090 --> 00:47:07,530 Let me go ahead and say if names bracket i equals equals quote unquote Ron, then inside 1015 00:47:07,530 --> 00:47:11,340 of my if condition, I'm going to say printf found just like before. 1016 00:47:11,340 --> 00:47:13,590 And I'm going to return 0 just because all is well. 1017 00:47:13,590 --> 00:47:16,390 And I'm going to take your advice from the get go this time 1018 00:47:16,390 --> 00:47:20,560 and, at the end of the loop, print out not found because if I get this far, 1019 00:47:20,560 --> 00:47:23,670 I have not printed found, and I have not returned already. 1020 00:47:23,670 --> 00:47:27,840 So I'm just going to go ahead and return 1 after printing not found. 1021 00:47:27,840 --> 00:47:30,420 All right, let me go ahead and cross my fingers as always. 1022 00:47:30,420 --> 00:47:33,310 Make names this time. 1023 00:47:33,310 --> 00:47:36,300 And it doesn't seem to like my code here. 1024 00:47:36,300 --> 00:47:38,370 This is perhaps a new error that you might not 1025 00:47:38,370 --> 00:47:41,080 have seen yet in names.c line 11. 1026 00:47:41,080 --> 00:47:43,920 So that's this line here, my if condition. 1027 00:47:43,920 --> 00:47:47,970 Result of comparison against a string literal is unspecified. 1028 00:47:47,970 --> 00:47:50,392 Use an explicit string comparison function instead. 1029 00:47:50,392 --> 00:47:53,100 I mean, that's kind of a mouthful, and the first time you see it, 1030 00:47:53,100 --> 00:47:55,600 you're probably not going to know how to make sense of that. 1031 00:47:55,600 --> 00:47:59,130 But it does kind of draw our attention to something being awry 1032 00:47:59,130 --> 00:48:03,840 with the equality checking here, with equals equals and Ron. 1033 00:48:03,840 --> 00:48:06,240 And here's where again we've been telling 1034 00:48:06,240 --> 00:48:08,700 sort of a white lie for the past couple of weeks. 
1035 00:48:08,700 --> 00:48:12,900 Strings are a thing in C. Strings are a thing in programming. 1036 00:48:12,900 --> 00:48:14,670 But recall from last week, I did disclaim 1037 00:48:14,670 --> 00:48:16,650 there's no such thing as a string data type 1038 00:48:16,650 --> 00:48:20,670 technically because it's not a primitive in the way an int 1039 00:48:20,670 --> 00:48:24,210 and a float and a bool are that are sort of built into the language. 1040 00:48:24,210 --> 00:48:28,170 You can't just use equals equals to compare two strings. 1041 00:48:28,170 --> 00:48:31,110 You actually have to use a special function that's 1042 00:48:31,110 --> 00:48:34,140 in this header file we talked briefly about last week. 1043 00:48:34,140 --> 00:48:36,780 In that header file was string length or strlen. 1044 00:48:36,780 --> 00:48:39,490 But there are other functions in there as well. 1045 00:48:39,490 --> 00:48:43,080 Let me, in fact, go ahead and open up the manual pages. 1046 00:48:43,080 --> 00:48:46,200 And if we go to string.h-- 1047 00:48:46,200 --> 00:48:47,760 let me scroll down a bit. 1048 00:48:47,760 --> 00:48:52,800 In string.h you can perhaps infer what function will probably take 1049 00:48:52,800 --> 00:48:56,612 the place of equals equals for today. 1050 00:48:56,612 --> 00:48:57,570 What do we want to use? 1051 00:48:57,570 --> 00:48:57,930 Yeah. 1052 00:48:57,930 --> 00:48:58,680 AUDIENCE: Strcmp? 1053 00:48:58,680 --> 00:49:03,305 DAVID J. MALAN: So strcmp, S-T-R-C-M-P, which apparently compares two strings. 1054 00:49:03,305 --> 00:49:05,430 And if I click on that, we'll see more information. 1055 00:49:05,430 --> 00:49:09,510 And indeed, if I click on strcmp, we'll see under the synopsis 1056 00:49:09,510 --> 00:49:14,490 that, OK, I need to use the CS50 header file and string.h, as I already have. 
1057 00:49:14,490 --> 00:49:17,850 Here is its prototype, which is telling me 1058 00:49:17,850 --> 00:49:21,360 that strcmp takes two strings, S1 and S2, that 1059 00:49:21,360 --> 00:49:22,890 are presumably going to be compared. 1060 00:49:22,890 --> 00:49:25,230 And it returns an integer, which is interesting. 1061 00:49:25,230 --> 00:49:26,430 So let's read on. 1062 00:49:26,430 --> 00:49:29,730 The description of this function is that it compares two strings case 1063 00:49:29,730 --> 00:49:30,870 sensitively. 1064 00:49:30,870 --> 00:49:34,110 So uppercase or lowercase matters, just FYI. 1065 00:49:34,110 --> 00:49:36,570 And then let's look at the return value here. 1066 00:49:36,570 --> 00:49:40,860 The return value of this function returns an int less than 0 1067 00:49:40,860 --> 00:49:48,480 if S1 comes before S2, 0 if S1 is the same as S2, or an int greater than 0 1068 00:49:48,480 --> 00:49:50,940 if S1 comes after S2. 1069 00:49:50,940 --> 00:49:54,780 So the reason that this function returns an integer and not just a 1070 00:49:54,780 --> 00:49:57,390 bool, true or false, is that it actually will 1071 00:49:57,390 --> 00:50:00,750 allow us to sort these things eventually because if you can tell me 1072 00:50:00,750 --> 00:50:04,980 if two strings come in this order or in this order or they're the same, 1073 00:50:04,980 --> 00:50:07,080 you need three possible return values. 1074 00:50:07,080 --> 00:50:08,910 And a bool, of course, only gives you two, 1075 00:50:08,910 --> 00:50:12,480 but an int gives you like 4 billion even though we just need the 3. 1076 00:50:12,480 --> 00:50:17,520 So 0 or a positive number or a negative number is what this function returns. 1077 00:50:17,520 --> 00:50:21,960 And the documentation goes on to explain what we mean by ASCIIbetical order. 
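The three-way return value described in the documentation can be sketched with a small wrapper. The helper name compare_sign is hypothetical, and the code uses plain const char * rather than the CS50 string type so that it needs only the standard string.h; strcmp itself only promises a negative, zero, or positive int, so the wrapper collapses the result to -1, 0, or 1.

```c
#include <string.h>

// Collapses strcmp's result to -1, 0, or 1. Only the sign of strcmp's
// return value is meaningful; the exact magnitude is not specified.
int compare_sign(const char *s1, const char *s2)
{
    int result = strcmp(s1, s2);

    if (result < 0)
    {
        // s1 comes before s2 in ASCII order
        return -1;
    }
    if (result > 0)
    {
        // s1 comes after s2 in ASCII order
        return 1;
    }

    // Every character matched, so the strings are the same
    return 0;
}
```

Because the comparison is case sensitive and lowercase letters have larger ASCII values than uppercase ones, compare_sign("ron", "Ron") is 1 rather than 0.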
1078 00:50:21,960 --> 00:50:25,260 Recall that capital A is 65, capital B is 66, 1079 00:50:25,260 --> 00:50:27,660 and it's those underlying ASCII or Unicode 1080 00:50:27,660 --> 00:50:31,050 numbers that a computer uses to figure out whether something comes before it 1081 00:50:31,050 --> 00:50:33,180 or after it like in the dictionary. 1082 00:50:33,180 --> 00:50:35,950 But for our purposes now, we only care about equality. 1083 00:50:35,950 --> 00:50:37,680 So I'm going to go ahead and do this. 1084 00:50:37,680 --> 00:50:41,820 If I want to compare names bracket i against Ron, 1085 00:50:41,820 --> 00:50:49,320 I use str compare or strcmp, names bracket i comma, quote unquote, Ron. 1086 00:50:49,320 --> 00:50:51,510 So it's a little more involved than actually 1087 00:50:51,510 --> 00:50:55,830 using equals equals, which does work for integers, longs, 1088 00:50:55,830 --> 00:50:57,000 and certain other values. 1089 00:50:57,000 --> 00:51:00,850 But for strings, it turns out we need to use a more powerful function. 1090 00:51:00,850 --> 00:51:01,350 Why? 1091 00:51:01,350 --> 00:51:03,630 Well, last week, recall what a string really is. 1092 00:51:03,630 --> 00:51:06,220 It's an array of characters. 1093 00:51:06,220 --> 00:51:09,690 And so whereas you can use equals equals for single characters, 1094 00:51:09,690 --> 00:51:12,600 strcmp, as we'll eventually see, is going 1095 00:51:12,600 --> 00:51:14,438 to compare multiple characters for us. 1096 00:51:14,438 --> 00:51:15,480 There's more logic there. 1097 00:51:15,480 --> 00:51:19,570 There's a loop needed, and that's why it comes with the string library. 1098 00:51:19,570 --> 00:51:22,290 But it doesn't just work out of the box with equals equals alone. 1099 00:51:22,290 --> 00:51:26,267 That would literally be comparing two things, not two arrays of things. 
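Putting the pieces together, linear search over names might look like the sketch below. It assumes plain C without the CS50 library, so string is written as const char *, and the helper name contains_name is hypothetical. The condition compares strcmp's return value against 0 explicitly, since 0 is the value the documentation says means the two strings are the same.

```c
#include <stdbool.h>
#include <string.h>

// The seven names from the lecture; the compiler counts them for us.
static const char *NAMES[] = {"Bill", "Charlie", "Fred", "George",
                              "Ginny", "Percy", "Ron"};
static const int NAMES_COUNT = sizeof(NAMES) / sizeof(NAMES[0]);

// Linear search over strings, comparing character by character via strcmp.
bool contains_name(const char *names[], int count, const char *target)
{
    for (int i = 0; i < count; i++)
    {
        // strcmp returns 0 only when the two strings are identical
        if (strcmp(names[i], target) == 0)
        {
            return true;
        }
    }

    // Reached only if no element matched
    return false;
}
```

Searching for "Ron" succeeds, while "Hermione" or the lowercase "ron" does not, because the comparison is case sensitive.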
1100 00:51:26,267 --> 00:51:28,350 And we'll come back to this next week as to what's 1101 00:51:28,350 --> 00:51:29,920 really going on under the hood. 1102 00:51:29,920 --> 00:51:34,140 So let me go ahead and fix one bug that I just realized I made. 1103 00:51:34,140 --> 00:51:39,450 I want to check if the return value of str compare is equal to 0 1104 00:51:39,450 --> 00:51:42,660 because per the documentation, that meant they're the same. 1105 00:51:42,660 --> 00:51:45,640 All right, let me go ahead and make names this time. 1106 00:51:45,640 --> 00:51:46,710 Now it compiles. 1107 00:51:46,710 --> 00:51:49,770 Dot slash names, Enter, found. 1108 00:51:49,770 --> 00:51:55,290 And just as a sanity check, let's check someone outside the family. 1109 00:51:55,290 --> 00:51:59,010 Searching now for Hermione after recompiling the code, 1110 00:51:59,010 --> 00:52:00,570 after rerunning the code. 1111 00:52:00,570 --> 00:52:02,280 And she's not, in fact, found. 1112 00:52:02,280 --> 00:52:05,220 So here's just a similar implementation of linear search 1113 00:52:05,220 --> 00:52:09,570 not for integers this time but instead for strings, 1114 00:52:09,570 --> 00:52:13,140 the subtlety really being we need a helper function, str compare, 1115 00:52:13,140 --> 00:52:17,640 to actually do the legwork for us of comparing two arrays of characters. 1116 00:52:17,640 --> 00:52:21,208 All right, questions on either of these implementations-- yeah, in the middle? 1117 00:52:21,208 --> 00:52:22,892 AUDIENCE: So, if I do [INAUDIBLE] 1118 00:52:22,892 --> 00:52:24,100 DAVID J. MALAN: Ah, good question. 1119 00:52:24,100 --> 00:52:28,260 If I had not fixed what I claimed was a mistake earlier and I did this-- 1120 00:52:28,260 --> 00:52:30,840 and we saw an example of this last week, actually. 
1121 00:52:30,840 --> 00:52:37,470 If a function returns an integer, be it negative or positive or 0, 1122 00:52:37,470 --> 00:52:40,740 when you get back 0, the expression, the Boolean expression, 1123 00:52:40,740 --> 00:52:42,040 will be considered false. 1124 00:52:42,040 --> 00:52:44,940 So 0 equals false always. 1125 00:52:44,940 --> 00:52:49,470 If a function returns any positive number, or any negative number, 1126 00:52:49,470 --> 00:52:52,770 that's going to be interpreted as true even 1127 00:52:52,770 --> 00:52:57,510 if it's positive or negative, whether it's 1, negative 1, 2, negative 2. 1128 00:52:57,510 --> 00:53:01,570 And so if I did this, this would be saying the opposite. 1129 00:53:01,570 --> 00:53:06,930 So if I were to say this, if str compare of names bracket i and Hermione, that's 1130 00:53:06,930 --> 00:53:13,393 implicitly like saying this does not equal 0, or it means sort of is true, 1131 00:53:13,393 --> 00:53:15,810 but you don't want to check for true because, again, we're 1132 00:53:15,810 --> 00:53:17,350 comparing integers here. 1133 00:53:17,350 --> 00:53:21,000 So the reason I did 0 here in this case is 1134 00:53:21,000 --> 00:53:24,610 that it explicitly checks for the return value that means they're the same. 1135 00:53:24,610 --> 00:53:25,110 And yeah. 1136 00:53:25,110 --> 00:53:26,055 Follow up? 1137 00:53:26,055 --> 00:53:30,948 AUDIENCE: [INAUDIBLE] 1138 00:53:30,948 --> 00:53:32,990 DAVID J. MALAN: Yes, you might not have seen this yet, 1139 00:53:32,990 --> 00:53:36,580 but you can express the equivalent because if you 1140 00:53:36,580 --> 00:53:40,300 want to check if this is false, you can actually 1141 00:53:40,300 --> 00:53:43,300 use an exclamation point, known as a bang in programming, 1142 00:53:43,300 --> 00:53:44,960 that inverts the meaning. 1143 00:53:44,960 --> 00:53:48,163 So false becomes true, true becomes false. 1144 00:53:48,163 --> 00:53:50,080 So this would be another way of expressing it. 
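The three spellings of the condition can be compared side by side. This is a minimal sketch with hypothetical helper names, again using const char * in place of the CS50 string type; each function returns what the corresponding if condition would evaluate to.

```c
#include <stdbool.h>
#include <string.h>

// Explicit check against 0: true when the strings are the same.
bool same_explicit(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

// The "bang" form: ! turns 0 into true and any nonzero value into false,
// so this behaves exactly like the explicit check above.
bool same_with_bang(const char *a, const char *b)
{
    return !strcmp(a, b);
}

// The bare form: strcmp's result used directly as a condition.
// Nonzero counts as true, so this is true when the strings DIFFER.
bool bare_condition(const char *a, const char *b)
{
    if (strcmp(a, b))
    {
        return true;
    }
    return false;
}
```

For "Ron" compared with "Ron", the first two helpers are true and the bare form is false, which is why leaving off the comparison against 0 says the opposite of what was intended.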
1145 00:53:50,080 --> 00:53:54,940 This is arguably a worse design, though, because the documentation explicitly 1146 00:53:54,940 --> 00:53:58,660 says you should be checking for 0 or a positive value 1147 00:53:58,660 --> 00:54:01,750 or a negative value, and this little trick, while correct, 1148 00:54:01,750 --> 00:54:05,500 and I think you can make a reasonable case for it, sort of hides that detail. 1149 00:54:05,500 --> 00:54:07,420 And I would argue instead for the first way, 1150 00:54:07,420 --> 00:54:09,607 checking for equals equals 0 instead. 1151 00:54:09,607 --> 00:54:11,440 And if that's a little subtle, not to worry. 1152 00:54:11,440 --> 00:54:16,240 We'll come back to little syntactic tricks like that before long. 1153 00:54:16,240 --> 00:54:20,770 Other questions on linear search in these two forms. 1154 00:54:20,770 --> 00:54:22,450 Is there another hand or hands? 1155 00:54:22,450 --> 00:54:24,070 Two hands? 1156 00:54:24,070 --> 00:54:24,610 No? 1157 00:54:24,610 --> 00:54:25,900 OK, just holler if I missed. 1158 00:54:25,900 --> 00:54:28,012 So let's now actually take this one step further. 1159 00:54:28,012 --> 00:54:30,970 Suppose that we want to write a program that maybe implements something 1160 00:54:30,970 --> 00:54:35,110 a little more like a phone book that has both names and numbers and not 1161 00:54:35,110 --> 00:54:37,030 just integers but actual phone numbers. 1162 00:54:37,030 --> 00:54:39,340 Well, we could escalate things like this. 1163 00:54:39,340 --> 00:54:42,850 We could now have two arrays-- one called names, one called numbers. 1164 00:54:42,850 --> 00:54:45,370 And I'm going to use strings for the numbers now, 1165 00:54:45,370 --> 00:54:48,010 the phone numbers, because in most communities, 1166 00:54:48,010 --> 00:54:51,730 phone numbers might have dashes, pluses, parentheses, so something 1167 00:54:51,730 --> 00:54:55,030 that really looks more like a string even though we call it a phone number. 
1168 00:54:55,030 --> 00:54:58,580 Probably don't want to use an int lest we throw away those kinds of details. 1169 00:54:58,580 --> 00:55:03,160 So let me switch back to VS Code here, and let's do one more program, this one 1170 00:55:03,160 --> 00:55:05,033 in a file called phonebook.c. 1171 00:55:05,033 --> 00:55:06,700 And now let me go ahead and do the same. 1172 00:55:06,700 --> 00:55:08,380 Let me include cs50.h. 1173 00:55:08,380 --> 00:55:14,320 Let me include stdio.h, and let me include string.h. 1174 00:55:14,320 --> 00:55:17,380 I'm going to again do int main void. 1175 00:55:17,380 --> 00:55:20,710 And then inside of my program, I'm going to give myself two arrays-- 1176 00:55:20,710 --> 00:55:22,510 the efficient way this time. 1177 00:55:22,510 --> 00:55:25,000 String names will be just two of us this time. 1178 00:55:25,000 --> 00:55:28,390 How about Carter and me? 1179 00:55:28,390 --> 00:55:30,880 And then I'll give myself-- oops, typo already. 1180 00:55:30,880 --> 00:55:33,790 If I want this to be an array, I don't have to specify the number. 1181 00:55:33,790 --> 00:55:35,380 The compiler can count for me. 1182 00:55:35,380 --> 00:55:37,300 But I do need the square brackets. 1183 00:55:37,300 --> 00:55:43,750 Then for numbers, I'm again going to use a string array specifying with 1184 00:55:43,750 --> 00:55:49,510 the curly braces that how about Carter can be at 1-617-495-1000. 1185 00:55:49,510 --> 00:55:51,280 And how about my own number here-- 1186 00:55:51,280 --> 00:55:55,000 1-949-468-- oh pattern appearing-- 1187 00:55:55,000 --> 00:55:57,760 2750 will be mine. 1188 00:55:57,760 --> 00:55:58,600 Why mine? 1189 00:55:58,600 --> 00:56:00,530 Well, I've just kind of lined things up. 1190 00:56:00,530 --> 00:56:03,460 So Carter's number is apparently first in this array, 1191 00:56:03,460 --> 00:56:06,800 and I'm claiming that he'll be first in this array, respectively. 
1192 00:56:06,800 --> 00:56:09,610 I, David, will be the first-- the second in the names array 1193 00:56:09,610 --> 00:56:12,267 and second in the numbers array. 1194 00:56:12,267 --> 00:56:15,100 If you want to have a little fun with programming, feel free to text 1195 00:56:15,100 --> 00:56:17,270 or call me some time at that number. 1196 00:56:17,270 --> 00:56:20,950 So now let's actually use this data in some way. 1197 00:56:20,950 --> 00:56:24,368 Let's go ahead and actually search for my own name and number here. 1198 00:56:24,368 --> 00:56:24,910 So let me do 1199 00:56:24,910 --> 00:56:27,490 for int i gets 0. 1200 00:56:27,490 --> 00:56:32,090 There's two of us this time-- so i less than 2 and then i plus plus as before. 1201 00:56:32,090 --> 00:56:34,480 And now I'm going to practice what I preached earlier, 1202 00:56:34,480 --> 00:56:38,440 and I'm going to use str compare to find my name in this case. 1203 00:56:38,440 --> 00:56:45,100 And I'm going to say if strcmp of names bracket i comma quote unquote David, 1204 00:56:45,100 --> 00:56:48,740 and that equals 0, meaning they're the same, 1205 00:56:48,740 --> 00:56:51,610 then just as before, I'm going to go ahead and print something out. 1206 00:56:51,610 --> 00:56:53,320 But this time, I'm going to make the program more useful 1207 00:56:53,320 --> 00:56:55,100 and not just say found or not found. 1208 00:56:55,100 --> 00:56:59,050 Now I'm implementing a phone book, like the contacts app on iOS or Android. 1209 00:56:59,050 --> 00:57:02,380 So I'm going to say something like, quote unquote, found percent 1210 00:57:02,380 --> 00:57:08,830 s backslash n and then actually plug in numbers bracket i 1211 00:57:08,830 --> 00:57:12,370 to correspond to the current names bracket i. 1212 00:57:12,370 --> 00:57:14,225 And then I'll return 0 as before.
1213 00:57:14,225 --> 00:57:16,600 And then down here if we get all the way through the loop 1214 00:57:16,600 --> 00:57:20,120 and David's not there for some reason, I'm going to print as before not found 1215 00:57:20,120 --> 00:57:21,580 and then return 1. 1216 00:57:21,580 --> 00:57:26,590 So let me go ahead and compile this with make phonebook, dot slash phonebook, 1217 00:57:26,590 --> 00:57:29,240 and it seems to have found the number. 1218 00:57:29,240 --> 00:57:33,130 So this code I'm going to claim is correct. 1219 00:57:33,130 --> 00:57:36,190 It's kind of stupid because I've just made a phone book or a contacts 1220 00:57:36,190 --> 00:57:37,690 app that only supports two people. 1221 00:57:37,690 --> 00:57:39,503 There's only going to be me and Carter. 1222 00:57:39,503 --> 00:57:41,920 This would be like downloading the contacts app on a phone 1223 00:57:41,920 --> 00:57:43,837 and you can only call two people in the world. 1224 00:57:43,837 --> 00:57:45,820 There's no ability to add names or edit things. 1225 00:57:45,820 --> 00:57:48,860 That, of course, could come later using get string or something else. 1226 00:57:48,860 --> 00:57:50,693 But for now for the sake of discussion, I've 1227 00:57:50,693 --> 00:57:53,450 just hardcoded two names and two numbers. 1228 00:57:53,450 --> 00:57:56,290 But for what it does, I claim this is correct. 1229 00:57:56,290 --> 00:57:59,590 It's going to find me and print out my number. 1230 00:57:59,590 --> 00:58:01,480 But is it well-designed? 1231 00:58:01,480 --> 00:58:05,560 Let's start to now consider if we're not just using arrays, 1232 00:58:05,560 --> 00:58:07,852 but are we using them well? 1233 00:58:07,852 --> 00:58:10,810 We started to use them last week, but are we using them well this week? 1234 00:58:10,810 --> 00:58:14,680 And what might I even mean by using an array well or designing 1235 00:58:14,680 --> 00:58:16,640 this program well?
1236 00:58:16,640 --> 00:58:21,940 Any critiques or concerns with why this might not 1237 00:58:21,940 --> 00:58:24,100 be the best road for us to be going down when 1238 00:58:24,100 --> 00:58:28,540 I want to implement something like a phone book with pieces of information? 1239 00:58:28,540 --> 00:58:31,540 It seems all too vulnerable to just mistakes. 1240 00:58:31,540 --> 00:58:35,620 For instance, if I screw up the actual number of names in the names array 1241 00:58:35,620 --> 00:58:40,190 such that it's now more or less than is in the numbers array or vice versa, 1242 00:58:40,190 --> 00:58:43,420 it feels like there's not a tight relationship between those pieces 1243 00:58:43,420 --> 00:58:47,140 of data, and it's just sort of trusting on the honor system 1244 00:58:47,140 --> 00:58:53,440 that any time I use names bracket i that it lines up with numbers bracket i. 1245 00:58:53,440 --> 00:58:54,160 And that's fine. 1246 00:58:54,160 --> 00:58:56,080 If you're the one writing the code, you're probably 1247 00:58:56,080 --> 00:58:57,640 not going to really screw this up. 1248 00:58:57,640 --> 00:59:00,265 But if you start collaborating with someone else or the program 1249 00:59:00,265 --> 00:59:03,700 is getting much, much longer, the odds that you or your colleagues 1250 00:59:03,700 --> 00:59:08,110 remember that you're sort of just trusting that names and numbers line up 1251 00:59:08,110 --> 00:59:10,420 like this is going to fail eventually. 1252 00:59:10,420 --> 00:59:13,540 Someone's not going to realize that, and just, the code is going to break.
1253 00:59:13,540 --> 00:59:16,540 And you're going to start outputting the wrong numbers for names, which 1254 00:59:16,540 --> 00:59:20,710 is to say it'd be much nicer if we could somehow couple these two 1255 00:59:20,710 --> 00:59:24,850 pieces of data, names and numbers, a little more tightly together so 1256 00:59:24,850 --> 00:59:28,900 that you're not just trusting that these two independent variables, names 1257 00:59:28,900 --> 00:59:32,630 and numbers, have this kind of relationship with themselves. 1258 00:59:32,630 --> 00:59:35,200 So let's consider how we might solve this. 1259 00:59:35,200 --> 00:59:39,400 A new feature today that we'll introduce is generally known as a data structure. 1260 00:59:39,400 --> 00:59:43,280 In C, we have the ability to invent our own data types, 1261 00:59:43,280 --> 00:59:46,555 if you will-- data types that the authors of C decades 1262 00:59:46,555 --> 00:59:48,430 ago just didn't envision or just didn't think 1263 00:59:48,430 --> 00:59:51,880 were necessary because we can implement them ourselves-- similar to Scratch 1264 00:59:51,880 --> 00:59:54,280 just as you could create custom puzzle pieces, 1265 00:59:54,280 --> 00:59:56,360 or in C, you can create custom functions. 1266 00:59:56,360 --> 01:00:00,760 So in C, you can create your own types of data 1267 01:00:00,760 --> 01:00:04,900 that go beyond the built-in ints and floats and even strings. 1268 01:00:04,900 --> 01:00:10,540 You can make, for instance, a person data type or a candidate data 1269 01:00:10,540 --> 01:00:13,090 type in the context of elections or a person data type 1270 01:00:13,090 --> 01:00:15,950 more generically that might have a name and a number. 1271 01:00:15,950 --> 01:00:17,720 So how might we do this?
1272 01:00:17,720 --> 01:00:23,470 Well, let me go here and propose that if we want to define a person, 1273 01:00:23,470 --> 01:00:26,920 wouldn't it be nice if we could have a person data type, 1274 01:00:26,920 --> 01:00:29,470 and then we could have an array called people? 1275 01:00:29,470 --> 01:00:32,770 And maybe that array is our only array with two things 1276 01:00:32,770 --> 01:00:35,200 in it, two persons in it. 1277 01:00:35,200 --> 01:00:38,140 But somehow, those data types, these persons, 1278 01:00:38,140 --> 01:00:41,158 would have both a name and a number associated with them. 1279 01:00:41,158 --> 01:00:42,700 So we don't need two separate arrays. 1280 01:00:42,700 --> 01:00:47,240 We need one array of persons, a brand new data type. 1281 01:00:47,240 --> 01:00:48,890 So how might we do this? 1282 01:00:48,890 --> 01:00:51,070 Well, if we want every person in the world 1283 01:00:51,070 --> 01:00:53,320 or in this program to have a name and a number, 1284 01:00:53,320 --> 01:00:56,380 we literally write out first those two data types. 1285 01:00:56,380 --> 01:00:57,860 Give me a string called name. 1286 01:00:57,860 --> 01:01:00,850 Give me a string called number, semicolon after each. 1287 01:01:00,850 --> 01:01:04,030 And then we wrap that, those two lines of code, 1288 01:01:04,030 --> 01:01:06,730 with this syntax, which at first glance is a little cryptic. 1289 01:01:06,730 --> 01:01:08,410 It's a lot of words all of a sudden. 1290 01:01:08,410 --> 01:01:12,730 But typedef is a new keyword today that defines a new data type. 1291 01:01:12,730 --> 01:01:16,510 This is the C keyword that lets you create your own data 1292 01:01:16,510 --> 01:01:18,100 type for the very first time. 1293 01:01:18,100 --> 01:01:22,840 Struct is another related keyword that tells the compiler that this isn't just 1294 01:01:22,840 --> 01:01:27,310 a simple data type, like an int or a float renamed or something like that.
1295 01:01:27,310 --> 01:01:28,940 It actually is a structure. 1296 01:01:28,940 --> 01:01:32,710 It's got some dimensions to it, like two things in it or three things in it 1297 01:01:32,710 --> 01:01:35,260 or even 50 things inside of it. 1298 01:01:35,260 --> 01:01:39,310 The last word down here is the name that you want to give your data type, 1299 01:01:39,310 --> 01:01:41,980 and it weirdly goes after the curly braces. 1300 01:01:41,980 --> 01:01:45,760 But this is how you invent a data type called person. 1301 01:01:45,760 --> 01:01:48,670 And what this code is implying is that henceforth, 1302 01:01:48,670 --> 01:01:53,980 the compiler clang will know that a person is composed of a name that's 1303 01:01:53,980 --> 01:01:56,770 a string and a number that's a string. 1304 01:01:56,770 --> 01:01:59,770 And you don't have to worry about having multiple arrays now. 1305 01:01:59,770 --> 01:02:03,950 You can just have an array of people moving forward. 1306 01:02:03,950 --> 01:02:05,918 So how can we go about using this? 1307 01:02:05,918 --> 01:02:07,960 Well, let me go back to my code from before where 1308 01:02:07,960 --> 01:02:09,340 I was implementing a phone book. 1309 01:02:09,340 --> 01:02:11,715 And why don't we enhance the phone book code a little bit 1310 01:02:11,715 --> 01:02:14,230 by borrowing some of that new syntax? 1311 01:02:14,230 --> 01:02:16,720 Let me go to the top of my program above main 1312 01:02:16,720 --> 01:02:19,660 and define a type that's a structure or a data 1313 01:02:19,660 --> 01:02:24,500 structure that has a name inside of it and that has a number inside of it. 1314 01:02:24,500 --> 01:02:28,150 And the name of this new structure again is going to be called person. 1315 01:02:28,150 --> 01:02:33,550 Inside of my code now, let me go ahead and delete this old stuff temporarily. 1316 01:02:33,550 --> 01:02:37,510 Let me give myself an array called people of size 2.
1317 01:02:37,510 --> 01:02:40,997 And I'm going to use the non-terse way to do this. 1318 01:02:40,997 --> 01:02:42,580 I'm not going to use the curly braces. 1319 01:02:42,580 --> 01:02:47,140 I'm going to more pedantically spell out what I want in this array of size 2 1320 01:02:47,140 --> 01:02:50,860 at location 0, which is the first person in an array 1321 01:02:50,860 --> 01:02:52,690 because you always start counting at 0. 1322 01:02:52,690 --> 01:02:56,500 I'm going to give that person a name of quote unquote Carter. 1323 01:02:56,500 --> 01:02:59,980 And the dot is admittedly one new piece of syntax today too. 1324 01:02:59,980 --> 01:03:02,410 The dot means go inside of that structure 1325 01:03:02,410 --> 01:03:06,190 and access the variable called name and give it this value Carter. 1326 01:03:06,190 --> 01:03:08,350 Similarly, if I'm going to give Carter a number, 1327 01:03:08,350 --> 01:03:13,030 I can go into people bracket 0 dot number and give that the same thing 1328 01:03:13,030 --> 01:03:17,950 as before plus 1-617-495-1000. 1329 01:03:17,950 --> 01:03:20,230 And then I can do the same for myself here-- 1330 01:03:20,230 --> 01:03:24,145 people bracket-- where should I go? 1331 01:03:24,145 --> 01:03:26,107 OK, one because again, two elements. 1332 01:03:26,107 --> 01:03:27,440 But we started counting at zero. 1333 01:03:27,440 --> 01:03:29,420 Dot name equals quote unquote David. 1334 01:03:29,420 --> 01:03:34,340 And then lastly, people bracket 1 dot number equals quote unquote plus 1335 01:03:34,340 --> 01:03:40,370 1-949-468-2750. 1336 01:03:40,370 --> 01:03:43,250 So now if I scroll down here to my logic, 1337 01:03:43,250 --> 01:03:46,130 I don't think this part needs to change too much. 1338 01:03:46,130 --> 01:03:50,680 I'm still, for the sake of discussion, going to iterate 2 times from i 1339 01:03:50,680 --> 01:03:53,630 is 0 on up to but not through 2.
1340 01:03:53,630 --> 01:03:56,670 But I think this line of code needs to change. 1341 01:03:56,670 --> 01:04:04,910 How should I now refer to the i-th person's name as I iterate? 1342 01:04:04,910 --> 01:04:08,250 What should I compare quote unquote David to this time? 1343 01:04:08,250 --> 01:04:08,750 Let me see. 1344 01:04:08,750 --> 01:04:10,050 On the end here? 1345 01:04:10,050 --> 01:04:13,078 AUDIENCE: People bracket i dot name. 1346 01:04:13,078 --> 01:04:14,870 DAVID J. MALAN: Yeah, people bracket i dot name. 1347 01:04:14,870 --> 01:04:15,170 Why? 1348 01:04:15,170 --> 01:04:16,837 Because people is the name of the array. 1349 01:04:16,837 --> 01:04:20,480 Bracket i is the i-th person that we're iterating over in the current loop-- 1350 01:04:20,480 --> 01:04:23,240 first zero, then one, maybe higher if it had more people. 1351 01:04:23,240 --> 01:04:26,570 Then dot is our new syntax for going inside of a data structure 1352 01:04:26,570 --> 01:04:29,870 and accessing a variable therein which in this case is name. 1353 01:04:29,870 --> 01:04:32,280 And so I can compare David just as before. 1354 01:04:32,280 --> 01:04:36,890 So it's a little more verbose, but now arguably this is a better program 1355 01:04:36,890 --> 01:04:42,470 because now these people are full fledged data types unto themselves. 1356 01:04:42,470 --> 01:04:44,810 There's no more honor system inside of my loop 1357 01:04:44,810 --> 01:04:47,102 that this is going to line up because in just a moment, 1358 01:04:47,102 --> 01:04:49,867 I'm going to fix this one last remnant of the previous version. 1359 01:04:49,867 --> 01:04:51,950 And if I can call back on you again, what should I 1360 01:04:51,950 --> 01:04:54,830 change numbers bracket i to this time? 1361 01:04:54,830 --> 01:05:00,760 AUDIENCE: [INAUDIBLE] dot number. 1362 01:05:00,760 --> 01:05:02,390 DAVID J. MALAN: Dot number, exactly. 
1363 01:05:02,390 --> 01:05:04,940 So gone is the honor system that just assumes 1364 01:05:04,940 --> 01:05:08,060 that bracket i in this array lines up with bracket i in this other array. 1365 01:05:08,060 --> 01:05:08,810 Now why? 1366 01:05:08,810 --> 01:05:10,430 There's only one array. 1367 01:05:10,430 --> 01:05:11,990 It's an array called people. 1368 01:05:11,990 --> 01:05:14,060 The things it stores are persons. 1369 01:05:14,060 --> 01:05:15,800 A person has a name and a number. 1370 01:05:15,800 --> 01:05:18,217 And so even though it's kind of marginal admittedly given 1371 01:05:18,217 --> 01:05:21,050 that this is a short program and given that this kind of made things 1372 01:05:21,050 --> 01:05:23,300 look more complicated at first glance, we're 1373 01:05:23,300 --> 01:05:26,540 now laying the foundation for just a better design because you really 1374 01:05:26,540 --> 01:05:29,180 can't screw up now the association of names 1375 01:05:29,180 --> 01:05:32,780 with numbers because every person's name and number is, so to speak, 1376 01:05:32,780 --> 01:05:36,710 encapsulated inside of the same data type. 1377 01:05:36,710 --> 01:05:38,270 And that's a term of art in CS. 1378 01:05:38,270 --> 01:05:41,720 Encapsulation means to encapsulate-- that is, contain-- 1379 01:05:41,720 --> 01:05:43,830 related pieces of information. 1380 01:05:43,830 --> 01:05:49,670 And thus, we have a person that encapsulates two other data types, name 1381 01:05:49,670 --> 01:05:50,450 and number. 1382 01:05:50,450 --> 01:05:52,310 And this just sets the foundation for all 1383 01:05:52,310 --> 01:05:55,190 of the cool stuff we've talked about and you use every day. 1384 01:05:55,190 --> 01:05:55,993 What is an image? 1385 01:05:55,993 --> 01:05:58,910 Well, recall that an image is a bunch of pixels or dots on the screen. 1386 01:05:58,910 --> 01:06:02,570 Every one of those dots has RGB values associated 1387 01:06:02,570 --> 01:06:04,430 with it-- red, green, and blue. 
1388 01:06:04,430 --> 01:06:07,760 You could imagine now creating a structure in C probably where 1389 01:06:07,760 --> 01:06:11,540 maybe you have three values, three variables-- one called red, 1390 01:06:11,540 --> 01:06:13,400 one called green, one called blue. 1391 01:06:13,400 --> 01:06:15,980 And then you could name the thing not person but pixel. 1392 01:06:15,980 --> 01:06:19,910 And now you could store in C three different colors-- some amount of red, 1393 01:06:19,910 --> 01:06:23,978 some green, some blue-- and collectively treat it as the color of a pixel. 1394 01:06:23,978 --> 01:06:27,020 And you could imagine doing something similar perhaps for video or music. 1395 01:06:27,020 --> 01:06:30,440 Music, you might have three variables-- one for the musical note, 1396 01:06:30,440 --> 01:06:32,660 the duration, the loudness of it. 1397 01:06:32,660 --> 01:06:36,120 And you can imagine coming up with your own data type for music as well. 1398 01:06:36,120 --> 01:06:37,370 So this is a little low level. 1399 01:06:37,370 --> 01:06:39,920 We're just using like a familiar contacts application. 1400 01:06:39,920 --> 01:06:44,270 But we now have the way in code to express most any type of data 1401 01:06:44,270 --> 01:06:48,260 that we might want to implement or discuss ultimately. 1402 01:06:48,260 --> 01:06:53,510 So any questions now on struct or defining our own types, 1403 01:06:53,510 --> 01:06:58,110 the purposes for which are to use arrays but use them more responsibly 1404 01:06:58,110 --> 01:07:01,280 now in a better design but also to lay the foundation 1405 01:07:01,280 --> 01:07:05,880 for implementing cooler and cooler stuff per our week zero discussion? 1406 01:07:05,880 --> 01:07:06,380 Yeah. 1407 01:07:06,380 --> 01:07:07,713 AUDIENCE: What's the [INAUDIBLE] 1408 01:07:07,713 --> 01:07:10,713 DAVID J. MALAN: What's the difference between this and an object in an object 1409 01:07:10,713 --> 01:07:11,550 oriented language? 
1410 01:07:11,550 --> 01:07:14,390 So slight side note, C is not object-oriented. 1411 01:07:14,390 --> 01:07:17,990 Languages like Java and C++ and others which you might have heard 1412 01:07:17,990 --> 01:07:21,200 of, programmed yourself, had friends program in, are object-oriented 1413 01:07:21,200 --> 01:07:25,430 languages. In those languages, they have things called classes or objects, which 1414 01:07:25,430 --> 01:07:26,450 are interrelated. 1415 01:07:26,450 --> 01:07:30,020 And objects can store not just data, like variables. 1416 01:07:30,020 --> 01:07:34,490 Objects can also store functions, and you can kind of sort of do this in C. 1417 01:07:34,490 --> 01:07:36,380 But it's not sort of conventional. 1418 01:07:36,380 --> 01:07:39,770 In C, you have data structures that store data. 1419 01:07:39,770 --> 01:07:44,780 In languages like Java and C++, you have objects that store data and functions 1420 01:07:44,780 --> 01:07:45,410 together. 1421 01:07:45,410 --> 01:07:47,790 Python is an object-oriented language as well. 1422 01:07:47,790 --> 01:07:51,270 So we'll see this issue in a few weeks, but let me wave my hands at it for now. 1423 01:07:51,270 --> 01:07:51,770 Yeah. 1424 01:07:51,770 --> 01:07:53,755 AUDIENCE: Could you use this [INAUDIBLE]? 1425 01:07:53,755 --> 01:07:54,380 DAVID J. MALAN: Yes. 1426 01:07:54,380 --> 01:07:57,020 Could you use this struct to redefine how an int is defined? 1427 01:07:57,020 --> 01:07:58,250 Short answer, yes. 1428 01:07:58,250 --> 01:08:01,700 We talked a couple of times now about integer overflow. 1429 01:08:01,700 --> 01:08:05,900 And most recently, you might have seen me mention the bug in iOS and Mac OS 1430 01:08:05,900 --> 01:08:08,480 that was literally related to an int overflow. 1431 01:08:08,480 --> 01:08:12,890 That's the result of ints only storing 4 bytes or 32 bits 1432 01:08:12,890 --> 01:08:15,800 or even as long as 64 bits or 8 bytes. 1433 01:08:15,800 --> 01:08:16,880 But it's finite.
1434 01:08:16,880 --> 01:08:19,520 But if you want to implement some financial software 1435 01:08:19,520 --> 01:08:22,100 or some scientific or mathematical software that 1436 01:08:22,100 --> 01:08:25,970 allows you to count way bigger than a typical int or a long, 1437 01:08:25,970 --> 01:08:29,135 you could imagine coming up with your own structure. 1438 01:08:29,135 --> 01:08:31,260 And in fact, in some languages there is a structure 1439 01:08:31,260 --> 01:08:35,370 called big int, which allows you to express even bigger numbers. 1440 01:08:35,370 --> 01:08:35,970 How? 1441 01:08:35,970 --> 01:08:40,410 Well, maybe you store inside of a big int an array of values. 1442 01:08:40,410 --> 01:08:43,492 And you somehow allow yourself to store more and more bits 1443 01:08:43,492 --> 01:08:45,450 based on how high you want to be able to count. 1444 01:08:45,450 --> 01:08:46,470 So in short, yes. 1445 01:08:46,470 --> 01:08:49,840 We now have the ability to do most anything we want in the language 1446 01:08:49,840 --> 01:08:52,200 even if it's not built in for us. 1447 01:08:52,200 --> 01:08:53,100 Other questions. 1448 01:08:53,100 --> 01:08:58,762 AUDIENCE: [INAUDIBLE] 1449 01:08:58,762 --> 01:09:01,470 DAVID J. MALAN: Could you define a name and a number in the same line? 1450 01:09:01,470 --> 01:09:02,069 Sort of. 1451 01:09:02,069 --> 01:09:03,986 It starts to get syntactically a little messy, 1452 01:09:03,986 --> 01:09:07,229 so I did it a little more pedantic line by line. 1453 01:09:07,229 --> 01:09:07,870 Good question. 1454 01:09:07,870 --> 01:09:08,430 Over here. 1455 01:09:08,430 --> 01:09:12,910 AUDIENCE: [INAUDIBLE] function you use for the function 1456 01:09:12,910 --> 01:09:15,340 at the bottom of the [INAUDIBLE]. 1457 01:09:15,340 --> 01:09:19,029 Could you do something like that [INAUDIBLE]? 1458 01:09:19,029 --> 01:09:21,399 DAVID J. MALAN: Prototypes-- you have to do in C.
You 1459 01:09:21,399 --> 01:09:25,029 have to define anything you're going to use or declare anything you're going 1460 01:09:25,029 --> 01:09:26,779 to use before you actually use it. 1461 01:09:26,779 --> 01:09:30,830 So it is deliberate that I put it at the top of my code in this file. 1462 01:09:30,830 --> 01:09:34,870 Otherwise, the compiler would not know what I mean by person when I first 1463 01:09:34,870 --> 01:09:37,750 use it here on what's line 14. 1464 01:09:37,750 --> 01:09:41,479 So it has to come first, or it has to be put into something like a header file 1465 01:09:41,479 --> 01:09:44,979 so that you include it at the very top of your code. 1466 01:09:44,979 --> 01:09:46,779 Other questions over here. 1467 01:09:46,779 --> 01:09:47,662 Yeah. 1468 01:09:47,662 --> 01:09:53,282 AUDIENCE: [INAUDIBLE] 1469 01:09:53,282 --> 01:09:54,990 DAVID J. MALAN: Yeah, good question, and we'll 1470 01:09:54,990 --> 01:09:58,500 come back to this later in the term when we talk about SQL, a database language, 1471 01:09:58,500 --> 01:10:00,630 and storing things in actual databases. 1472 01:10:00,630 --> 01:10:04,320 Generally speaking, even though we humans call things phone numbers, 1473 01:10:04,320 --> 01:10:07,800 or in the US, we have social security numbers, those types of numbers 1474 01:10:07,800 --> 01:10:12,060 often have other punctuation in it, like dashes, parentheses, pluses, 1475 01:10:12,060 --> 01:10:13,380 and so forth. 1476 01:10:13,380 --> 01:10:17,340 You could not store any of that syntax or that punctuation inside of an int. 1477 01:10:17,340 --> 01:10:18,880 You could only store numbers. 1478 01:10:18,880 --> 01:10:20,940 So one motivation for using a string is just 1479 01:10:20,940 --> 01:10:24,570 I can store whatever the human wanted me to store, including parentheses 1480 01:10:24,570 --> 01:10:25,690 and so forth. 
1481 01:10:25,690 --> 01:10:29,428 Another reason for storing things as strings, 1482 01:10:29,428 --> 01:10:31,470 even if they look like numbers, is in the context 1483 01:10:31,470 --> 01:10:33,088 of zip codes in the United States. 1484 01:10:33,088 --> 01:10:34,380 Again, we'll come back to this. 1485 01:10:34,380 --> 01:10:36,690 But long story short-- years ago, actually-- 1486 01:10:36,690 --> 01:10:39,732 I was using Microsoft Outlook for my email client. 1487 01:10:39,732 --> 01:10:41,190 And eventually I switched to Gmail. 1488 01:10:41,190 --> 01:10:42,900 And this is like 10 plus years ago now. 1489 01:10:42,900 --> 01:10:47,430 And Outlook at the time lets you export all of your contacts as a CSV file-- 1490 01:10:47,430 --> 01:10:48,780 Comma Separated Values. 1491 01:10:48,780 --> 01:10:50,610 More on that in the weeks to come too. 1492 01:10:50,610 --> 01:10:52,402 And that just means I could download a text 1493 01:10:52,402 --> 01:10:55,710 file with all of my friends and family and their numbers inside of it. 1494 01:10:55,710 --> 01:10:59,812 Unfortunately, I opened that same CSV file with Excel, I think, at the time 1495 01:10:59,812 --> 01:11:01,770 just to kind of spot check it and see if what's 1496 01:11:01,770 --> 01:11:03,400 in there was what was expected. 1497 01:11:03,400 --> 01:11:06,870 And I must have instinctively hit, like, Command or Control-S to save it. 1498 01:11:06,870 --> 01:11:09,900 And Excel at least has this habit of sort of reformatting your data. 1499 01:11:09,900 --> 01:11:12,630 If things look like numbers, it treats them as numbers. 1500 01:11:12,630 --> 01:11:14,040 And Apple Numbers does this too. 1501 01:11:14,040 --> 01:11:16,080 Google Spreadsheets does this too nowadays. 1502 01:11:16,080 --> 01:11:23,040 But long story short, I then imported my newly saved CSV file into Gmail.
1503 01:11:23,040 --> 01:11:26,760 And now 10 plus years later, I'm still occasionally finding friends and family 1504 01:11:26,760 --> 01:11:32,640 members whose zip codes are in Cambridge, Massachusetts 2138, 1505 01:11:32,640 --> 01:11:36,300 which is missing the 0 because we here in Cambridge are 02138. 1506 01:11:36,300 --> 01:11:39,420 And that's because I treated or I let Excel 1507 01:11:39,420 --> 01:11:42,630 treat what looks like a number as an actual number or int, 1508 01:11:42,630 --> 01:11:45,540 and now leading zeros become a problem because mathematically, they 1509 01:11:45,540 --> 01:11:48,900 mean nothing, but in the mail system, they do-- 1510 01:11:48,900 --> 01:11:50,070 sending envelopes and such. 1511 01:11:50,070 --> 01:11:51,653 All right, other final questions here. 1512 01:11:51,653 --> 01:11:54,660 AUDIENCE: [INAUDIBLE] 1513 01:11:54,660 --> 01:11:58,500 DAVID J. MALAN: Yeah, so could I have used a 2D or two dimensional 1514 01:11:58,500 --> 01:12:02,730 array to solve the problem earlier of having just one array? 1515 01:12:02,730 --> 01:12:06,750 Yes, but one, I would argue it's less readable, especially 1516 01:12:06,750 --> 01:12:08,640 as I get lots of names and numbers. 1517 01:12:08,640 --> 01:12:11,730 And two, that too is also kind of relying on the honor system. 1518 01:12:11,730 --> 01:12:14,940 It would be all too easy to omit some of the square brackets in the two 1519 01:12:14,940 --> 01:12:15,940 dimensional array. 1520 01:12:15,940 --> 01:12:20,010 So I would argue it too is not as good as introducing a struct. 1521 01:12:20,010 --> 01:12:21,180 More on that down the road. 1522 01:12:21,180 --> 01:12:26,070 Two dimensional arrays just means arrays of arrays, as you might infer. 
1523 01:12:26,070 --> 01:12:28,080 All right, so now that we have this ability 1524 01:12:28,080 --> 01:12:32,210 to store different types of data like contacts in a phone book, 1525 01:12:32,210 --> 01:12:33,960 having names and addresses, let's actually 1526 01:12:33,960 --> 01:12:36,780 take a step back and consider how we might now 1527 01:12:36,780 --> 01:12:41,850 solve one of the original problems by actually sorting the information we're 1528 01:12:41,850 --> 01:12:45,930 given in advance and considering, per our discussion earlier, just how 1529 01:12:45,930 --> 01:12:48,900 costly, how time consuming is that because that might tip 1530 01:12:48,900 --> 01:12:53,010 the scales in favor of sorting, then searching, or maybe just 1531 01:12:53,010 --> 01:12:54,960 not sorting and only searching. 1532 01:12:54,960 --> 01:12:58,470 It'll give us a sense of just how expensive, so to speak, 1533 01:12:58,470 --> 01:13:00,345 sorting something actually is. 1534 01:13:00,345 --> 01:13:02,220 Well, what's the formulation of this problem? 1535 01:13:02,220 --> 01:13:03,780 It's the same thing as week zero. 1536 01:13:03,780 --> 01:13:04,950 We've got input to sort. 1537 01:13:04,950 --> 01:13:07,120 We want it to be output as sorted. 1538 01:13:07,120 --> 01:13:10,560 So for instance, if we're taking unsorted input as input, 1539 01:13:10,560 --> 01:13:13,895 we want the sorted output as the result. More concretely, 1540 01:13:13,895 --> 01:13:15,270 if we've got numbers like these-- 1541 01:13:15,270 --> 01:13:20,100 63852741, which are just randomly arranged numbers-- 1542 01:13:20,100 --> 01:13:24,510 we want to get back out 12345678. 1543 01:13:24,510 --> 01:13:26,440 So we just want those things to be sorted. 1544 01:13:26,440 --> 01:13:28,470 So again, inside of the black box here is 1545 01:13:28,470 --> 01:13:33,460 going to be one or more algorithms that actually gets this job done. 1546 01:13:33,460 --> 01:13:35,680 So how might we go about doing this? 
1547 01:13:35,680 --> 01:13:39,180 Well, just to vary things a bit more, I think we have a chance here 1548 01:13:39,180 --> 01:13:41,580 for a bit more audience participation. 1549 01:13:41,580 --> 01:13:43,790 But this time, we need eight people if we may. 1550 01:13:43,790 --> 01:13:46,290 All of you have to be comfortable appearing on the internet. 1551 01:13:46,290 --> 01:13:49,165 OK, so this is actually quite convenient that you're all quite close. 1552 01:13:49,165 --> 01:13:52,752 How about 1, 2, 3, 4, 5, 6, 7-- 1553 01:13:52,752 --> 01:13:57,060 oh, OK, and someone volunteering their friend-- number eight. 1554 01:13:57,060 --> 01:13:58,020 Come on down. 1555 01:13:58,020 --> 01:13:58,875 Come on down. 1556 01:13:58,875 --> 01:14:00,750 And if you could, I'm going to set things up. 1557 01:14:00,750 --> 01:14:03,630 If you all could join Valerie, my colleague over there, 1558 01:14:03,630 --> 01:14:08,910 to give you a prop to use here, we'll go ahead in just a moment 1559 01:14:08,910 --> 01:14:11,835 and try to find some numbers at hand. 1560 01:14:11,835 --> 01:14:15,180 1561 01:14:15,180 --> 01:14:20,820 In just a moment, each of our volunteers is going to be representing an integer. 1562 01:14:20,820 --> 01:14:25,380 And that integer is initially going to be in unsorted order. 1563 01:14:25,380 --> 01:14:29,100 And I claim that using an algorithm, step by step instructions, 1564 01:14:29,100 --> 01:14:33,820 we can probably sort these folks in at least a couple of different ways. 1565 01:14:33,820 --> 01:14:38,430 So they're in wardrobe right now just getting their very own Harvard T-shirts 1566 01:14:38,430 --> 01:14:42,945 with a Jersey number on it, which will then represent an element of our array. 1567 01:14:42,945 --> 01:14:46,650 1568 01:14:46,650 --> 01:14:51,270 Give us just a moment to finish getting the attire ready. 1569 01:14:51,270 --> 01:14:56,040 They're being handed a shirt and a number. 
1570 01:14:56,040 --> 01:14:58,620 And let me ask the audience for just a moment. 1571 01:14:58,620 --> 01:15:02,760 As we have these numbers up here on the screen, these numbers too are unsorted. 1572 01:15:02,760 --> 01:15:04,353 They're just in random order. 1573 01:15:04,353 --> 01:15:05,520 And let me ask the audience. 1574 01:15:05,520 --> 01:15:10,980 How would you go about sorting these eight numbers on the screen? 1575 01:15:10,980 --> 01:15:12,930 How would you go about sorting these? 1576 01:15:12,930 --> 01:15:14,138 Yeah, what are your thoughts? 1577 01:15:14,138 --> 01:15:20,327 AUDIENCE: [INAUDIBLE] the number at the end, the following number. 1578 01:15:20,327 --> 01:15:20,910 DAVID J. MALAN: OK. 1579 01:15:20,910 --> 01:15:24,547 AUDIENCE: The following number is bigger, then I keep it as it is. 1580 01:15:24,547 --> 01:15:25,130 DAVID J. MALAN: OK. 1581 01:15:25,130 --> 01:15:26,800 AUDIENCE: If not, then [INAUDIBLE]. 1582 01:15:26,800 --> 01:15:29,352 DAVID J. MALAN: OK, so just to recap, you would start 1583 01:15:29,352 --> 01:15:30,810 with one of the numbers on the end. 1584 01:15:30,810 --> 01:15:33,210 You would look to the number to the right or to the left of it, 1585 01:15:33,210 --> 01:15:34,530 depending on which end you start at. 1586 01:15:34,530 --> 01:15:37,113 And if it's out of order, you would just start to swap things. 1587 01:15:37,113 --> 01:15:38,350 And that seems reasonable. 1588 01:15:38,350 --> 01:15:40,435 There's a whole bunch of mistakes to fix here 1589 01:15:40,435 --> 01:15:42,060 because things are pretty out of order. 1590 01:15:42,060 --> 01:15:45,060 But probably, if you start to solve small problems at a time, 1591 01:15:45,060 --> 01:15:47,910 you can achieve the end result of getting the whole thing sorted. 1592 01:15:47,910 --> 01:15:50,820 Other instincts, if you were just handed these numbers, how 1593 01:15:50,820 --> 01:15:54,077 you might go about sorting them? 
1594 01:15:54,077 --> 01:15:54,660 How might you? 1595 01:15:54,660 --> 01:15:55,856 Yeah, in the back. 1596 01:15:55,856 --> 01:16:00,140 AUDIENCE: [INAUDIBLE] 1597 01:16:00,140 --> 01:16:01,920 DAVID J. MALAN: OK, I like that. 1598 01:16:01,920 --> 01:16:05,840 So to recap there, find the smallest one first and put it at the beginning, 1599 01:16:05,840 --> 01:16:07,070 if I heard you correctly. 1600 01:16:07,070 --> 01:16:10,342 And then presumably, you could do that again and again and again. 1601 01:16:10,342 --> 01:16:13,050 And that would seem to give you a couple of different algorithms. 1602 01:16:13,050 --> 01:16:15,710 And if you all are attired here-- 1603 01:16:15,710 --> 01:16:18,800 do you want to come on up if you're ready? 1604 01:16:18,800 --> 01:16:20,660 We had some [? felt ?] volunteers too. 1605 01:16:20,660 --> 01:16:23,030 Come on over. 1606 01:16:23,030 --> 01:16:25,520 So if you all would like to line yourselves up 1607 01:16:25,520 --> 01:16:27,770 facing the audience in exactly this order-- so 1608 01:16:27,770 --> 01:16:30,170 whoever is number zero should be way over here, 1609 01:16:30,170 --> 01:16:33,710 and whoever is number five should be way over there. 1610 01:16:33,710 --> 01:16:36,920 Feel free to distance as much as you'd like and scooch a little with this way 1611 01:16:36,920 --> 01:16:38,480 if you could. 1612 01:16:38,480 --> 01:16:39,653 OK, all right. 1613 01:16:39,653 --> 01:16:40,820 And make a little more room. 1614 01:16:40,820 --> 01:16:41,870 So seven-- let's see. 1615 01:16:41,870 --> 01:16:44,170 5, 2, 7, 4-- 1616 01:16:44,170 --> 01:16:45,090 AUDIENCE: [INAUDIBLE] 1617 01:16:45,090 --> 01:16:46,733 DAVID J. MALAN: 4, hopefully 1. 1618 01:16:46,733 --> 01:16:47,900 Yeah, keep them to the side. 1619 01:16:47,900 --> 01:16:51,210 OK, 1, 6, and there we go-- 1620 01:16:51,210 --> 01:16:51,710 3. 1621 01:16:51,710 --> 01:16:52,543 Come on over, three. 1622 01:16:52,543 --> 01:16:53,690 I was looking for you. 
1623 01:16:53,690 --> 01:16:57,075 All right, so here, we have an array of eight numbers-- 1624 01:16:57,075 --> 01:16:58,200 eight integers if you will. 1625 01:16:58,200 --> 01:17:00,770 And do you want to each say a quick hello to the group? 1626 01:17:00,770 --> 01:17:02,750 AUDIENCE: Hello, I'm Quinn. 1627 01:17:02,750 --> 01:17:04,892 Go [INAUDIBLE]. 1628 01:17:04,892 --> 01:17:05,850 AUDIENCE: Hi, everyone. 1629 01:17:05,850 --> 01:17:08,060 I'm [INAUDIBLE]. 1630 01:17:08,060 --> 01:17:09,460 AUDIENCE: Hey, I'm Mitchell. 1631 01:17:09,460 --> 01:17:10,460 AUDIENCE: Hi, I'm Brett. 1632 01:17:10,460 --> 01:17:12,675 And also, go [INAUDIBLE]. 1633 01:17:12,675 --> 01:17:13,550 AUDIENCE: I'm Hannah. 1634 01:17:13,550 --> 01:17:15,137 Go [INAUDIBLE]. 1635 01:17:15,137 --> 01:17:16,220 AUDIENCE: Hi, I'm Matthew. 1636 01:17:16,220 --> 01:17:18,058 Go [INAUDIBLE]. 1637 01:17:18,058 --> 01:17:19,100 AUDIENCE: Hi, I'm Miriam. 1638 01:17:19,100 --> 01:17:20,720 Go Winthrop. 1639 01:17:20,720 --> 01:17:22,905 AUDIENCE: Hi, I'm Celeste, and go Strauss. 1640 01:17:22,905 --> 01:17:23,780 DAVID J. MALAN: Wonderful. 1641 01:17:23,780 --> 01:17:26,930 Well, welcome all to the stage, and let's just visualize, 1642 01:17:26,930 --> 01:17:29,430 perhaps organically, how you eight would solve this problem. 1643 01:17:29,430 --> 01:17:32,540 So we currently have the numbers 0 through 7 quite out of order. 1644 01:17:32,540 --> 01:17:36,508 Could you go ahead and just sort yourselves from 0 through 7? 1645 01:17:36,508 --> 01:17:37,341 AUDIENCE: Thank you. 1646 01:17:37,341 --> 01:17:41,400 1647 01:17:41,400 --> 01:17:44,200 DAVID J. MALAN: OK, so what did they just do? 1648 01:17:44,200 --> 01:17:44,700 OK, yes. 1649 01:17:44,700 --> 01:17:46,260 First of all, yes, very well done. 1650 01:17:46,260 --> 01:17:50,030 1651 01:17:50,030 --> 01:17:53,037 How would you describe what they just did? 1652 01:17:53,037 --> 01:17:53,870 Well, let's do this.
1653 01:17:53,870 --> 01:17:56,060 Could you go back into that order on the screen-- 1654 01:17:56,060 --> 01:18:00,320 52741630? 1655 01:18:00,320 --> 01:18:03,440 And could you do exactly what you just did again? 1656 01:18:03,440 --> 01:18:04,750 Sort yourselves. 1657 01:18:04,750 --> 01:18:08,040 1658 01:18:08,040 --> 01:18:09,030 All right, what did-- 1659 01:18:09,030 --> 01:18:09,630 OK, yes. 1660 01:18:09,630 --> 01:18:10,470 Well done again. 1661 01:18:10,470 --> 01:18:14,190 1662 01:18:14,190 --> 01:18:17,480 All right, so admittedly, there's kind of a lot going on because each of you, 1663 01:18:17,480 --> 01:18:21,150 except number four, are doing something in parallel all at the same time. 1664 01:18:21,150 --> 01:18:23,390 And that's not really how a computer typically works. 1665 01:18:23,390 --> 01:18:26,900 Just like a computer can only look at one memory location, at one locker, 1666 01:18:26,900 --> 01:18:31,490 at a time, so can a computer only move one number at a time-- sort of opening 1667 01:18:31,490 --> 01:18:33,717 a locker, checking what's there, moving it as needed. 1668 01:18:33,717 --> 01:18:36,800 So let's try this more methodically based on the two audience suggestions. 1669 01:18:36,800 --> 01:18:42,350 If you all could randomize yourself again to 52741630, 1670 01:18:42,350 --> 01:18:44,762 let's take the second of those approaches first. 1671 01:18:44,762 --> 01:18:46,220 I'm going to look at these numbers. 1672 01:18:46,220 --> 01:18:48,870 And even though I as the human can obviously see all the numbers 1673 01:18:48,870 --> 01:18:50,930 and I just kind of have the intuition for how to fix this, 1674 01:18:50,930 --> 01:18:53,180 we got to be more methodical because eventually, we've 1675 01:18:53,180 --> 01:18:55,500 got to translate this to pseudo code and then code. 1676 01:18:55,500 --> 01:18:56,390 So let me see. 1677 01:18:56,390 --> 01:18:59,200 I'm going to search for, as you proposed, the smallest number. 
1678 01:18:59,200 --> 01:19:00,950 And I'm going to start from left to right. 1679 01:19:00,950 --> 01:19:04,100 I could do it right to left, but left to right just tends to be convention. 1680 01:19:04,100 --> 01:19:07,080 All right, 5 at this moment is the smallest number I've seen. 1681 01:19:07,080 --> 01:19:09,818 So I'm going to remember that in a variable, if you will. 1682 01:19:09,818 --> 01:19:11,360 Now I'm going to take one more step-- 1683 01:19:11,360 --> 01:19:11,930 2. 1684 01:19:11,930 --> 01:19:15,260 OK, 2 I'm going to compare to the variable in mind, obviously smaller. 1685 01:19:15,260 --> 01:19:19,160 I'm going to forget about 5 and only now remember 2 as the now smallest 1686 01:19:19,160 --> 01:19:19,820 elements. 1687 01:19:19,820 --> 01:19:23,090 7, nope-- I'm going to ignore that because it's not smaller than the 2 1688 01:19:23,090 --> 01:19:23,810 I have in mind. 1689 01:19:23,810 --> 01:19:27,170 4, 1-- OK, I'm going to update the variable in mind 1690 01:19:27,170 --> 01:19:28,430 because that's indeed smaller. 1691 01:19:28,430 --> 01:19:31,130 Now obviously, we the humans know that's getting pretty small. 1692 01:19:31,130 --> 01:19:32,180 Maybe it's the end. 1693 01:19:32,180 --> 01:19:35,630 I have to check all values to see if there's something even smaller 1694 01:19:35,630 --> 01:19:38,435 because 6 is not, 3 is not, but 0 is. 1695 01:19:38,435 --> 01:19:39,560 And what's your name again? 1696 01:19:39,560 --> 01:19:40,430 AUDIENCE: Celeste. 1697 01:19:40,430 --> 01:19:41,270 DAVID J. MALAN: Celeste. 1698 01:19:41,270 --> 01:19:47,960 Where should Celeste or number 0 go according to this proposed algorithm? 1699 01:19:47,960 --> 01:19:49,590 All right, I'm seeing a lot of this. 1700 01:19:49,590 --> 01:19:52,820 So at the beginning of the array, so before doing this for real, 1701 01:19:52,820 --> 01:19:54,560 let's have you pop out in front. 1702 01:19:54,560 --> 01:19:58,100 And could you all shift and make room for Celeste? 
1703 01:19:58,100 --> 01:20:02,030 Is this a good idea to have all of them move or equivalently 1704 01:20:02,030 --> 01:20:04,550 move everything in the array to make room for Celeste 1705 01:20:04,550 --> 01:20:06,920 and number 0 over there? 1706 01:20:06,920 --> 01:20:07,670 No, probably not. 1707 01:20:07,670 --> 01:20:08,878 That felt like a lot of work. 1708 01:20:08,878 --> 01:20:12,200 And even though it happened pretty quickly, that's like seven steps 1709 01:20:12,200 --> 01:20:14,040 to happen just to move her in place. 1710 01:20:14,040 --> 01:20:16,580 So what would be marginally smarter perhaps-- 1711 01:20:16,580 --> 01:20:18,960 a little more efficient, perhaps? 1712 01:20:18,960 --> 01:20:19,460 What's that? 1713 01:20:19,460 --> 01:20:19,910 AUDIENCE: Swapping. 1714 01:20:19,910 --> 01:20:20,490 DAVID J. MALAN: Swapping. 1715 01:20:20,490 --> 01:20:21,532 What do you mean by swap? 1716 01:20:21,532 --> 01:20:23,053 AUDIENCE: Replacing swaps. 1717 01:20:23,053 --> 01:20:24,470 DAVID J. MALAN: OK, replace two values. 1718 01:20:24,470 --> 01:20:28,090 So if you want to go back to where you were, one step Over, number 5, 1719 01:20:28,090 --> 01:20:29,560 he's not in the right place. 1720 01:20:29,560 --> 01:20:30,790 He's got to move eventually. 1721 01:20:30,790 --> 01:20:31,498 So you know what? 1722 01:20:31,498 --> 01:20:34,300 If that's where Celeste belongs, why don't we just swap 5 and 0? 1723 01:20:34,300 --> 01:20:36,925 So if you want to go ahead and exchange places with each other. 1724 01:20:36,925 --> 01:20:38,230 Notice what's just happened. 1725 01:20:38,230 --> 01:20:41,420 The problem I'm trying to solve has gotten smaller. 1726 01:20:41,420 --> 01:20:44,020 Instead of being size 8, now it's size 7. 1727 01:20:44,020 --> 01:20:47,050 Now granted, I moved 5 to another wrong location. 
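The swap that replaces all that shifting might look like this in C. It is a sketch under assumptions: the name swap_at and the index-based signature are illustrative choices, not something the lecture defines. Whatever the array's length, a swap costs three assignments, whereas shifting to make room at the front can move up to n - 1 elements.

```c
#include <stddef.h>

// Hypothetical helper: exchange the values at positions i and j.
// A temporary variable holds one value while the other is copied over.
void swap_at(int numbers[], size_t i, size_t j)
{
    int tmp = numbers[i];
    numbers[i] = numbers[j];
    numbers[j] = tmp;
}
```

Calling swap_at(numbers, 0, 7) on 5 2 7 4 1 6 3 0 puts 0 at the front and 5 at the back, which is the exchange acted out on stage.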
1728 01:20:47,050 --> 01:20:48,940 But if these numbers started off randomly, 1729 01:20:48,940 --> 01:20:52,790 it doesn't really matter where 5 goes until we get him into the right place. 1730 01:20:52,790 --> 01:20:54,040 So I think we've improved. 1731 01:20:54,040 --> 01:20:57,430 And now if I go back, my loop is sort of coming back around. 1732 01:20:57,430 --> 01:21:01,690 I can ignore Celeste and make this a seven step problem and not eight 1733 01:21:01,690 --> 01:21:03,430 because I know she's in the right place. 1734 01:21:03,430 --> 01:21:04,840 2 seems to be the smallest. 1735 01:21:04,840 --> 01:21:05,650 I'll remember that. 1736 01:21:05,650 --> 01:21:07,300 Not 7, not 4-- 1737 01:21:07,300 --> 01:21:09,070 1 seems to be the smallest. 1738 01:21:09,070 --> 01:21:13,600 Now I know as a human this should be my next smallest. 1739 01:21:13,600 --> 01:21:18,100 But why, intuitively, should I keep going, do you think? 1740 01:21:18,100 --> 01:21:20,950 I can't sort of optimize as a human and just say, number 1, 1741 01:21:20,950 --> 01:21:22,760 let's get you into the right place. 1742 01:21:22,760 --> 01:21:24,610 I still want to check the whole array. 1743 01:21:24,610 --> 01:21:25,330 Why? 1744 01:21:25,330 --> 01:21:25,870 Yeah. 1745 01:21:25,870 --> 01:21:28,477 AUDIENCE: Perhaps there's another 1. 1746 01:21:28,477 --> 01:21:30,310 DAVID J. MALAN: Maybe there's another 1, and that 1747 01:21:30,310 --> 01:21:32,030 could be another problem altogether. 1748 01:21:32,030 --> 01:21:32,770 Other thoughts? 1749 01:21:32,770 --> 01:21:33,270 Yeah. 1750 01:21:33,270 --> 01:21:34,458 AUDIENCE: Could be another 0 1751 01:21:34,458 --> 01:21:36,250 DAVID J. MALAN: There could be another 0 indeed, 1752 01:21:36,250 --> 01:21:38,560 but I did go through the list once, right? 1753 01:21:38,560 --> 01:21:39,970 And I kind of know there isn't. 1754 01:21:39,970 --> 01:21:40,570 Your thoughts? 
1755 01:21:40,570 --> 01:21:43,300 AUDIENCE: You don't know that every value is represented. 1756 01:21:43,300 --> 01:21:47,387 So maybe there's a [INAUDIBLE] You just don't know what kind of data 1757 01:21:47,387 --> 01:21:48,220 you're working with. 1758 01:21:48,220 --> 01:21:50,600 DAVID J. MALAN: Yeah, I don't necessarily know what is there. 1759 01:21:50,600 --> 01:21:54,940 And honestly, I only stipulated earlier that I'm using one variable in my mind. 1760 01:21:54,940 --> 01:21:58,180 I could use two and remember the two smallest elements I've seen. 1761 01:21:58,180 --> 01:22:00,130 I could use three variables, four. 1762 01:22:00,130 --> 01:22:03,440 But then I'm going to start to use a lot of space in addition to time. 1763 01:22:03,440 --> 01:22:06,880 So if I've stipulated that I only have one variable to solve this problem, 1764 01:22:06,880 --> 01:22:09,190 I don't know anything more about these elements 1765 01:22:09,190 --> 01:22:11,398 because the only thing I'm remembering at this moment 1766 01:22:11,398 --> 01:22:13,490 is number 1 is the smallest element I've seen. 1767 01:22:13,490 --> 01:22:14,290 So I'm going to keep going. 1768 01:22:14,290 --> 01:22:14,710 6? 1769 01:22:14,710 --> 01:22:15,070 Nope. 1770 01:22:15,070 --> 01:22:15,490 3? 1771 01:22:15,490 --> 01:22:15,790 Nope. 1772 01:22:15,790 --> 01:22:16,210 5? 1773 01:22:16,210 --> 01:22:16,710 Nope. 1774 01:22:16,710 --> 01:22:18,670 OK, I know that number 1, and your name was-- 1775 01:22:18,670 --> 01:22:19,378 AUDIENCE: Hannah. 1776 01:22:19,378 --> 01:22:21,850 DAVID J. MALAN: --Hannah is the next smallest element. 1777 01:22:21,850 --> 01:22:24,340 I could have everyone move over to make room, but nope. 1778 01:22:24,340 --> 01:22:25,060 2? 1779 01:22:25,060 --> 01:22:26,980 You know, even though you're so close to where I want you, 1780 01:22:26,980 --> 01:22:29,170 I'm just going to keep it simple and swap you two. 
1781 01:22:29,170 --> 01:22:31,570 So granted, I've made the problem a little worse. 1782 01:22:31,570 --> 01:22:35,200 But on average, I could get lucky too and just pop number 2 1783 01:22:35,200 --> 01:22:36,280 into the right place. 1784 01:22:36,280 --> 01:22:38,210 Now let me just accelerate this. 1785 01:22:38,210 --> 01:22:42,550 I can now ignore Hannah and Celeste, making the problem size 6 instead of 8. 1786 01:22:42,550 --> 01:22:43,870 So it's getting smaller. 1787 01:22:43,870 --> 01:22:45,190 7 is the smallest. 1788 01:22:45,190 --> 01:22:46,450 Nope, now 4 is-- 1789 01:22:46,450 --> 01:22:47,920 2 is the smallest. 1790 01:22:47,920 --> 01:22:50,470 Still 2, still 2, still 2. 1791 01:22:50,470 --> 01:22:53,560 So let's go ahead and swap 2 and 7. 1792 01:22:53,560 --> 01:22:56,230 And now I'll just kind of orchestrate it verbally. 1793 01:22:56,230 --> 01:22:57,950 4, you're about to have to do something. 1794 01:22:57,950 --> 01:23:01,750 So we now have 4, 7, 6, 3, 5. 1795 01:23:01,750 --> 01:23:04,480 OK, 3-- could you swap with 4? 1796 01:23:04,480 --> 01:23:07,675 All right, now we have 7, 6, 4, 5. 1797 01:23:07,675 --> 01:23:10,420 OK, 4, could you swap with 7? 1798 01:23:10,420 --> 01:23:13,180 Now we have 6, 7, 5. 1799 01:23:13,180 --> 01:23:15,430 5, could you swap with 6? 1800 01:23:15,430 --> 01:23:16,835 And now we have 7, 6. 1801 01:23:16,835 --> 01:23:18,160 6, would you swap with 7? 1802 01:23:18,160 --> 01:23:19,690 And now perhaps a round of applause. 1803 01:23:19,690 --> 01:23:20,980 They've sorted themselves. 1804 01:23:20,980 --> 01:23:25,250 OK, hang on there one minute. 1805 01:23:25,250 --> 01:23:27,130 So we'll do this one other approach. 1806 01:23:27,130 --> 01:23:30,220 And my God, that felt so much slower than the first approach, 1807 01:23:30,220 --> 01:23:33,040 but that's, one, because I was kind of providing a long voiceover.
1808 01:23:33,040 --> 01:23:37,690 But two, we were doing one thing at a time whereas the first time, you guys 1809 01:23:37,690 --> 01:23:41,110 had the luxury of moving like eight different CPUs-- 1810 01:23:41,110 --> 01:23:43,690 brains, if you will-- were all operating at the same time. 1811 01:23:43,690 --> 01:23:45,140 And computers like that exist. 1812 01:23:45,140 --> 01:23:47,893 If you have a computer with multiple cores, so to speak, 1813 01:23:47,893 --> 01:23:49,810 that's like having a computer that technically 1814 01:23:49,810 --> 01:23:51,490 can do multiple things at once. 1815 01:23:51,490 --> 01:23:54,470 But software typically, at least as we've written it thus far, 1816 01:23:54,470 --> 01:23:56,195 can only do one thing at a time. 1817 01:23:56,195 --> 01:23:58,070 So in a bit, we'll add up all of these steps. 1818 01:23:58,070 --> 01:23:59,862 But for now, let's take one other approach. 1819 01:23:59,862 --> 01:24:02,140 If you all could reorder yourselves like that-- 1820 01:24:02,140 --> 01:24:06,940 52741630-- let's take the other approach that 1821 01:24:06,940 --> 01:24:10,610 was recommended by just fixing small problems and see where this gets us. 1822 01:24:10,610 --> 01:24:12,730 So we're back in the original order. 1823 01:24:12,730 --> 01:24:14,812 5 and 2 are clearly out of order. 1824 01:24:14,812 --> 01:24:15,520 So you know what? 1825 01:24:15,520 --> 01:24:17,350 Let's just bite this problem off now. 1826 01:24:17,350 --> 01:24:19,090 5 and 2, could you swap? 1827 01:24:19,090 --> 01:24:20,530 Now let me take a next step. 1828 01:24:20,530 --> 01:24:22,330 5 and 7, I think you're OK. 1829 01:24:22,330 --> 01:24:25,000 There's a gap, yes, but that might not be a big deal. 1830 01:24:25,000 --> 01:24:26,360 7 and 4-- problem. 1831 01:24:26,360 --> 01:24:28,420 Let's have you swap. 1832 01:24:28,420 --> 01:24:31,150 OK, 7 and 1, let's have you swap. 1833 01:24:31,150 --> 01:24:34,120 7 and 6, let's have you swap. 
1834 01:24:34,120 --> 01:24:35,830 7 and 3, you swap. 1835 01:24:35,830 --> 01:24:37,690 7 and 0, you swap. 1836 01:24:37,690 --> 01:24:39,490 Now let me pause for just a moment. 1837 01:24:39,490 --> 01:24:40,790 Still not sorted. 1838 01:24:40,790 --> 01:24:42,470 So I'm clearly not done. 1839 01:24:42,470 --> 01:24:45,340 But have I improved the problem? 1840 01:24:45,340 --> 01:24:48,100 Right, I can't see-- like before, I can't optimize like 1841 01:24:48,100 --> 01:24:50,210 before because 0 is obviously not here. 1842 01:24:50,210 --> 01:24:54,160 So unless they're still way back there, so it's not like I've gone from 8 steps 1843 01:24:54,160 --> 01:24:56,300 to 7 to 6 just yet. 1844 01:24:56,300 --> 01:24:58,047 But have I made any improvements? 1845 01:24:58,047 --> 01:24:58,630 AUDIENCE: Yes. 1846 01:24:58,630 --> 01:24:59,255 DAVID J. MALAN: Yes. 1847 01:24:59,255 --> 01:25:01,480 In what sense is this improved? 1848 01:25:01,480 --> 01:25:06,050 What's a concrete thing you could point to is better? 1849 01:25:06,050 --> 01:25:06,550 Yeah. 1850 01:25:06,550 --> 01:25:08,110 AUDIENCE: Sorted the highest number. 1851 01:25:08,110 --> 01:25:10,690 DAVID J. MALAN: I've sorted the highest number, which is indeed 7. 1852 01:25:10,690 --> 01:25:15,400 And conversely, if you prefer, Celeste is one step closer to the beginning. 1853 01:25:15,400 --> 01:25:19,970 Now worst case, Celeste is going to have to move one step on each iteration. 1854 01:25:19,970 --> 01:25:22,540 So I might need to do this thing like n total times 1855 01:25:22,540 --> 01:25:24,215 to move her all the way over. 1856 01:25:24,215 --> 01:25:25,340 But that might work out OK. 1857 01:25:25,340 --> 01:25:26,510 Let me see. 1858 01:25:26,510 --> 01:25:27,980 2 and 5, you're good. 1859 01:25:27,980 --> 01:25:29,480 5 and 4, swap you. 1860 01:25:29,480 --> 01:25:31,370 5 and 1, let's swap you. 1861 01:25:31,370 --> 01:25:32,570 5 and 6, you're good. 
1862 01:25:32,570 --> 01:25:34,550 6 and 3, let's swap you. 1863 01:25:34,550 --> 01:25:36,650 6 and 0, let's swap you. 1864 01:25:36,650 --> 01:25:38,240 6 and 7, you're good. 1865 01:25:38,240 --> 01:25:39,380 And I think now-- 1866 01:25:39,380 --> 01:25:41,510 notice that the high values, as you noted, 1867 01:25:41,510 --> 01:25:44,150 are sort of bubbling up, if you will, to the end of the list. 1868 01:25:44,150 --> 01:25:45,530 2 and 4, you're good. 1869 01:25:45,530 --> 01:25:46,820 4 and 1, let's swap. 1870 01:25:46,820 --> 01:25:48,080 4 and 5, good. 1871 01:25:48,080 --> 01:25:49,430 5 and 3, swap. 1872 01:25:49,430 --> 01:25:51,780 5 and 0, swap. 1873 01:25:51,780 --> 01:25:53,370 5, 6, 7, of course, are good. 1874 01:25:53,370 --> 01:25:56,175 So now you can sort of see the problem resolving itself. 1875 01:25:56,175 --> 01:25:57,800 And let's just do this part now faster. 1876 01:25:57,800 --> 01:26:00,140 2 and 1, 2 and 4. 1877 01:26:00,140 --> 01:26:04,160 OK, 4 and 3, 4 and 0. 1878 01:26:04,160 --> 01:26:08,877 All right, now 1 and 2, 2, and 3, and 0, and good. 1879 01:26:08,877 --> 01:26:10,460 So we do have some optimization there. 1880 01:26:10,460 --> 01:26:12,860 We don't need to keep going because those all are sorted. 1881 01:26:12,860 --> 01:26:13,910 1 and 2, you're good. 1882 01:26:13,910 --> 01:26:16,730 2 and 0, all right, done. 1883 01:26:16,730 --> 01:26:20,330 1 and 0-- and big round of applause in closing. 1884 01:26:20,330 --> 01:26:24,890 OK, so thank you all. 1885 01:26:24,890 --> 01:26:27,140 We need the puppets back, but you can keep the shirts. 1886 01:26:27,140 --> 01:26:28,840 Thank you for volunteering here. 1887 01:26:28,840 --> 01:26:31,810 Feel free to make your way exits left or right. 1888 01:26:31,810 --> 01:26:33,760 And let's see if, thanks to our volunteers 1889 01:26:33,760 --> 01:26:40,000 here, we can't now formalize a little bit what we did on both passes here. 
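The second approach acted out above, comparing neighbors and swapping any pair that is out of order so the large values "bubble up" to the end, can be sketched in C as follows. This is a minimal sketch under assumptions: the function name bubble_sort is a label chosen here, and the shrinking inner bound reflects the on-stage shortcut of not re-checking the values already settled at the right end.

```c
#include <stddef.h>

// Sketch of the adjacent-swap approach: repeatedly walk left to right,
// swapping neighbors that are out of order. After each pass, the largest
// remaining value has reached its final position at the right end.
void bubble_sort(int numbers[], size_t n)
{
    if (n < 2)
    {
        return; // nothing to compare
    }
    for (size_t pass = 0; pass < n - 1; pass++)
    {
        // The last `pass` positions are already settled, so stop before them
        for (size_t j = 0; j + 1 < n - pass; j++)
        {
            if (numbers[j] > numbers[j + 1])
            {
                int tmp = numbers[j];
                numbers[j] = numbers[j + 1];
                numbers[j + 1] = tmp;
            }
        }
    }
}
```

On 5 2 7 4 1 6 3 0, the first pass produces 2 5 4 1 6 3 0 7: the 7 has reached the end and the 0 has moved one step closer to the front, as on stage.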
1890 01:26:40,000 --> 01:26:44,170 I claim that the first algorithm our volunteers kindly acted out 1891 01:26:44,170 --> 01:26:45,730 is what's called selection sort. 1892 01:26:45,730 --> 01:26:50,980 And as the name implied, we selected the smallest element again and again 1893 01:26:50,980 --> 01:26:53,530 and again, working our way from left to right, 1894 01:26:53,530 --> 01:26:58,100 putting Celeste into the right place, and then continuing with everyone else. 1895 01:26:58,100 --> 01:27:01,060 So selection sort, as it's formally called, 1896 01:27:01,060 --> 01:27:04,000 can be described, for instance, with this pseudo code here-- 1897 01:27:04,000 --> 01:27:06,910 for i from 0 to n minus 1. 1898 01:27:06,910 --> 01:27:08,350 And again, why this? 1899 01:27:08,350 --> 01:27:10,420 This is just how we talk about arrays. 1900 01:27:10,420 --> 01:27:14,920 The left end is 0, the right end is n minus 1 where in this case, 1901 01:27:14,920 --> 01:27:16,720 n happened to be eight people. 1902 01:27:16,720 --> 01:27:18,700 So that's 0 through 7. 1903 01:27:18,700 --> 01:27:22,240 So for i from 0 to n minus 1, what did I do? 1904 01:27:22,240 --> 01:27:27,400 I found the smallest number between numbers bracket i and numbers bracket 1905 01:27:27,400 --> 01:27:28,810 n minus 1. 1906 01:27:28,810 --> 01:27:31,030 It's a little cryptic at first glance, but this 1907 01:27:31,030 --> 01:27:34,510 is just a very pseudo code-like way of saying 1908 01:27:34,510 --> 01:27:37,960 find the smallest element among all eight volunteers 1909 01:27:37,960 --> 01:27:43,120 because if i starts at 0 and n minus 1 never changes because there's always 1910 01:27:43,120 --> 01:27:47,230 8, 8 people, so 8 minus 1 is 7, this first 1911 01:27:47,230 --> 01:27:50,320 says find the smallest number between numbers bracket 0 1912 01:27:50,320 --> 01:27:53,350 and numbers bracket 7, if you will. 1913 01:27:53,350 --> 01:27:54,550 Then what do I do?
1914 01:27:54,550 --> 01:27:57,730 Swap the smallest number with numbers bracket i. 1915 01:27:57,730 --> 01:28:01,210 So that's how we got Celeste from over here all the way over there. 1916 01:28:01,210 --> 01:28:03,340 We just swapped those two values. 1917 01:28:03,340 --> 01:28:05,840 What then happens next in this pseudo code? 1918 01:28:05,840 --> 01:28:07,900 i, of course, goes from 0 to 1. 1919 01:28:07,900 --> 01:28:10,060 And that's the technical way of saying now 1920 01:28:10,060 --> 01:28:13,810 find the smallest element among the 7 remaining volunteers, 1921 01:28:13,810 --> 01:28:17,570 ignoring Celeste this time because she was already in the correct location. 1922 01:28:17,570 --> 01:28:19,960 So the problem went from size 8 to size 7. 1923 01:28:19,960 --> 01:28:23,500 And if we repeat, size 6, 5, 4, 3, 2, 1, until boom, 1924 01:28:23,500 --> 01:28:25,820 it's all done at the very end. 1925 01:28:25,820 --> 01:28:29,200 So this is just one way of expressing in pseudo code what 1926 01:28:29,200 --> 01:28:33,040 we did a little more organically and a formalization of what someone 1927 01:28:33,040 --> 01:28:35,420 volunteered out in the audience. 1928 01:28:35,420 --> 01:28:40,300 So if we consider, then, the efficiency of this algorithm, 1929 01:28:40,300 --> 01:28:42,730 maybe abstracting it away now as a bunch of doors 1930 01:28:42,730 --> 01:28:46,960 where the left most again is always 0, the right most is always n minus 1, 1931 01:28:46,960 --> 01:28:50,350 or equivalently, the second to last is n minus 2, the third to last 1932 01:28:50,350 --> 01:28:54,550 is n minus 3 where n might be 8 or anything else, 1933 01:28:54,550 --> 01:28:59,620 how do we think about or quantify the running time of selection sort? 1934 01:28:59,620 --> 01:29:02,230 Big O of what? 1935 01:29:02,230 --> 01:29:05,230 I mean, that was a lot of steps to be adding up. 
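The pseudo code just described (for i from 0 to n minus 1, find the smallest number between numbers bracket i and numbers bracket n minus 1, then swap it with numbers bracket i) could be translated into C along these lines. This is a sketch, not the lecture's own code; the function name selection_sort and the variable names are assumptions made for illustration.

```c
#include <stddef.h>

// Sketch of selection sort: for each position i, find the smallest value
// in numbers[i] through numbers[n - 1] and swap it into position i.
void selection_sort(int numbers[], size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        // Remember where the smallest value seen so far lives
        size_t smallest = i;
        for (size_t j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest value with numbers[i]; everything to the
        // left of i + 1 is now in its final place
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
}
```

Starting from 5 2 7 4 1 6 3 0, the first two passes give 0 2 7 4 1 6 3 5 and then 0 1 7 4 2 6 3 5, matching the swaps of 5 with 0 and of 2 with 1 on stage.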
1936 01:29:05,230 --> 01:29:09,130 It's probably more than n, right, because I went through the list 1937 01:29:09,130 --> 01:29:10,030 again and again. 1938 01:29:10,030 --> 01:29:14,740 It was like n plus n minus 1 plus n minus 2. 1939 01:29:14,740 --> 01:29:17,080 Any instincts here? 1940 01:29:17,080 --> 01:29:20,620 We got like the whole team in the orchestra now. 1941 01:29:20,620 --> 01:29:25,180 Let me propose we think about it this way with just a bit of formula, say. 1942 01:29:25,180 --> 01:29:29,350 So the first time, I had to look at n different volunteers. 1943 01:29:29,350 --> 01:29:33,130 n was 8 in this case, but generically, I looked at all eight numbers 1944 01:29:33,130 --> 01:29:35,230 in order to decide who was the smallest. 1945 01:29:35,230 --> 01:29:37,270 And sure enough, Celeste was at the very end. 1946 01:29:37,270 --> 01:29:39,103 She happened to be all the way to the right. 1947 01:29:39,103 --> 01:29:43,510 But I only knew that once I looked at all 8 or all n volunteers. 1948 01:29:43,510 --> 01:29:45,880 So that took me n steps first. 1949 01:29:45,880 --> 01:29:49,270 But once Celeste was swapped into the right place, then 1950 01:29:49,270 --> 01:29:53,290 my problem was size n minus 1, and I had n minus 1 other people 1951 01:29:53,290 --> 01:29:54,320 to look through. 1952 01:29:54,320 --> 01:29:55,870 So that's n minus 1 steps. 1953 01:29:55,870 --> 01:29:59,825 Then after that, it's n minus 2 plus n minus 3 plus n minus 4 plus dot dot 1954 01:29:59,825 --> 01:30:01,270 dot until I had one final step. 1955 01:30:01,270 --> 01:30:04,460 And it's obvious that I only have one human left to consider. 1956 01:30:04,460 --> 01:30:07,180 So we might wave our hands at this with a little ellipsis 1957 01:30:07,180 --> 01:30:10,400 and just say dot dot dot plus 1 for the final step. 1958 01:30:10,400 --> 01:30:11,890 Now what does this actually equal?
1959 01:30:11,890 --> 01:30:13,480 Well, this is where you might think back on, like, 1960 01:30:13,480 --> 01:30:15,400 your high school math or physics textbook that 1961 01:30:15,400 --> 01:30:18,640 has a little cheat sheet at the end that shows these kinds of recurrences. 1962 01:30:18,640 --> 01:30:21,490 That happens to work out mathematically to be 1963 01:30:21,490 --> 01:30:25,120 n times n plus 1 all divided by 2. 1964 01:30:25,120 --> 01:30:28,690 That's just what that recurrence, that series, actually adds up to. 1965 01:30:28,690 --> 01:30:31,300 So if you take on faith that that math is correct, let's 1966 01:30:31,300 --> 01:30:35,530 just now multiply this out mathematically. 1967 01:30:35,530 --> 01:30:41,920 That's n squared plus n divided by 2 or n squared divided by 2 plus n over 2. 1968 01:30:41,920 --> 01:30:44,890 And here's where we're starting to get annoyingly into the weeds. 1969 01:30:44,890 --> 01:30:50,080 Like, honestly, as n gets really large, like a million doors or integers 1970 01:30:50,080 --> 01:30:54,940 or a billion web pages in Google search engine, honestly, which of these terms 1971 01:30:54,940 --> 01:30:57,400 is going to matter the most mathematically 1972 01:30:57,400 --> 01:30:59,200 if n is a really big number? 1973 01:30:59,200 --> 01:31:01,840 Is n squared divided by 2 the dominant factor, 1974 01:31:01,840 --> 01:31:04,260 or is n divided by 2 the dominant factor? 1975 01:31:04,260 --> 01:31:05,995 AUDIENCE: n squared. 1976 01:31:05,995 --> 01:31:07,120 DAVID J. MALAN: Yeah, n squared. 1977 01:31:07,120 --> 01:31:09,290 I mean, no matter what n is-- and the bigger it is, 1978 01:31:09,290 --> 01:31:12,580 the bigger raising it to the power 2 is going to be. 1979 01:31:12,580 --> 01:31:13,330 So you know what? 1980 01:31:13,330 --> 01:31:16,270 Let's just wave our hands at this because at the end of the day, 1981 01:31:16,270 --> 01:31:19,780 as n gets really large, the dominant factor is indeed that first one. 
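Written out, the step count described above works out as follows. The worked line for n = 8 is added here for illustration; it is simple arithmetic on the lecture's formula rather than something stated in the lecture.

```latex
\begin{align*}
n + (n-1) + (n-2) + \cdots + 1 &= \frac{n(n+1)}{2} \\
                               &= \frac{n^2 + n}{2}
                                = \frac{n^2}{2} + \frac{n}{2} \\
\text{for } n = 8:\quad 8 + 7 + 6 + \cdots + 1 &= \frac{8 \cdot 9}{2} = 36
\end{align*}
```

For n equal to one million, the first term n squared over 2 is 5 x 10^11 while the second term n over 2 is only 5 x 10^5, which is why the squared term dominates.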
1982 01:31:19,780 --> 01:31:20,530 And you know what? 1983 01:31:20,530 --> 01:31:24,290 Even the divided by 2, as I claimed earlier with our two phone book examples, where 1984 01:31:24,290 --> 01:31:26,960 the two straight lines if you keep zooming out essentially 1985 01:31:26,960 --> 01:31:31,760 looked the same when n is large enough, let's just call this on the order of n 1986 01:31:31,760 --> 01:31:32,490 squared. 1987 01:31:32,490 --> 01:31:37,130 So that is to say a computer scientist would describe selection sort as taking 1988 01:31:37,130 --> 01:31:39,860 on the order of n squared steps. 1989 01:31:39,860 --> 01:31:41,510 That's an oversimplification. 1990 01:31:41,510 --> 01:31:44,600 If we really added it up, it's actually this many steps-- n 1991 01:31:44,600 --> 01:31:46,610 squared divided by 2 plus n over 2. 1992 01:31:46,610 --> 01:31:50,420 But again, if we want to just be able to generally compare two algorithms' 1993 01:31:50,420 --> 01:31:53,810 performance, I think it's going to suffice if we look at that highest 1994 01:31:53,810 --> 01:31:59,720 order term to get a sense of what the algorithm feels like, if you will, 1995 01:31:59,720 --> 01:32:02,540 or what it even looks like graphically. 1996 01:32:02,540 --> 01:32:06,170 All right, so with that said, we might describe bubble sort 1997 01:32:06,170 --> 01:32:07,790 as being in big O-- 1998 01:32:07,790 --> 01:32:11,700 sorry, selection sort as being in big O of n squared. 1999 01:32:11,700 --> 01:32:17,030 But what if we consider now the best case scenario-- an opportunity 2000 01:32:17,030 --> 01:32:19,070 to talk about a lower bound? 2001 01:32:19,070 --> 01:32:23,305 In the best case, how many steps does selection sort take? 2002 01:32:23,305 --> 01:32:24,680 Well, here, we need some context. 2003 01:32:24,680 --> 01:32:27,320 Like, what does it mean to be the best case or the worst case 2004 01:32:27,320 --> 01:32:29,090 when it comes to sorting?
2005 01:32:29,090 --> 01:32:32,600 Like, what could you imagine meaning the best possible scenario when you're 2006 01:32:32,600 --> 01:32:35,651 trying to sort a bunch of numbers? 2007 01:32:35,651 --> 01:32:37,100 I got the whole crew here again. 2008 01:32:37,100 --> 01:32:37,400 Yeah. 2009 01:32:37,400 --> 01:32:39,050 AUDIENCE: They would already be sorted. 2010 01:32:39,050 --> 01:32:40,370 DAVID J. MALAN: All right, they're already sorted, right? 2011 01:32:40,370 --> 01:32:44,030 I can't really imagine a better scenario than I have to sort some numbers, 2012 01:32:44,030 --> 01:32:46,040 but they're already sorted for me. 2013 01:32:46,040 --> 01:32:51,170 But does this algorithm leverage that fact in practice? 2014 01:32:51,170 --> 01:32:54,110 Even if all of our humans had lined up from 0 to 7, 2015 01:32:54,110 --> 01:32:56,930 I'm pretty sure I would have pretty naively started here. 2016 01:32:56,930 --> 01:32:58,670 And yes, Celeste happens to be here. 2017 01:32:58,670 --> 01:33:03,375 But I only know she needs to be here once I've looked at all eight people. 2018 01:33:03,375 --> 01:33:06,000 And then I would have realized, well, that was a waste of time. 2019 01:33:06,000 --> 01:33:07,280 I can leave Celeste be. 2020 01:33:07,280 --> 01:33:09,500 But then what would I have done? 2021 01:33:09,500 --> 01:33:12,860 I would have ignored her position because we've solved one problem. 2022 01:33:12,860 --> 01:33:16,260 I would have done the same thing now for seven people, then six people. 2023 01:33:16,260 --> 01:33:19,320 So every time I walk through, I'm not doing much useful work. 2024 01:33:19,320 --> 01:33:21,860 But I am doing those comparisons because I 2025 01:33:21,860 --> 01:33:25,500 don't know until I do the work that the people were in the right order. 
2026 01:33:25,500 --> 01:33:30,440 So this would seem to imply that the omega notation, the best case 2027 01:33:30,440 --> 01:33:33,740 scenario, even, a lower bound on the running time would be what, then? 2028 01:33:33,740 --> 01:33:35,840 AUDIENCE: [INAUDIBLE] 2029 01:33:35,840 --> 01:33:37,470 DAVID J. MALAN: A little louder? 2030 01:33:37,470 --> 01:33:38,410 AUDIENCE: N squared. 2031 01:33:38,410 --> 01:33:40,250 DAVID J. MALAN: It's still going to be n squared, 2032 01:33:40,250 --> 01:33:45,820 in fact, because the code I'm giving myself doesn't leverage or benefit 2033 01:33:45,820 --> 01:33:50,260 from any of that scenario because it just mindlessly continues 2034 01:33:50,260 --> 01:33:51,770 to do this again and again. 2035 01:33:51,770 --> 01:33:57,010 So in this case, yes, I would claim that the omega notation for selection sort 2036 01:33:57,010 --> 01:33:58,830 is also omega of n squared. 2037 01:33:58,830 --> 01:34:00,580 So those are the kinds of numbers to beat. 2038 01:34:00,580 --> 01:34:03,760 It seems like the upper bound and lower bound of selection 2039 01:34:03,760 --> 01:34:06,130 sort are indeed n squared. 2040 01:34:06,130 --> 01:34:08,380 And so we can also describe selection sort, therefore, 2041 01:34:08,380 --> 01:34:09,717 as being in theta of n squared. 2042 01:34:09,717 --> 01:34:12,550 That's the first algorithm we've had the chance to describe that in, 2043 01:34:12,550 --> 01:34:14,650 which is to say that it's kind of slow. 2044 01:34:14,650 --> 01:34:16,595 I mean, maybe other algorithms are slower, 2045 01:34:16,595 --> 01:34:18,220 but this isn't the best starting point. 2046 01:34:18,220 --> 01:34:19,370 Can we do better? 2047 01:34:19,370 --> 01:34:22,930 Well, there's a reason that I guided us to doing the second algorithm second.
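The selection sort analysis above can be checked with a small sketch in C. This is not code from the lecture: the function name `selection_sort` and the convention of counting one step per element examined are assumptions chosen so that the tally matches the n + (n - 1) + ... + 1 series, which sums to n(n + 1)/2.

```c
// Sorts numbers[0..n-1] in place using selection sort and returns the total
// number of elements examined. Counting one "step" per element looked at is
// an assumption made for this sketch, not something the lecture specifies.
long selection_sort(int numbers[], int n)
{
    long steps = 0;
    for (int i = 0; i < n; i++)
    {
        // Walk the whole unsorted part to find the smallest remaining value.
        int smallest = i;
        for (int j = i; j < n; j++)
        {
            steps++;
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }
        // Swap it into position i; positions 0..i now hold their final values.
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
    return steps;
}
```

For eight values this returns 36, which is 8 times 9 divided by 2, whether or not the input was already sorted. The loops never inspect whether any work is still needed, so the best and worst cases cost the same.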
2048 01:34:22,930 --> 01:34:25,430 Even though you verbally proposed them in a different order, 2049 01:34:25,430 --> 01:34:28,600 this second algorithm we did is generally known as bubble sort. 2050 01:34:28,600 --> 01:34:31,210 And I deliberately used that word a bit ago, 2051 01:34:31,210 --> 01:34:34,930 saying the big values are bubbling their way up to the right 2052 01:34:34,930 --> 01:34:37,990 to kind of capture the fact that, indeed, this algorithm works 2053 01:34:37,990 --> 01:34:38,620 differently. 2054 01:34:38,620 --> 01:34:40,870 But let's consider if it's better or worse. 2055 01:34:40,870 --> 01:34:43,960 So here, we have pseudo code for bubble sort. 2056 01:34:43,960 --> 01:34:45,930 You could write this too in different ways. 2057 01:34:45,930 --> 01:34:48,550 But let's consider what we did on the stage. 2058 01:34:48,550 --> 01:34:51,850 We repeated the following n minus 1 times. 2059 01:34:51,850 --> 01:34:55,390 We initialized at least, even though I didn't verbalize it this way, 2060 01:34:55,390 --> 01:35:00,430 a variable like i from 0 to n minus 2, n minus 2. 2061 01:35:00,430 --> 01:35:01,810 And then I asked this question. 2062 01:35:01,810 --> 01:35:08,470 If numbers bracket i and numbers bracket i plus 1 are out of order, 2063 01:35:08,470 --> 01:35:10,250 then swap them. 2064 01:35:10,250 --> 01:35:12,580 So again, I just did it more intuitively by pointing, 2065 01:35:12,580 --> 01:35:14,950 but this would be a way, with a bit of pseudo code, 2066 01:35:14,950 --> 01:35:16,147 to describe what's going on. 2067 01:35:16,147 --> 01:35:18,730 But notice that I'm doing something a little differently here. 2068 01:35:18,730 --> 01:35:22,730 I'm iterating from i equals 0 to n minus 2. 2069 01:35:22,730 --> 01:35:23,230 Why? 2070 01:35:23,230 --> 01:35:26,740 Well, if I'm comparing two things, left hand and right hand, 2071 01:35:26,740 --> 01:35:28,690 I'd still want to start at 0.
2072 01:35:28,690 --> 01:35:31,450 But I don't want to go all the way to n minus 1 2073 01:35:31,450 --> 01:35:35,048 because then, I'd be going past the boundary of my array, which 2074 01:35:35,048 --> 01:35:35,590 would be bad. 2075 01:35:35,590 --> 01:35:38,080 I want to make sure that my left hand-- i, if you will-- 2076 01:35:38,080 --> 01:35:42,820 stops at n minus 2 so that when I plus 1 in my pseudo code, 2077 01:35:42,820 --> 01:35:45,610 I'm looking at the last two elements, not the last element 2078 01:35:45,610 --> 01:35:46,900 and then past the boundary. 2079 01:35:46,900 --> 01:35:48,733 That's actually a common programming mistake 2080 01:35:48,733 --> 01:35:50,620 that we'll undoubtedly soon make by going 2081 01:35:50,620 --> 01:35:52,880 beyond the boundaries of your array. 2082 01:35:52,880 --> 01:35:59,530 So this pseudo code, then, allows me to say compare every one again and again 2083 01:35:59,530 --> 01:36:01,730 and swap them if they're out of order. 2084 01:36:01,730 --> 01:36:06,190 Why do I repeat the whole thing n minus 1 times? 2085 01:36:06,190 --> 01:36:11,500 Like, why does it not suffice just to do this loop here? 2086 01:36:11,500 --> 01:36:14,980 Think what happened with Celeste. 2087 01:36:14,980 --> 01:36:19,480 Why do I repeat this whole thing n minus 1 times? 2088 01:36:19,480 --> 01:36:20,230 Yeah, in the back? 2089 01:36:20,230 --> 01:36:27,855 AUDIENCE: [INAUDIBLE] 2090 01:36:27,855 --> 01:36:30,230 DAVID J. MALAN: Indeed, and I think if I can recap accurately, 2091 01:36:30,230 --> 01:36:31,640 think back to Celeste again. 2092 01:36:31,640 --> 01:36:34,130 And I'm sorry to keep calling on you as our number 0. 2093 01:36:34,130 --> 01:36:37,670 Each time through bubble sort, she only moved one step. 2094 01:36:37,670 --> 01:36:41,280 And so in total, if there's n locations, at the end of the day, 2095 01:36:41,280 --> 01:36:45,930 she needs to move n minus 1 steps to get 0 all the way to where it needs to be.
2096 01:36:45,930 --> 01:36:50,330 And so this inner loop, if you will, where we're iterating using i, 2097 01:36:50,330 --> 01:36:52,700 that just fixes some of the problems. 2098 01:36:52,700 --> 01:36:56,150 But it doesn't fix all of the problems until we do that same logic again 2099 01:36:56,150 --> 01:36:57,660 and again and again. 2100 01:36:57,660 --> 01:37:01,410 And so how might we quantify the running time of this algorithm? 2101 01:37:01,410 --> 01:37:04,670 Well, one way to see it is to just literally look at the pseudo code. 2102 01:37:04,670 --> 01:37:09,170 The outer loop repeats n minus 1 times by definition. 2103 01:37:09,170 --> 01:37:10,490 It literally says that. 2104 01:37:10,490 --> 01:37:15,960 The inner loop, the for loop, also iterates n minus 1 times. 2105 01:37:15,960 --> 01:37:16,460 Why? 2106 01:37:16,460 --> 01:37:18,540 Because it's going from 0 to n minus 2. 2107 01:37:18,540 --> 01:37:22,580 And if that's hard to think about, that's the same thing as 1 to n minus 1 2108 01:37:22,580 --> 01:37:25,320 if you just add 1 to both ends of the formula. 2109 01:37:25,320 --> 01:37:29,910 So that means you're doing n minus 1 things n minus 1 times. 2110 01:37:29,910 --> 01:37:32,540 So I literally multiply how many times the outer loop 2111 01:37:32,540 --> 01:37:35,960 is running by how many times the inner loop is running, which gives me 2112 01:37:35,960 --> 01:37:39,320 sort of FOIL method n minus 1 squared. 2113 01:37:39,320 --> 01:37:41,300 And I could multiply that whole thing out. 2114 01:37:41,300 --> 01:37:44,330 Well, let's consider this just a little more methodically here. 2115 01:37:44,330 --> 01:37:48,382 If I have n minus 1 on the outer, n minus 1 on the inner-- 2116 01:37:48,382 --> 01:37:49,590 let's go ahead and FOIL this. 2117 01:37:49,590 --> 01:37:53,180 So n squared minus n minus n plus 1, combine 2118 01:37:53,180 --> 01:37:56,570 like terms-- n squared minus 2n plus 1.
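The bubble sort pseudo code just described, with an outer loop that repeats n minus 1 times and an inner loop that runs i from 0 to n minus 2, might be written in C roughly as follows. This is only a sketch: the function name `bubble_sort` and the comparison counter it returns are assumptions added for illustration, not part of the lecture's code.

```c
// Naive bubble sort: repeats a full left-to-right pass n - 1 times.
// Returns the number of adjacent comparisons made, which works out to
// (n - 1) * (n - 1). The counter exists only for illustration.
long bubble_sort(int numbers[], int n)
{
    long comparisons = 0;
    for (int pass = 0; pass < n - 1; pass++)
    {
        // i stops at n - 2 so that numbers[i + 1] never reads past the
        // end of the array.
        for (int i = 0; i <= n - 2; i++)
        {
            comparisons++;
            if (numbers[i] > numbers[i + 1])
            {
                // Out of order: swap the adjacent pair.
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
            }
        }
    }
    return comparisons;
}
```

With eight values the function makes 7 times 7, or 49, comparisons, matching n squared minus 2n plus 1 for n equal to 8 (64 minus 16 plus 1).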
2119 01:37:56,570 --> 01:38:01,470 And now which of these terms is clearly going to be dominant, so to speak? 2120 01:38:01,470 --> 01:38:01,970 The-- 2121 01:38:01,970 --> 01:38:02,450 AUDIENCE: N squared. 2122 01:38:02,450 --> 01:38:03,720 DAVID J. MALAN: --the n squared. 2123 01:38:03,720 --> 01:38:06,260 So yes, even though minus 2n is a good thing 2124 01:38:06,260 --> 01:38:08,660 because it's subtracting off some of the time required, 2125 01:38:08,660 --> 01:38:11,690 plus 1 is not that big a thing, they're such drops in the bucket when 2126 01:38:11,690 --> 01:38:14,120 n gets really large, like in the millions or billions, 2127 01:38:14,120 --> 01:38:18,510 certainly, that bubble sort too is on the order of n squared. 2128 01:38:18,510 --> 01:38:21,110 It's not the same exactly as selection sort. 2129 01:38:21,110 --> 01:38:23,030 But as n gets big, honestly, we're barely 2130 01:38:23,030 --> 01:38:25,530 going to be able to notice the difference most likely. 2131 01:38:25,530 --> 01:38:29,190 And so it too might be said to be on the order of n squared. 2132 01:38:29,190 --> 01:38:34,530 And if we consider now the lower bound on bubble sort's running time, 2133 01:38:34,530 --> 01:38:38,720 here's where things get potentially interesting. 2134 01:38:38,720 --> 01:38:44,450 What might you claim is the running time of bubble sort in the best case? 2135 01:38:44,450 --> 01:38:48,230 And the best case, I claim, is when the numbers are already sorted. 2136 01:38:48,230 --> 01:38:50,720 Is our pseudo code going to take that into account? 2137 01:38:50,720 --> 01:38:52,442 AUDIENCE: N 2138 01:38:52,442 --> 01:38:53,150 DAVID J. MALAN: OK, n. 2139 01:38:53,150 --> 01:38:54,350 Why do you propose n? 2140 01:38:54,350 --> 01:38:59,043 AUDIENCE: [INAUDIBLE] 2141 01:38:59,043 --> 01:39:00,710 DAVID J. MALAN: Yes, and that's the key word.
2142 01:39:00,710 --> 01:39:05,150 To summarize, in bubble sort, I do have to minimally make one pass because if I 2143 01:39:05,150 --> 01:39:07,760 don't look at all n elements, then I'm theoretically 2144 01:39:07,760 --> 01:39:09,260 just guessing if it's sorted or not. 2145 01:39:09,260 --> 01:39:11,480 Like, I obviously intuitively have to look 2146 01:39:11,480 --> 01:39:14,390 at every element to decide yay or nay, it's in the right order. 2147 01:39:14,390 --> 01:39:17,180 And my original pseudo code, though, is pretty naive. 2148 01:39:17,180 --> 01:39:22,640 It's just going to blindly go back and forth n minus 1 times again and again, 2149 01:39:22,640 --> 01:39:23,810 and that's going to add up. 2150 01:39:23,810 --> 01:39:25,760 But what if I add a bit of an optimization 2151 01:39:25,760 --> 01:39:28,010 that you might have glimpsed on the slide a moment ago 2152 01:39:28,010 --> 01:39:31,610 where if I compare two people and I don't swap them, compare two people, 2153 01:39:31,610 --> 01:39:34,550 don't swap them, and I go all the way through the list comparing 2154 01:39:34,550 --> 01:39:38,120 every pair of adjacent people, and I make no swaps, 2155 01:39:38,120 --> 01:39:40,820 it would be kind of not just naive but stupid 2156 01:39:40,820 --> 01:39:44,360 to do that same process again because if the humans have not moved, 2157 01:39:44,360 --> 01:39:47,150 I'm not going to make any different decisions. 2158 01:39:47,150 --> 01:39:49,740 I'm going to do nothing again, nothing again. 2159 01:39:49,740 --> 01:39:53,150 So at that point, it would be stupid, very inefficient, to go back and forth 2160 01:39:53,150 --> 01:39:54,210 and back and forth. 2161 01:39:54,210 --> 01:39:58,070 So if I modify our pseudo code with just an additional if condition, 2162 01:39:58,070 --> 01:40:00,050 I bet we can speed this up. 2163 01:40:00,050 --> 01:40:06,200 Inside of that same pseudo code, what if I say, hey, if no swaps, quit?
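The "if no swaps, quit" idea can be sketched in C by counting swaps during each pass and leaving the outer loop as soon as a pass finishes with none. The function name `bubble_sort_early_exit` and the returned comparison count are assumptions made for this illustration and do not come from the lecture.

```c
// Bubble sort with the early-exit check: if a complete pass makes no
// swaps, the array must already be in order, so stop looping.
// Returns the number of adjacent comparisons made (for illustration only).
long bubble_sort_early_exit(int numbers[], int n)
{
    long comparisons = 0;
    for (int pass = 0; pass < n - 1; pass++)
    {
        int swaps = 0; // how many swaps this pass has made so far
        for (int i = 0; i <= n - 2; i++)
        {
            comparisons++;
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swaps++;
            }
        }
        if (swaps == 0)
        {
            break; // nothing moved, so another pass would change nothing
        }
    }
    return comparisons;
}
```

On eight values that are already sorted, this version makes a single pass of 7 comparisons and stops. On eight values in reverse order every pass still swaps something, so it makes all 49 comparisons, the same as the naive version.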
2164 01:40:06,200 --> 01:40:10,010 Like, quit prematurely before the loops are finished running. 2165 01:40:10,010 --> 01:40:13,380 One of the loops has gone through per the indentation here. 2166 01:40:13,380 --> 01:40:16,190 But if I do a loop from left to right and I 2167 01:40:16,190 --> 01:40:18,680 have made no swaps, which you can think of as just being 2168 01:40:18,680 --> 01:40:22,250 one other variable that's plus plusing as I go, keeping track of how many 2169 01:40:22,250 --> 01:40:22,820 swaps-- 2170 01:40:22,820 --> 01:40:24,740 if I've made no swaps from left to right, 2171 01:40:24,740 --> 01:40:27,240 I'm not going to make any swaps the next time around either. 2172 01:40:27,240 --> 01:40:30,030 So let's just quit at that point. 2173 01:40:30,030 --> 01:40:32,630 And that is to say in the best case, if you will, 2174 01:40:32,630 --> 01:40:36,410 when the list is already sorted, the omega notation for bubble sort 2175 01:40:36,410 --> 01:40:41,460 might indeed be omega of n if you add that optimization 2176 01:40:41,460 --> 01:40:44,060 so as to short circuit all of that inefficient 2177 01:40:44,060 --> 01:40:49,830 looping to do it only as many times as is necessary. 2178 01:40:49,830 --> 01:40:52,280 Let me pause to see if there's any questions here. 2179 01:40:52,280 --> 01:40:53,192 Yeah. 2180 01:40:53,192 --> 01:41:02,038 AUDIENCE: [INAUDIBLE] to optimize the running time for all cases possible? 2181 01:41:02,038 --> 01:41:03,080 DAVID J. MALAN: Good question. 2182 01:41:03,080 --> 01:41:08,690 If the running time of selection sort and bubble sort are both in big O 2183 01:41:08,690 --> 01:41:14,480 of n squared but selection sort is in omega of n squared while bubble sort is 2184 01:41:14,480 --> 01:41:17,300 in omega of n, which sounds better-- 2185 01:41:17,300 --> 01:41:20,510 I think if I may, should we just always use bubble sort?
2186 01:41:20,510 --> 01:41:24,680 Yes if we think that we might benefit over time 2187 01:41:24,680 --> 01:41:29,250 from a lot of good case scenarios or best case scenarios. 2188 01:41:29,250 --> 01:41:31,340 However, the goal at hand in just a bit is 2189 01:41:31,340 --> 01:41:33,480 going to be to do even better than both of these. 2190 01:41:33,480 --> 01:41:35,330 So hold that question further for a moment. 2191 01:41:35,330 --> 01:41:35,830 Yeah. 2192 01:41:35,830 --> 01:41:41,357 AUDIENCE: [INAUDIBLE] n minus 1? 2193 01:41:41,357 --> 01:41:41,940 DAVID J. MALAN: No. 2194 01:41:41,940 --> 01:41:43,080 So yes, good question. 2195 01:41:43,080 --> 01:41:46,890 So I say omega of n, but is it technically omega of n minus 1? 2196 01:41:46,890 --> 01:41:50,110 Maybe, but again, we're throwing away lower order terms. 2197 01:41:50,110 --> 01:41:53,670 And that's an advantage because we're not comparing things ever so precisely. 2198 01:41:53,670 --> 01:41:57,180 Just like I plotted with the green and yellow and red chart, 2199 01:41:57,180 --> 01:41:59,980 I just want to get a sense of the shape of these algorithms 2200 01:41:59,980 --> 01:42:03,330 so that when n gets really large, which of these choices 2201 01:42:03,330 --> 01:42:05,185 is going to matter the most? 2202 01:42:05,185 --> 01:42:07,560 At the end of the day, it's actually perfectly reasonable 2203 01:42:07,560 --> 01:42:09,460 to use selection sort or bubble sort if you 2204 01:42:09,460 --> 01:42:12,210 don't have that much data because they're going to be pretty fast. 2205 01:42:12,210 --> 01:42:14,520 My God, our computers nowadays are 1 gigahertz, 2206 01:42:14,520 --> 01:42:18,440 2 gigahertz, 1 billion things per second, 2 billion things per second. 
2207 01:42:18,440 --> 01:42:20,940 But if we have large data sets, as we will later in the term 2208 01:42:20,940 --> 01:42:23,470 and as you might in the real world, at the Googles of the world, 2209 01:42:23,470 --> 01:42:25,530 then you're going to want to be more thoughtful. 2210 01:42:25,530 --> 01:42:27,240 And that's where we're going today. 2211 01:42:27,240 --> 01:42:30,090 All right, so let's actually see this visualized a little bit. 2212 01:42:30,090 --> 01:42:32,010 In a moment, I'm going to change screens here 2213 01:42:32,010 --> 01:42:37,350 to open up what is a little visualization tool that will give us 2214 01:42:37,350 --> 01:42:40,920 a sense of how these things actually work and look at a faster rate 2215 01:42:40,920 --> 01:42:42,820 than our humans are able to do here on stage. 2216 01:42:42,820 --> 01:42:47,580 So here is another visualization of a bunch of numbers, an array of numbers. 2217 01:42:47,580 --> 01:42:50,670 Short bars mean small numbers, tall bars mean big numbers. 2218 01:42:50,670 --> 01:42:52,920 So instead of having the numbers on their torsos here, 2219 01:42:52,920 --> 01:42:57,690 we just have bars that are small or tall based on the magnitude of the number. 2220 01:42:57,690 --> 01:43:00,630 Let me go ahead, and I preconfigured this in advance 2221 01:43:00,630 --> 01:43:02,050 to operate somewhat quickly. 2222 01:43:02,050 --> 01:43:05,550 Let's go ahead and do selection sort by clicking this button. 2223 01:43:05,550 --> 01:43:08,220 And you'll see some pink bars flying by. 2224 01:43:08,220 --> 01:43:11,640 And that's like me walking left and right, left and right, 2225 01:43:11,640 --> 01:43:14,140 to select the next smallest number.
2226 01:43:14,140 --> 01:43:17,580 And so what you'll see happening on the left of this array of numbers 2227 01:43:17,580 --> 01:43:20,520 is Celeste, if you will, and all of the other smaller numbers 2228 01:43:20,520 --> 01:43:24,030 are appearing on the left while we continue to solve the remaining 2229 01:43:24,030 --> 01:43:25,810 problems to the right. 2230 01:43:25,810 --> 01:43:29,177 So again, we no longer have to touch the smaller numbers here. 2231 01:43:29,177 --> 01:43:32,010 So that's why the problem is getting smaller and smaller and smaller 2232 01:43:32,010 --> 01:43:32,790 over time. 2233 01:43:32,790 --> 01:43:36,420 But you can notice now visually, look at how many times 2234 01:43:36,420 --> 01:43:37,920 we're retracing our steps. 2235 01:43:37,920 --> 01:43:40,710 This is why things that are n squared tend 2236 01:43:40,710 --> 01:43:44,830 to be frowned upon if avoidable because I'm touching the same elements again 2237 01:43:44,830 --> 01:43:45,330 and again. 2238 01:43:45,330 --> 01:43:47,610 When I was walking through, I kept pointing at the same humans 2239 01:43:47,610 --> 01:43:48,360 again and again. 2240 01:43:48,360 --> 01:43:49,660 And that adds up. 2241 01:43:49,660 --> 01:43:52,750 So let's see if bubble sort looks or feels a little different. 2242 01:43:52,750 --> 01:43:56,188 Let me re-randomize the thing, and let me now click Bubble Sort at the top. 2243 01:43:56,188 --> 01:43:58,980 And as you might infer, there's other sorting algorithms out there, 2244 01:43:58,980 --> 01:44:00,272 not all of which we'll look at. 2245 01:44:00,272 --> 01:44:01,740 But here's bubble sort. 2246 01:44:01,740 --> 01:44:04,570 Same pink coloration, but it's doing something different. 2247 01:44:04,570 --> 01:44:07,860 It's two pink bars going through again and again comparing 2248 01:44:07,860 --> 01:44:09,550 the adjacent numbers. 
2249 01:44:09,550 --> 01:44:13,050 And you'll see that the largest numbers are indeed bubbling their way up 2250 01:44:13,050 --> 01:44:18,120 to the right, but the smaller numbers, like our number 0 was, 2251 01:44:18,120 --> 01:44:20,287 is only slowly making its way over. 2252 01:44:20,287 --> 01:44:21,120 Here's a comparable. 2253 01:44:21,120 --> 01:44:22,200 Here's the number one. 2254 01:44:22,200 --> 01:44:24,900 And it's going to take a while to get all the way to the left. 2255 01:44:24,900 --> 01:44:28,560 And here too, notice how many times the same bars 2256 01:44:28,560 --> 01:44:32,590 are becoming pink, how many times the algorithm is retracing and retracing 2257 01:44:32,590 --> 01:44:33,090 its steps. 2258 01:44:33,090 --> 01:44:33,600 Why? 2259 01:44:33,600 --> 01:44:37,140 Because it's only solving one problem at a time on each pass. 2260 01:44:37,140 --> 01:44:40,717 And each time we do that, we're stepping through practically the whole array. 2261 01:44:40,717 --> 01:44:43,800 And now granted, I could speed this up even further if I really wanted to, 2262 01:44:43,800 --> 01:44:48,120 but my God, this is only, what, like 50 or 60 elements, something like that? 2263 01:44:48,120 --> 01:44:49,140 This is slow. 2264 01:44:49,140 --> 01:44:52,320 Like, this is what n squared looks like and feels like. 2265 01:44:52,320 --> 01:44:54,570 And now I'm just trying to come up with words to say 2266 01:44:54,570 --> 01:44:56,280 until we get to the finish line here. 2267 01:44:56,280 --> 01:44:59,250 Like, this would be annoying if this is the speed of sorting, 2268 01:44:59,250 --> 01:45:03,133 and this is why I sort of secretly sorted the numbers for Rave in advance 2269 01:45:03,133 --> 01:45:05,550 because it would have taken us an annoying number of steps 2270 01:45:05,550 --> 01:45:07,210 to get that in place for her. 2271 01:45:07,210 --> 01:45:10,170 So those two algorithms are n squared.
2272 01:45:10,170 --> 01:45:12,265 Can we do, in fact, better? 2273 01:45:12,265 --> 01:45:15,390 Well, to save the best algorithm for last, let's take a shorter five minute 2274 01:45:15,390 --> 01:45:15,890 break here. 2275 01:45:15,890 --> 01:45:20,410 And when we come back, we'll do even better than n squared. 2276 01:45:20,410 --> 01:45:22,480 All right. 2277 01:45:22,480 --> 01:45:27,180 So the challenge at hand is to do better than selection sort 2278 01:45:27,180 --> 01:45:30,270 and better than bubble sort and ideally not just marginally 2279 01:45:30,270 --> 01:45:32,460 better but fundamentally better. 2280 01:45:32,460 --> 01:45:35,970 Just like in week zero, that third and final divide and conquer algorithm 2281 01:45:35,970 --> 01:45:38,860 was sort of fundamentally faster than the other two. 2282 01:45:38,860 --> 01:45:41,950 So can we do better than something on the order of n squared? 2283 01:45:41,950 --> 01:45:44,310 Well, I bet we can if we start to approach 2284 01:45:44,310 --> 01:45:46,080 the problem a little differently. 2285 01:45:46,080 --> 01:45:48,202 The sorts we've done thus far, generally known 2286 01:45:48,202 --> 01:45:50,160 as comparison sorts-- and that kind of captures 2287 01:45:50,160 --> 01:45:54,000 the reality that we were doing a huge number of comparisons again and again. 2288 01:45:54,000 --> 01:45:57,360 And you kind of saw that in the vertical bars that were going pink as everything 2289 01:45:57,360 --> 01:45:58,855 was being compared again and again. 2290 01:45:58,855 --> 01:46:01,230 But there's this programming technique, and it's actually 2291 01:46:01,230 --> 01:46:04,140 a mathematical technique known as recursion 2292 01:46:04,140 --> 01:46:05,950 that we've actually seen before. 
2293 01:46:05,950 --> 01:46:09,000 And this is a building block or a mental model 2294 01:46:09,000 --> 01:46:12,300 we can bring to bear on the problem to solve the sorting problem 2295 01:46:12,300 --> 01:46:13,740 sort of fundamentally differently. 2296 01:46:13,740 --> 01:46:16,620 But first, let's look at it in a more familiar context. 2297 01:46:16,620 --> 01:46:23,190 A little bit ago, I proposed this pseudo code for the binary search algorithm. 2298 01:46:23,190 --> 01:46:26,130 And notice that what was interesting about this code, 2299 01:46:26,130 --> 01:46:29,970 even though I didn't call it out at the time, it's kind of cyclically defined. 2300 01:46:29,970 --> 01:46:32,730 Like, I claim this is an algorithm for search, 2301 01:46:32,730 --> 01:46:36,750 and yet it seems a little unfair that I'm using the verb search 2302 01:46:36,750 --> 01:46:38,960 inside of the algorithm for search. 2303 01:46:38,960 --> 01:46:41,720 It's like in English, sort of defining a word by using the word. 2304 01:46:41,720 --> 01:46:43,850 Normally, you shouldn't really get away with that. 2305 01:46:43,850 --> 01:46:46,430 But there's something interesting about this technique 2306 01:46:46,430 --> 01:46:51,110 here because even though this whole thing is a search algorithm 2307 01:46:51,110 --> 01:46:56,510 and I'm using my own algorithm to search the left half or the right half, 2308 01:46:56,510 --> 01:46:58,520 the key feature here that doesn't normally 2309 01:46:58,520 --> 01:47:01,670 happen in English when you define a word in terms of a word 2310 01:47:01,670 --> 01:47:05,300 is that when I search the left half or search the right half, yes, 2311 01:47:05,300 --> 01:47:06,810 I'm doing the same thing. 2312 01:47:06,810 --> 01:47:08,090 I'm using the same algorithm. 2313 01:47:08,090 --> 01:47:11,060 But the problem is, by definition, half as large. 2314 01:47:11,060 --> 01:47:14,180 So this isn't going to be a cyclical argument in the same way.
2315 01:47:14,180 --> 01:47:17,750 This approach, by using search within search 2316 01:47:17,750 --> 01:47:21,290 is going to whittle the problem down and down and down until hopefully, 2317 01:47:21,290 --> 01:47:24,180 one door or no doors remains. 2318 01:47:24,180 --> 01:47:26,720 And so recursion is a programming technique 2319 01:47:26,720 --> 01:47:29,930 whereby a function calls itself. 2320 01:47:29,930 --> 01:47:34,220 And we haven't seen this yet in C, and we haven't seen this really in Scratch. 2321 01:47:34,220 --> 01:47:38,122 But in C, you can have a function call itself. 2322 01:47:38,122 --> 01:47:39,830 And the form that takes is like literally 2323 01:47:39,830 --> 01:47:44,180 using the function's name inside of the function's implementation itself. 2324 01:47:44,180 --> 01:47:48,310 We've actually seen an opportunity for this once before too. 2325 01:47:48,310 --> 01:47:49,310 Think back to week zero. 2326 01:47:49,310 --> 01:47:51,560 Here's that same pseudo code for searching for someone 2327 01:47:51,560 --> 01:47:53,180 in an actual, physical phone book. 2328 01:47:53,180 --> 01:47:55,700 And notice these yellow lines here. 2329 01:47:55,700 --> 01:47:59,810 We described those in week zero as inducing a loop, a cycle. 2330 01:47:59,810 --> 01:48:04,310 And this is a very procedural approach, if you will, because lines 8 and 11 2331 01:48:04,310 --> 01:48:06,470 are very mechanically, if you will, telling 2332 01:48:06,470 --> 01:48:10,370 me to go back to line three to do this kind of looping thing. 2333 01:48:10,370 --> 01:48:14,750 But really, what that's doing in the binary search algorithm for the phone 2334 01:48:14,750 --> 01:48:20,120 book is it's just telling me to search the left half or search the right half. 2335 01:48:20,120 --> 01:48:23,900 I'm doing it more mechanically again by sort of telling myself 2336 01:48:23,900 --> 01:48:25,490 what line number to go back to. 
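As a rough illustration of a function calling itself on a problem half as large, a recursive binary search over a sorted array of integers could look like this in C. The signature, including the `low` and `high` index parameters, is an assumption made for this sketch; the lecture's pseudo code searches doors and phone book pages rather than array indices.

```c
#include <stdbool.h>

// Returns true if target appears in the sorted range numbers[low..high]
// (both ends inclusive). The low/high parameters are hypothetical, chosen
// here to express "the left half" and "the right half" as index ranges.
bool search(const int numbers[], int low, int high, int target)
{
    // Base case: no doors left to look behind.
    if (low > high)
    {
        return false;
    }

    int middle = low + (high - low) / 2;
    if (numbers[middle] == target)
    {
        return true;
    }
    else if (target < numbers[middle])
    {
        // Search the left half: same algorithm, smaller problem.
        return search(numbers, low, middle - 1, target);
    }
    else
    {
        // Search the right half: same algorithm, smaller problem.
        return search(numbers, middle + 1, high, target);
    }
}
```

Every recursive call covers a strictly smaller range, so the recursion eventually reaches the empty range and stops. A version that called `search` again on the same range would never terminate.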
2337 01:48:25,490 --> 01:48:28,400 But that's equivalent to just telling myself go search the left half, 2338 01:48:28,400 --> 01:48:30,950 search the right half, the key thing being the left 2339 01:48:30,950 --> 01:48:33,720 half and the right half are smaller than the original problem. 2340 01:48:33,720 --> 01:48:37,630 It would be a bug if I just said search the phone book, search the phone book, 2341 01:48:37,630 --> 01:48:39,380 because obviously, you never get anywhere. 2342 01:48:39,380 --> 01:48:41,540 But if you search the half, the half, the half, 2343 01:48:41,540 --> 01:48:43,470 problem gets smaller and smaller. 2344 01:48:43,470 --> 01:48:50,180 So let's reformulate week zero's phone book code to be not procedural as here 2345 01:48:50,180 --> 01:48:55,140 but recursive whereby in this search algorithm, 2346 01:48:55,140 --> 01:48:58,580 AKA binary search, formerly called divide and conquer, I'm 2347 01:48:58,580 --> 01:49:01,940 going to literally use also the keyword search here. 2348 01:49:01,940 --> 01:49:03,950 Notice among the benefits of doing this is it 2349 01:49:03,950 --> 01:49:06,798 kind of tightens the code up, makes it a little more succinct, 2350 01:49:06,798 --> 01:49:08,840 even though that's kind of a fringe benefit here. 2351 01:49:08,840 --> 01:49:12,350 But it's an elegant way too of describing 2352 01:49:12,350 --> 01:49:17,210 a problem by just having a function use itself 2353 01:49:17,210 --> 01:49:21,330 to solve a smaller puzzle at hand. 2354 01:49:21,330 --> 01:49:24,380 So let's now consider a familiar problem, a smaller 2355 01:49:24,380 --> 01:49:27,132 version than the one you've dabbled with-- this sort of pyramid, 2356 01:49:27,132 --> 01:49:28,340 this half pyramid from Mario.
2357 01:49:28,340 --> 01:49:31,070 And let's throw away the parts that aren't that interesting 2358 01:49:31,070 --> 01:49:35,420 and just consider how we might, up until now, implement this in C code, 2359 01:49:35,420 --> 01:49:37,520 this left aligned pyramid, if you will. 2360 01:49:37,520 --> 01:49:44,630 Let me go over here, and let me create a file called-- how about iteration.c? 2361 01:49:44,630 --> 01:49:48,080 And in this file, I'm going to go ahead and include cs50.h. 2362 01:49:48,080 --> 01:49:51,890 And I'm going to include stdio.h. 2363 01:49:51,890 --> 01:49:57,440 And the goal at hand is to implement in C a little program that just prints out 2364 01:49:57,440 --> 01:49:59,280 this and exactly this pyramid. 2365 01:49:59,280 --> 01:50:02,113 So no get string or any of that-- we're just going to keep it simple 2366 01:50:02,113 --> 01:50:05,400 and print exactly this pyramid of height 4 here. 2367 01:50:05,400 --> 01:50:06,570 So how might I do this? 2368 01:50:06,570 --> 01:50:10,945 Well, let me go ahead, and in main, let me first ask the user for-- 2369 01:50:10,945 --> 01:50:12,570 well, we'll go ahead and generalize it. 2370 01:50:12,570 --> 01:50:14,403 Let's go ahead and ask the user for height. 2371 01:50:14,403 --> 01:50:16,250 We're using get_int as before. 2372 01:50:16,250 --> 01:50:18,500 And I'll store that in a variable called height. 2373 01:50:18,500 --> 01:50:20,750 And then let me go ahead and simply call the function 2374 01:50:20,750 --> 01:50:22,410 draw passing in that height. 2375 01:50:22,410 --> 01:50:25,040 So for the moment, let me assume that someone somewhere 2376 01:50:25,040 --> 01:50:26,630 has implemented a draw function. 2377 01:50:26,630 --> 01:50:29,810 And this, then, is the entirety of my program. 2378 01:50:29,810 --> 01:50:32,970 All right, unfortunately, C does not come with a draw function. 2379 01:50:32,970 --> 01:50:34,820 So let me go ahead and invent one.
2380 01:50:34,820 --> 01:50:36,300 It doesn't need to return a value. 2381 01:50:36,300 --> 01:50:38,850 It just needs to print something-- so-called side effect. 2382 01:50:38,850 --> 01:50:43,460 So I'm going to define a function called draw that takes as input an int. 2383 01:50:43,460 --> 01:50:46,280 I'll call it n for number, but I could call it anything I want. 2384 01:50:46,280 --> 01:50:47,850 And inside of this, 2385 01:50:47,850 --> 01:50:53,240 I'm going to go ahead and print out a left aligned pyramid like this from top 2386 01:50:53,240 --> 01:50:53,990 to bottom. 2387 01:50:53,990 --> 01:50:57,710 The salient features here are that this is a pyramid, at least in this example, 2388 01:50:57,710 --> 01:50:58,820 of height four. 2389 01:50:58,820 --> 01:51:02,240 And now in height four, the first row has one brick. 2390 01:51:02,240 --> 01:51:03,470 The second row has two. 2391 01:51:03,470 --> 01:51:04,400 The third has three. 2392 01:51:04,400 --> 01:51:05,270 The fourth has four. 2393 01:51:05,270 --> 01:51:08,100 That's a nice pattern that I can probably represent in code. 2394 01:51:08,100 --> 01:51:09,210 So how might I do this? 2395 01:51:09,210 --> 01:51:11,390 Well, how about for int i gets-- 2396 01:51:11,390 --> 01:51:12,950 let me do it the old school way-- 2397 01:51:12,950 --> 01:51:13,730 1. 2398 01:51:13,730 --> 01:51:18,530 And then i is less than or equal to n. 2399 01:51:18,530 --> 01:51:20,180 And then i plus plus-- 2400 01:51:20,180 --> 01:51:23,810 so I'm going from 1 to 4 just to keep myself sane here. 2401 01:51:23,810 --> 01:51:26,690 And then inside of this loop, what do I want to do? 2402 01:51:26,690 --> 01:51:28,560 Well, let me keep it conventional, in fact. 2403 01:51:28,560 --> 01:51:31,970 Let me just change this to be the more conventional 0 2404 01:51:31,970 --> 01:51:36,170 to n even though it might not be as intuitive because now on row 0, 2405 01:51:36,170 --> 01:51:37,550 I want one brick.
2406 01:51:37,550 --> 01:51:39,965 On row 1, I want two bricks, dot dot dot. 2407 01:51:39,965 --> 01:51:41,300 On row 3, I want four. 2408 01:51:41,300 --> 01:51:42,530 So it's kind of offset now. 2409 01:51:42,530 --> 01:51:44,240 But I'm being more conventional. 2410 01:51:44,240 --> 01:51:48,440 So on each row, how many bricks do I want to print? 2411 01:51:48,440 --> 01:51:49,880 Well, I think I want to do this. 2412 01:51:49,880 --> 01:51:55,700 For int j, for instance, it's common to use j after i if you have a nested loop, 2413 01:51:55,700 --> 01:52:02,930 let's start j at 0 and do this so long as j is less than i plus 1 2414 01:52:02,930 --> 01:52:04,460 and then do j plus plus. 2415 01:52:04,460 --> 01:52:06,500 So why i plus 1? 2416 01:52:06,500 --> 01:52:10,670 Well, again, when i equals 0, that's the first row, and I want one brick. 2417 01:52:10,670 --> 01:52:12,650 When i equals 1, that's the second row. 2418 01:52:12,650 --> 01:52:13,430 I want two bricks. 2419 01:52:13,430 --> 01:52:16,250 And dot dot dot, when i is 3, I want four bricks. 2420 01:52:16,250 --> 01:52:19,460 So again, I have to add 1 to i to get the total number of bricks 2421 01:52:19,460 --> 01:52:21,120 that I want to print to the screen. 2422 01:52:21,120 --> 01:52:25,820 So inside of this nested for loop, I'm going to do printf of a hash 2423 01:52:25,820 --> 01:52:28,970 with no new line. 2424 01:52:28,970 --> 01:52:33,112 I'm going to save the new line for about here instead. 2425 01:52:33,112 --> 01:52:34,820 All right, the last thing I'm going to do 2426 01:52:34,820 --> 01:52:38,550 is copy and paste the prototype at the top of the file 2427 01:52:38,550 --> 01:52:39,620 so that I can call this. 2428 01:52:39,620 --> 01:52:42,728 And again, this is of now week one, week two.
2429 01:52:42,728 --> 01:52:45,020 Wouldn't necessarily come to your mind as quickly as it 2430 01:52:45,020 --> 01:52:48,380 might to mine after all this practice, but this is something reminiscent 2431 01:52:48,380 --> 01:52:51,230 of what you yourself did already for Mario-- printing out 2432 01:52:51,230 --> 01:52:54,330 a pyramid that hopefully in a moment is going to look like this. 2433 01:52:54,330 --> 01:52:56,370 So let me go back to my code. 2434 01:52:56,370 --> 01:53:00,560 Let me run make iteration, and let me do dot slash iteration. 2435 01:53:00,560 --> 01:53:02,540 I'll type in 4, and voila. 2436 01:53:02,540 --> 01:53:05,840 Seems to be correct, and let's assume it's going to work for other inputs 2437 01:53:05,840 --> 01:53:06,500 as well. 2438 01:53:06,500 --> 01:53:07,310 Oh, thank you. 2439 01:53:07,310 --> 01:53:11,770 2440 01:53:11,770 --> 01:53:15,460 So this is indeed an example of iteration-- doing something 2441 01:53:15,460 --> 01:53:16,780 again and again. 2442 01:53:16,780 --> 01:53:18,160 And it's very procedural. 2443 01:53:18,160 --> 01:53:21,160 Like, I literally have a function called draw that does this thing. 2444 01:53:21,160 --> 01:53:25,570 But I can think about implementing draw in a somewhat different way that's 2445 01:53:25,570 --> 01:53:26,710 kind of clever. 2446 01:53:26,710 --> 01:53:28,720 And it's not strictly necessary for this problem 2447 01:53:28,720 --> 01:53:31,360 because this problem honestly is not that complicated 2448 01:53:31,360 --> 01:53:33,387 to solve once you have practice under your belt. 2449 01:53:33,387 --> 01:53:36,220 Certainly the first time around, probably significantly challenging. 2450 01:53:36,220 --> 01:53:39,250 But now that you kind of associate, OK, row one 2451 01:53:39,250 --> 01:53:42,010 with one brick, row two with two bricks, it kind of comes together 2452 01:53:42,010 --> 01:53:43,340 with these for loops. 
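The iterative draw function described above can be sketched roughly as follows. This is a minimal sketch rather than the exact file from the lecture: the lecture's main reads the height with get_int from cs50.h, which is left out here so that the block only depends on stdio.h.

```c
#include <stdio.h>

// Prints a left-aligned pyramid of height n, one row per line.
void draw(int n)
{
    // Rows are numbered 0 through n - 1, following the conventional 0-based loop
    for (int i = 0; i < n; i++)
    {
        // Row i holds i + 1 bricks, hence the "i + 1" bound
        for (int j = 0; j < i + 1; j++)
        {
            printf("#");
        }

        // The newline is saved until the whole row has been printed
        printf("\n");
    }
}
```

Calling draw(4) from main prints four rows holding 1, 2, 3, and 4 hashes respectively.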
2453 01:53:43,340 --> 01:53:45,770 But how else could we think about this problem? 2454 01:53:45,770 --> 01:53:48,940 Well, this physical structure, these bricks, in some sense 2455 01:53:48,940 --> 01:53:54,980 is a recursive structure, a structure that's defined in terms of itself. 2456 01:53:54,980 --> 01:53:56,360 Now what do I mean by that? 2457 01:53:56,360 --> 01:54:01,600 Well, if I were to ask you the question, what does a pyramid of height 4 2458 01:54:01,600 --> 01:54:04,720 look like, you would point, of course, to this picture. 2459 01:54:04,720 --> 01:54:11,170 But you could also kind of cleverly say to me, well, 2460 01:54:11,170 --> 01:54:16,240 it's actually a pyramid of height 3 plus 1 additional row. 2461 01:54:16,240 --> 01:54:18,237 And here's that cyclical argument, right? 2462 01:54:18,237 --> 01:54:21,070 Kind of obnoxious to do typically in English or in a spoken language 2463 01:54:21,070 --> 01:54:23,200 because you're defining one thing in terms of itself. 2464 01:54:23,200 --> 01:54:24,408 What's a pyramid of height 4? 2465 01:54:24,408 --> 01:54:28,180 Well, it's a pyramid of height 3 plus 1 more row. 2466 01:54:28,180 --> 01:54:30,940 But we can kind of leverage this logic in code. 2467 01:54:30,940 --> 01:54:32,590 Well, what's a pyramid of height 3? 2468 01:54:32,590 --> 01:54:34,870 Well, it's a pyramid of height 2 plus 1 more row. 2469 01:54:34,870 --> 01:54:36,940 Fine, what's a pyramid of height 2? 2470 01:54:36,940 --> 01:54:39,370 Well, it's a pyramid of height 1 plus 1 more row. 2471 01:54:39,370 --> 01:54:42,370 And then hopefully, this process ends, and it does because notice, 2472 01:54:42,370 --> 01:54:44,990 the pyramid is getting smaller and smaller. 2473 01:54:44,990 --> 01:54:48,610 So you're not going to have this sort of silly back and forth with me 2474 01:54:48,610 --> 01:54:52,550 infinitely many times because when we finally get to the base case, 2475 01:54:52,550 --> 01:54:54,130 the end of the pyramid, fine. 
2476 01:54:54,130 --> 01:54:55,600 What is a pyramid of height 1? 2477 01:54:55,600 --> 01:54:58,630 Well, it's a pyramid of no height plus one more row. 2478 01:54:58,630 --> 01:55:01,270 And at that point, things just get negative-- 2479 01:55:01,270 --> 01:55:02,050 no pun intended. 2480 01:55:02,050 --> 01:55:04,035 Things just would otherwise go negative. 2481 01:55:04,035 --> 01:55:05,410 And so you can just kind of stop. 2482 01:55:05,410 --> 01:55:07,930 The base case is when there is no more pyramid. 2483 01:55:07,930 --> 01:55:12,080 So there's a way to draw a line in the sand and say, stop, no more arguments. 2484 01:55:12,080 --> 01:55:16,180 But this idea of defining a physical structure in terms of itself 2485 01:55:16,180 --> 01:55:22,180 or code in terms of itself actually lets us do some interesting new algorithms. 2486 01:55:22,180 --> 01:55:24,160 Let me go back to my code here. 2487 01:55:24,160 --> 01:55:29,800 Let me go ahead and create one final file here called recursion.c 2488 01:55:29,800 --> 01:55:36,100 that leverages this idea of this built-in self-referential nature. 2489 01:55:36,100 --> 01:55:38,080 Let me include cs50.h. 2490 01:55:38,080 --> 01:55:41,740 Let me go ahead and include stdio.h, int main void. 2491 01:55:41,740 --> 01:55:46,000 And then inside of main, I'm going to do the exact same thing-- int 2492 01:55:46,000 --> 01:55:50,470 height equals get int, asking the user for height. 2493 01:55:50,470 --> 01:55:53,870 And then I'm going to go ahead and call draw passing in height. 2494 01:55:53,870 --> 01:55:55,520 So that's going to stay the same. 2495 01:55:55,520 --> 01:56:01,580 I even am going to make my prototype the same-- void draw int n semicolon. 2496 01:56:01,580 --> 01:56:03,820 And now I'm going to implement draw down here 2497 01:56:03,820 --> 01:56:05,590 with that same prototype, of course. 2498 01:56:05,590 --> 01:56:08,470 But the code now is going to be a little different.
2499 01:56:08,470 --> 01:56:10,070 What am I going to do here? 2500 01:56:10,070 --> 01:56:15,610 Well, first of all, if you ask me to draw a pyramid of height n, 2501 01:56:15,610 --> 01:56:18,580 I'm going to be kind of a wise ass here and say, well, just 2502 01:56:18,580 --> 01:56:21,010 draw a pyramid of n minus 1-- 2503 01:56:21,010 --> 01:56:21,762 done. 2504 01:56:21,762 --> 01:56:24,220 All right, but there's still a little more work to be done. 2505 01:56:24,220 --> 01:56:28,660 What happens after I print or draw a pyramid of height n minus 1 2506 01:56:28,660 --> 01:56:33,340 according to our structural definition a moment ago? 2507 01:56:33,340 --> 01:56:38,470 What remains after drawing a pyramid of height n minus 1 or 3, specifically? 2508 01:56:38,470 --> 01:56:40,840 AUDIENCE: [INAUDIBLE] 2509 01:56:40,840 --> 01:56:42,400 We need one more row of hashes. 2510 01:56:42,400 --> 01:56:43,750 OK, so I can do that, right? 2511 01:56:43,750 --> 01:56:45,213 I'm OK with the single loops. 2512 01:56:45,213 --> 01:56:46,630 There's no nesting necessary here. 2513 01:56:46,630 --> 01:56:51,010 I'm just going to do this-- for int i gets 0, i is less than n, 2514 01:56:51,010 --> 01:56:53,185 which is the height that's passed in, i plus plus. 2515 01:56:53,185 --> 01:56:55,060 And then inside of this loop, I'm very simply 2516 01:56:55,060 --> 01:56:56,740 going to print out a single hash. 2517 01:56:56,740 --> 01:57:00,650 And then down here, I'm going to print out a new line at the very end. 2518 01:57:00,650 --> 01:57:01,600 So that's good, right? 2519 01:57:01,600 --> 01:57:03,720 I might not be as comfortable with nested loops. 2520 01:57:03,720 --> 01:57:04,720 This is nice and simple. 2521 01:57:04,720 --> 01:57:07,660 What does this loop do here on lines 17 through 20? 2522 01:57:07,660 --> 01:57:13,540 It literally prints n hashes by counting from i equals 0 on up to 2523 01:57:13,540 --> 01:57:15,130 but not through n.
2524 01:57:15,130 --> 01:57:17,930 So that's sort of week one style syntax. 2525 01:57:17,930 --> 01:57:20,740 But this is kind of trippy now because I've somehow 2526 01:57:20,740 --> 01:57:25,480 boiled down the implementation of draw into printing a row after just 2527 01:57:25,480 --> 01:57:27,250 drawing the thing above it. 2528 01:57:27,250 --> 01:57:31,250 But this is problematic as is because in this case, 2529 01:57:31,250 --> 01:57:37,900 my draw function, notice, is always going to call the draw function forever 2530 01:57:37,900 --> 01:57:38,890 in some sense. 2531 01:57:38,890 --> 01:57:44,200 But ideally, when do I want this cyclical process to stop? 2532 01:57:44,200 --> 01:57:48,010 When do I want to not call draw anymore? 2533 01:57:48,010 --> 01:57:51,297 Yeah, when n is 1, right? 2534 01:57:51,297 --> 01:57:53,380 When I get to the top of the pyramid, when n is 1, 2535 01:57:53,380 --> 01:57:55,720 or heck, when the pyramid's all gone and n equals 0. 2536 01:57:55,720 --> 01:57:58,060 I can pick any line in the sand, so long as it's 2537 01:57:58,060 --> 01:57:59,690 sort of at the end of the process. 2538 01:57:59,690 --> 01:58:01,480 Then I don't want to call draw anymore. 2539 01:58:01,480 --> 01:58:03,850 So maybe what I should do is this. 2540 01:58:03,850 --> 01:58:10,450 If n equals equals 0, there's really nothing to draw. 2541 01:58:10,450 --> 01:58:14,340 So I'm just going to go ahead and return like this. 2542 01:58:14,340 --> 01:58:16,920 Otherwise, I'm going to go ahead and draw 2543 01:58:16,920 --> 01:58:20,238 n minus 1 rows and then one more row. 2544 01:58:20,238 --> 01:58:21,780 And I could express this differently. 2545 01:58:21,780 --> 01:58:24,540 I could do something like this, which would be equivalent. 2546 01:58:24,540 --> 01:58:29,490 I could say something like if n is greater than 0, 2547 01:58:29,490 --> 01:58:31,462 then go ahead and draw the row.
2548 01:58:31,462 --> 01:58:32,670 But I like it this way first. 2549 01:58:32,670 --> 01:58:34,587 For now, I'm going to go with the original way 2550 01:58:34,587 --> 01:58:37,740 just to ask a simple question and then just bail out of the function 2551 01:58:37,740 --> 01:58:38,970 if n equals 0. 2552 01:58:38,970 --> 01:58:41,740 And heck, just to be super safe, just in case 2553 01:58:41,740 --> 01:58:44,040 the user types in a negative number, let me also 2554 01:58:44,040 --> 01:58:46,980 just check if n is a negative number, also, just return immediately. 2555 01:58:46,980 --> 01:58:48,210 Don't do anything. 2556 01:58:48,210 --> 01:58:51,490 I'm not returning a value because again, the function is void. 2557 01:58:51,490 --> 01:58:53,680 It doesn't need or have a return value. 2558 01:58:53,680 --> 01:58:55,860 So just saying return suffices. 2559 01:58:55,860 --> 01:59:00,660 But if n equals 1 or 2 or 3 or anything higher, 2560 01:59:00,660 --> 01:59:06,120 it is reasonable to draw a pyramid of slightly shorter height like, instead 2561 01:59:06,120 --> 01:59:10,680 of 4, 3, and then go ahead and print one more row. 2562 01:59:10,680 --> 01:59:16,450 So this is an example now of code that calls itself within itself. 2563 01:59:16,450 --> 01:59:18,150 Draw is calling draw. 2564 01:59:18,150 --> 01:59:22,920 But this so-called base case ensures, this conditional ensures, 2565 01:59:22,920 --> 01:59:24,690 that we're not going to do this forever. 2566 01:59:24,690 --> 01:59:27,300 Otherwise, we literally would do this infinitely many times, 2567 01:59:27,300 --> 01:59:30,330 and something bad is probably going to happen. 2568 01:59:30,330 --> 01:59:34,230 All right, let me go ahead and compile this code-- make recursion. 2569 01:59:34,230 --> 01:59:38,250 OK, no syntax errors-- dot slash recursion, Enter, height of 4, 2570 01:59:38,250 --> 01:59:40,590 and voila. 
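The recursive version of draw built up above might look roughly like this. It is a sketch under the same assumption as before: main and the get_int call from cs50.h are omitted, and the two checks described in the lecture (n equal to 0, and n negative) are folded into a single n <= 0 test.

```c
#include <stdio.h>

// Prints a left-aligned pyramid of height n by recursion.
void draw(int n)
{
    // Base case: a pyramid of height 0 (or a negative height) has nothing to draw
    if (n <= 0)
    {
        return;
    }

    // First draw the pyramid of height n - 1 that sits on top
    draw(n - 1);

    // Then add the one remaining row, which holds n bricks
    for (int i = 0; i < n; i++)
    {
        printf("#");
    }
    printf("\n");
}
```

Because the recursive call happens before the loop, the shortest row is printed first, so draw(4) produces the same output as the nested-loop version.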
2571 01:59:40,590 --> 01:59:44,400 If only because some of you have run into this issue accidentally already, 2572 01:59:44,400 --> 01:59:48,420 let me get rid of the base case here, and let me recompile the code. 2573 01:59:48,420 --> 01:59:49,980 Make recursion. 2574 01:59:49,980 --> 01:59:52,150 Oh, and actually, now it's actually catching it. 2575 01:59:52,150 --> 01:59:55,050 So the compiler is smart enough here to realize 2576 01:59:55,050 --> 01:59:58,470 that all paths through this function will call itself. 2577 01:59:58,470 --> 02:00:01,300 AKA, It's going to loop forever. 2578 02:00:01,300 --> 02:00:03,090 So let me do the first thing. 2579 02:00:03,090 --> 02:00:05,550 Suppose I only check for n equaling 0. 2580 02:00:05,550 --> 02:00:09,390 Let me go ahead and recompile this code with make recursion. 2581 02:00:09,390 --> 02:00:12,030 And now let me just be kind of uncooperative. 2582 02:00:12,030 --> 02:00:16,200 When I run this program, still works for 4, still works for 0. 2583 02:00:16,200 --> 02:00:19,110 What if I do like negative 100? 2584 02:00:19,110 --> 02:00:22,920 Have any of you experienced a segmentation fault or core dump? 2585 02:00:22,920 --> 02:00:24,240 OK, so no shame in this. 2586 02:00:24,240 --> 02:00:28,980 Like, this means I have somehow touched memory that I shouldn't have. 2587 02:00:28,980 --> 02:00:32,910 And in short, I actually called this function thousands of times 2588 02:00:32,910 --> 02:00:35,550 accidentally, it would seem now, until the program just 2589 02:00:35,550 --> 02:00:38,413 bailed on me because I eventually touched memory in the computer 2590 02:00:38,413 --> 02:00:39,330 that I shouldn't have. 2591 02:00:39,330 --> 02:00:41,140 That'll make even more sense next week. 2592 02:00:41,140 --> 02:00:42,640 But for now, it's simply a bug. 
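To see why a height of negative 100 never reaches a base case that only checks for n equal to 0, the chain of calls can be simulated without actually recursing. The helper below is hypothetical (its name and the max_calls cap are illustrative assumptions, not part of the lecture's code), and it assumes the cap is small enough that n never underflows.

```c
#include <stdbool.h>

// Hypothetical helper: reports whether the chain of calls draw(n), draw(n - 1),
// draw(n - 2), ... would hit the base case "n == 0" within max_calls calls.
// Assumes n - max_calls does not underflow the range of int.
bool reaches_base_case(int n, int max_calls)
{
    for (int calls = 0; calls < max_calls; calls++)
    {
        if (n == 0)
        {
            return true; // the recursion would stop at this call
        }
        n--; // each call hands n - 1 to the next call
    }
    return false; // still recursing after max_calls calls
}
```

Starting from 4, the base case is reached after a handful of calls. Starting from negative 100, n only moves further away from 0, so the real recursive function keeps calling itself until the program runs out of room and crashes.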
2593 02:00:42,640 --> 02:00:44,430 And I can avoid that bug in this context, 2594 02:00:44,430 --> 02:00:48,900 probably not your own pset context, by just making sure we don't even 2595 02:00:48,900 --> 02:00:51,340 allow for negative numbers at all. 2596 02:00:51,340 --> 02:00:53,700 So with this building block in place, what 2597 02:00:53,700 --> 02:00:57,420 can we now do in terms of those same numbers to sort? 2598 02:00:57,420 --> 02:01:00,390 Well, it turns out there's a sorting algorithm called merge sort. 2599 02:01:00,390 --> 02:01:01,980 And there's bunches of others too. 2600 02:01:01,980 --> 02:01:06,923 But merge sort is a nice one to discuss because it fundamentally, we hope, 2601 02:01:06,923 --> 02:01:09,090 is going to do better than selection sort and bubble 2602 02:01:09,090 --> 02:01:11,490 sort that is better than n squared. 2603 02:01:11,490 --> 02:01:14,238 But the catch is it's a little harder to think about. 2604 02:01:14,238 --> 02:01:17,280 In fact, I'll act it out myself with just these numbers on the shelf here 2605 02:01:17,280 --> 02:01:21,360 rather than humans because recursion in general takes a little bit of effort 2606 02:01:21,360 --> 02:01:23,700 to wrap your mind around, typically a bit of practice. 2607 02:01:23,700 --> 02:01:25,908 But I'll see if we can't walk through it methodically 2608 02:01:25,908 --> 02:01:28,260 enough such that this comes to light. 2609 02:01:28,260 --> 02:01:32,460 So here's the pseudo code I propose for this algorithm called merge sort. 2610 02:01:32,460 --> 02:01:35,640 In the spirit of recursion, this sorting algorithm 2611 02:01:35,640 --> 02:01:41,020 literally calls itself by using the verb sort in its pseudo code. 2612 02:01:41,020 --> 02:01:42,600 So how does merge sort work? 
2613 02:01:42,600 --> 02:01:46,020 It sort of obnoxiously says, well, if you want to sort all of these things, 2614 02:01:46,020 --> 02:01:48,720 go sort the left half, then go sort the right half, 2615 02:01:48,720 --> 02:01:50,430 and then merge the two together. 2616 02:01:50,430 --> 02:01:51,682 Now obnoxious in what sense? 2617 02:01:51,682 --> 02:01:54,390 Well, if I just asked you to sort something and you just tell me, 2618 02:01:54,390 --> 02:01:56,070 well, go sort that thing and then go sort 2619 02:01:56,070 --> 02:01:58,737 that thing, what was the point of asking you in the first place? 2620 02:01:58,737 --> 02:02:01,080 But the key is that each of these lines is 2621 02:02:01,080 --> 02:02:03,880 sorting a smaller piece of the problem. 2622 02:02:03,880 --> 02:02:06,450 So eventually, we'll be able to pare this down 2623 02:02:06,450 --> 02:02:10,410 into something that doesn't go on forever because in fact, in merge sort, 2624 02:02:10,410 --> 02:02:12,120 there's a base case too. 2625 02:02:12,120 --> 02:02:14,940 There's a scenario where we just check, wait a minute, 2626 02:02:14,940 --> 02:02:17,520 if there's only one number to sort, that's it. 2627 02:02:17,520 --> 02:02:19,470 Quit then because you're all done. 2628 02:02:19,470 --> 02:02:22,590 So there has to be this base case in any use of recursion 2629 02:02:22,590 --> 02:02:27,090 to make sure that you don't mindlessly call yourself forever. 2630 02:02:27,090 --> 02:02:29,380 You've got to stop at some point. 2631 02:02:29,380 --> 02:02:32,560 So let's focus on the third of these steps. 2632 02:02:32,560 --> 02:02:37,328 What does it mean to merge two lists, two halves of a list, 2633 02:02:37,328 --> 02:02:39,120 just because this is apparently going to be 2634 02:02:39,120 --> 02:02:41,290 a key ingredient-- so here, for instance, 2635 02:02:41,290 --> 02:02:43,890 are two halves of a list of size 8. 
2636 02:02:43,890 --> 02:02:47,010 We have the numbers 2-- and I'll call it out if you're at a bad angle-- 2637 02:02:47,010 --> 02:02:52,200 2457 and 0136. 2638 02:02:52,200 --> 02:02:56,710 Notice that the left half at the moment, 2457, is already sorted, 2639 02:02:56,710 --> 02:03:01,000 and the right half, 0136, is also sorted as well. 2640 02:03:01,000 --> 02:03:03,870 So that's a good thing because it means that theoretically, I've 2641 02:03:03,870 --> 02:03:05,280 sorted the left half already. 2642 02:03:05,280 --> 02:03:07,620 I've sorted the right half already before we began. 2643 02:03:07,620 --> 02:03:09,690 I just need to merge these two halves. 2644 02:03:09,690 --> 02:03:11,800 What does it mean to merge two halves? 2645 02:03:11,800 --> 02:03:13,550 Well, for the sake of discussion, I'm just 2646 02:03:13,550 --> 02:03:19,340 going to turn over most of the numbers except for the first numbers in each 2647 02:03:19,340 --> 02:03:20,360 of these halves. 2648 02:03:20,360 --> 02:03:23,390 There's two halves here, left and right. 2649 02:03:23,390 --> 02:03:25,460 At the moment, I'm only going to consider 2650 02:03:25,460 --> 02:03:29,532 the leftmost element of each half-- that is, the one on the left here 2651 02:03:29,532 --> 02:03:30,740 and the one on the left here. 2652 02:03:30,740 --> 02:03:33,800 How do I merge these two lists together? 2653 02:03:33,800 --> 02:03:38,540 Well, if I look at 2 and I look at 0, which one should presumably come first? 2654 02:03:38,540 --> 02:03:39,502 The smaller one. 2655 02:03:39,502 --> 02:03:41,210 So I'm going to grab the 0, and I'm going 2656 02:03:41,210 --> 02:03:44,150 to put it into its own place on this new shelf here. 2657 02:03:44,150 --> 02:03:50,300 And now I'm going to consider, as part of my iteration, 2658 02:03:50,300 --> 02:03:53,270 the beginning of this list and the new beginning of this list. 2659 02:03:53,270 --> 02:03:55,130 So I'm now comparing 2 and 1.
2660 02:03:55,130 --> 02:03:56,210 Which one's smaller? 2661 02:03:56,210 --> 02:03:58,400 I'm going to go ahead and grab the 1. 2662 02:03:58,400 --> 02:04:01,130 Now I'm going to compare the beginning of the left list 2663 02:04:01,130 --> 02:04:03,440 and the new beginning of the right list, 2 and 3. 2664 02:04:03,440 --> 02:04:05,018 Of course, it's 2. 2665 02:04:05,018 --> 02:04:07,310 Now I'm going to compare the beginning of the left list 2666 02:04:07,310 --> 02:04:09,290 and the beginning of the right list, 4 and 3. 2667 02:04:09,290 --> 02:04:11,510 It's of course 3. 2668 02:04:11,510 --> 02:04:14,250 Now I'm going to compare the 4 against the beginning and end, 2669 02:04:14,250 --> 02:04:15,830 it turns out, of the second list-- 2670 02:04:15,830 --> 02:04:16,988 4, of course. 2671 02:04:16,988 --> 02:04:19,280 Now I'm going to compare the beginning of the left list 2672 02:04:19,280 --> 02:04:20,822 and the beginning of the right list-- 2673 02:04:20,822 --> 02:04:22,430 5, of course. 2674 02:04:22,430 --> 02:04:25,010 I'm realizing this is not going to end well because I left 2675 02:04:25,010 --> 02:04:26,420 too much distance between the numbers. 2676 02:04:26,420 --> 02:04:28,337 But that has nothing to do with the algorithm. 2677 02:04:28,337 --> 02:04:29,840 7 is the beginning of the left list. 2678 02:04:29,840 --> 02:04:31,460 6 is the beginning of the right list. 2679 02:04:31,460 --> 02:04:32,960 It's, of course, 6. 2680 02:04:32,960 --> 02:04:36,410 And at the risk of knocking all of these over, 2681 02:04:36,410 --> 02:04:43,310 if I now make room for this element, we have hopefully 2682 02:04:43,310 --> 02:04:50,400 sorted the whole thing by having merged together the two halves of the list. 2683 02:04:50,400 --> 02:04:52,040 So in short-- thank you. 
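The merge step just acted out on the shelf can be sketched in C as below. The function name, parameter names, and the separate output array (playing the role of the "new shelf") are assumptions made for illustration; the lecture itself only gives pseudocode for this step.

```c
// Merges two already-sorted arrays into out, which must have room for
// left_len + right_len ints (the "new shelf" in the demonstration).
void merge(const int left[], int left_len,
           const int right[], int right_len,
           int out[])
{
    int i = 0; // current beginning of the left half
    int j = 0; // current beginning of the right half
    int k = 0; // next free slot in out

    // While both halves still have numbers, take the smaller of the two fronts
    while (i < left_len && j < right_len)
    {
        if (left[i] <= right[j])
        {
            out[k++] = left[i++];
        }
        else
        {
            out[k++] = right[j++];
        }
    }

    // One half has run out; the rest of the other half is already in order
    while (i < left_len)
    {
        out[k++] = left[i++];
    }
    while (j < right_len)
    {
        out[k++] = right[j++];
    }
}
```

With the halves 2 4 5 7 and 0 1 3 6 from the demonstration, each number is looked at and moved exactly once, and out ends up holding 0 through 7 in order.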
2684 02:04:52,040 --> 02:04:56,448 2685 02:04:56,448 --> 02:04:58,740 I'm a little worried that's just getting sarcastic now, 2686 02:04:58,740 --> 02:05:03,870 but we now have merged two half lists. 2687 02:05:03,870 --> 02:05:07,430 We haven't done the guts of the algorithm yet-- sort the left half 2688 02:05:07,430 --> 02:05:08,430 and sort the right half. 2689 02:05:08,430 --> 02:05:11,160 But I claim that that is how mechanically you 2690 02:05:11,160 --> 02:05:12,960 merge two sorted halves. 2691 02:05:12,960 --> 02:05:15,060 You keep looking at the beginning of each list, 2692 02:05:15,060 --> 02:05:17,640 and you just kind of weave them together based 2693 02:05:17,640 --> 02:05:21,070 on which one belongs first based on its size. 2694 02:05:21,070 --> 02:05:23,460 So if you agree that that was a reasonable way 2695 02:05:23,460 --> 02:05:28,020 to merge two lists together, let's go ahead and focus lastly 2696 02:05:28,020 --> 02:05:31,260 on what it means to actually sort the left half 2697 02:05:31,260 --> 02:05:33,437 and sort the right half of a whole bunch of numbers. 2698 02:05:33,437 --> 02:05:35,520 And for this, I'm going to go ahead and order them 2699 02:05:35,520 --> 02:05:37,320 in this seemingly random order. 2700 02:05:37,320 --> 02:05:40,200 And I just have a little cheat sheet above so that I don't mess up. 2701 02:05:40,200 --> 02:05:42,330 And I'm going to start at the very top this time. 2702 02:05:42,330 --> 02:05:45,390 And hopefully, these will not fall down at any point. 2703 02:05:45,390 --> 02:05:51,300 But I'm just deliberately putting them in this random order, 5274. 2704 02:05:51,300 --> 02:05:53,703 And then we have 1630-- 2705 02:05:53,703 --> 02:05:57,000 1630. 2706 02:05:57,000 --> 02:05:59,340 Hopefully this won't fall over. 2707 02:05:59,340 --> 02:06:03,990 Here is now an array of size 8 with eight integers. 2708 02:06:03,990 --> 02:06:05,290 And I want to sort this. 
2709 02:06:05,290 --> 02:06:08,332 I could use selection sort and just go back and forth and back and forth. 2710 02:06:08,332 --> 02:06:11,100 I could use bubble sort and just compare pairs, pairs, pairs. 2711 02:06:11,100 --> 02:06:13,830 But those are going to be on the order of big O of n squared. 2712 02:06:13,830 --> 02:06:16,000 My hope is to do fundamentally better here. 2713 02:06:16,000 --> 02:06:17,760 So let's see if we can do better. 2714 02:06:17,760 --> 02:06:19,800 All right, so let me look now at my code. 2715 02:06:19,800 --> 02:06:21,490 I'll keep it on the screen. 2716 02:06:21,490 --> 02:06:23,310 How do I implement merge sort? 2717 02:06:23,310 --> 02:06:25,303 Well, if there's only one number, I quit. 2718 02:06:25,303 --> 02:06:26,220 There's obviously not. 2719 02:06:26,220 --> 02:06:28,270 There's eight numbers, so that's not applicable. 2720 02:06:28,270 --> 02:06:30,603 I'm going to go ahead and sort the left half of numbers. 2721 02:06:30,603 --> 02:06:32,460 All right, here's the left half-- 2722 02:06:32,460 --> 02:06:34,470 5274. 2723 02:06:34,470 --> 02:06:37,230 Do I sort an array of size 4? 2724 02:06:37,230 --> 02:06:40,170 Well, here's where the recursion kicks in. 2725 02:06:40,170 --> 02:06:42,150 How do you sort a list of size 4? 2726 02:06:42,150 --> 02:06:44,310 Well, there's the pseudo code on the board. 2727 02:06:44,310 --> 02:06:47,490 I sort the left half of the list of size 4. 2728 02:06:47,490 --> 02:06:49,200 So here we go. 2729 02:06:49,200 --> 02:06:50,550 I have a list of size 4. 2730 02:06:50,550 --> 02:06:51,390 How do I sort it? 2731 02:06:51,390 --> 02:06:52,680 I sort the left half. 2732 02:06:52,680 --> 02:06:54,480 All right, now I have a list of size 2. 2733 02:06:54,480 --> 02:06:56,680 How do I sort this? 2734 02:06:56,680 --> 02:06:58,600 Well, sort the left half. 2735 02:06:58,600 --> 02:06:59,520 So here we go. 2736 02:06:59,520 --> 02:07:01,020 Here's a list of size 1. 
2737 02:07:01,020 --> 02:07:03,996 How do I sort this? 2738 02:07:03,996 --> 02:07:05,640 I think it's done, right? 2739 02:07:05,640 --> 02:07:06,480 That's quit, right? 2740 02:07:06,480 --> 02:07:08,010 If only one number, I'm done. 2741 02:07:08,010 --> 02:07:09,233 The 5 is sorted. 2742 02:07:09,233 --> 02:07:10,650 All right, what was the next step? 2743 02:07:10,650 --> 02:07:12,180 You have to now rewind in time. 2744 02:07:12,180 --> 02:07:16,510 I just sorted the left half of the left half of the left half. 2745 02:07:16,510 --> 02:07:17,970 What do I now sort? 2746 02:07:17,970 --> 02:07:20,850 The right half, which is 2. 2747 02:07:20,850 --> 02:07:22,030 This is one element. 2748 02:07:22,030 --> 02:07:22,740 So I'm done. 2749 02:07:22,740 --> 02:07:27,060 So now at this point in the story, I have sorted, sort of idiotically-- 2750 02:07:27,060 --> 02:07:29,850 the 5 is sorted, and the 2 is sorted. 2751 02:07:29,850 --> 02:07:34,410 But what's the third and final step of this phase of the algorithm? 2752 02:07:34,410 --> 02:07:35,650 Merge the two together. 2753 02:07:35,650 --> 02:07:37,600 So here's the left, here's the right list. 2754 02:07:37,600 --> 02:07:38,850 How do I merge these together? 2755 02:07:38,850 --> 02:07:41,530 I compare the lists, and I put the 2 there. 2756 02:07:41,530 --> 02:07:43,590 I only have the 5 left, and I do that. 2757 02:07:43,590 --> 02:07:45,990 So now we see some visible progress. 2758 02:07:45,990 --> 02:07:47,190 But again, let's rewind. 2759 02:07:47,190 --> 02:07:48,220 How did we get here? 2760 02:07:48,220 --> 02:07:52,860 We started to sort the left half of the left half of the left half, then 2761 02:07:52,860 --> 02:07:53,700 the right half. 2762 02:07:53,700 --> 02:07:54,900 And now where are we? 2763 02:07:54,900 --> 02:07:58,090 We've just sorted the left half of the left half. 2764 02:07:58,090 --> 02:08:01,140 So what comes after sorting the left half of anything?
2765 02:08:01,140 --> 02:08:01,692 Right half. 2766 02:08:01,692 --> 02:08:03,900 All right, here's the sort of same nonsensical thing. 2767 02:08:03,900 --> 02:08:05,460 Here's a list of size 2. 2768 02:08:05,460 --> 02:08:06,690 Let's sort the left half. 2769 02:08:06,690 --> 02:08:07,347 Done. 2770 02:08:07,347 --> 02:08:08,430 Let's sort the right half. 2771 02:08:08,430 --> 02:08:09,120 Done. 2772 02:08:09,120 --> 02:08:10,170 What's the third step? 2773 02:08:10,170 --> 02:08:11,590 Merge them together. 2774 02:08:11,590 --> 02:08:14,880 So that's the 4, and that's the 7. 2775 02:08:14,880 --> 02:08:16,110 What have I now done? 2776 02:08:16,110 --> 02:08:21,070 In total, I've now sorted the left half of the original thing. 2777 02:08:21,070 --> 02:08:23,092 So what happens next? 2778 02:08:23,092 --> 02:08:24,300 Wait a minute, wait a minute. 2779 02:08:24,300 --> 02:08:25,560 I have not done that. 2780 02:08:25,560 --> 02:08:26,550 What have I done? 2781 02:08:26,550 --> 02:08:29,220 I have sorted the left half of the left half, 2782 02:08:29,220 --> 02:08:32,350 and I've sorted the right half of the left half. 2783 02:08:32,350 --> 02:08:34,710 What do I now need to do lastly? 2784 02:08:34,710 --> 02:08:36,640 Merge those two lists together. 2785 02:08:36,640 --> 02:08:38,562 So again, I put my finger on the beginning 2786 02:08:38,562 --> 02:08:40,270 of this list, the beginning of this list. 2787 02:08:40,270 --> 02:08:42,360 And if you want, I'll do the same thing when I merged last time 2788 02:08:42,360 --> 02:08:43,830 to be clear what I'm comparing. 2789 02:08:43,830 --> 02:08:46,440 2 and 4-- the 2 obviously comes first. 2790 02:08:46,440 --> 02:08:48,220 What comes next? 2791 02:08:48,220 --> 02:08:50,410 Well, the 4 comes next. 2792 02:08:50,410 --> 02:08:51,670 What comes next? 2793 02:08:51,670 --> 02:08:55,920 The 5 comes next and then lastly, of course, the 7. 2794 02:08:55,920 --> 02:09:00,010 Notice that the 2457 are now sorted. 
2795 02:09:00,010 --> 02:09:02,802 So the original left half is sorted. 2796 02:09:02,802 --> 02:09:05,010 And I'll do the rest a little faster because, my God, 2797 02:09:05,010 --> 02:09:06,385 this feels like it takes forever. 2798 02:09:06,385 --> 02:09:08,430 But I bet we're on to something here. 2799 02:09:08,430 --> 02:09:10,320 What step remains next? 2800 02:09:10,320 --> 02:09:12,570 I've just sorted the left half of the original. 2801 02:09:12,570 --> 02:09:14,280 Sort the right half of the original. 2802 02:09:14,280 --> 02:09:15,300 How do I sort this? 2803 02:09:15,300 --> 02:09:17,880 I sort the left half of the right half. 2804 02:09:17,880 --> 02:09:19,050 How do I sort this? 2805 02:09:19,050 --> 02:09:21,360 I sort the left half of the left half. 2806 02:09:21,360 --> 02:09:22,320 Done. 2807 02:09:22,320 --> 02:09:24,360 I sort the right half of the left half. 2808 02:09:24,360 --> 02:09:25,260 Done. 2809 02:09:25,260 --> 02:09:27,390 Now I merge the two together. 2810 02:09:27,390 --> 02:09:30,190 The 1 comes first, the 6 comes next. 2811 02:09:30,190 --> 02:09:34,380 Now I sort the right half of the right half. 2812 02:09:34,380 --> 02:09:35,190 What do I do? 2813 02:09:35,190 --> 02:09:36,600 Sort the left half. 2814 02:09:36,600 --> 02:09:37,200 Done. 2815 02:09:37,200 --> 02:09:38,200 Sort the right half. 2816 02:09:38,200 --> 02:09:38,700 Done. 2817 02:09:38,700 --> 02:09:39,420 What do I do? 2818 02:09:39,420 --> 02:09:41,230 Merge them together. 2819 02:09:41,230 --> 02:09:43,080 So that's the third step of that phase. 2820 02:09:43,080 --> 02:09:47,760 Now where are we in the stor-- oh my God, where are we in the story? 2821 02:09:47,760 --> 02:09:52,230 We have sorted the left half of the right half 2822 02:09:52,230 --> 02:09:54,180 and the right half of the right half. 2823 02:09:54,180 --> 02:09:55,760 What comes next? 2824 02:09:55,760 --> 02:09:56,263 Merge. 
2825 02:09:56,263 --> 02:09:58,430 So I'm going to compare, and I'm going to move those 2826 02:09:58,430 --> 02:10:00,590 down just to make clear what I'm comparing, 2827 02:10:00,590 --> 02:10:02,300 the beginning of both sublists. 2828 02:10:02,300 --> 02:10:03,110 What comes first? 2829 02:10:03,110 --> 02:10:04,610 Of course, the 0. 2830 02:10:04,610 --> 02:10:07,110 What comes next? 2831 02:10:07,110 --> 02:10:08,550 What comes next? 2832 02:10:08,550 --> 02:10:10,340 The 1. 2833 02:10:10,340 --> 02:10:11,730 What comes next? 2834 02:10:11,730 --> 02:10:13,010 The 3. 2835 02:10:13,010 --> 02:10:14,930 And then lastly comes the 6. 2836 02:10:14,930 --> 02:10:16,880 All right, where are we in the story? 2837 02:10:16,880 --> 02:10:19,220 We've now sorted the left half of the original 2838 02:10:19,220 --> 02:10:20,780 and the right half of the original. 2839 02:10:20,780 --> 02:10:22,430 What step remains? 2840 02:10:22,430 --> 02:10:23,030 Merge. 2841 02:10:23,030 --> 02:10:25,070 All right, so I'm going to make the same point. 2842 02:10:25,070 --> 02:10:27,690 And this is actually literally what we did earlier 2843 02:10:27,690 --> 02:10:31,760 because I deliberately demoed those original numbers in this order, 2 2844 02:10:31,760 --> 02:10:32,630 and a 0. 2845 02:10:32,630 --> 02:10:34,010 This comes out first. 2846 02:10:34,010 --> 02:10:35,300 What comes next? 2847 02:10:35,300 --> 02:10:36,230 2 and 1. 2848 02:10:36,230 --> 02:10:37,890 The 1 comes out next. 2849 02:10:37,890 --> 02:10:39,450 What comes next? 2850 02:10:39,450 --> 02:10:40,920 The 2 comes next. 2851 02:10:40,920 --> 02:10:42,030 What comes next? 2852 02:10:42,030 --> 02:10:43,410 The 3 comes next. 2853 02:10:43,410 --> 02:10:44,460 What comes next? 2854 02:10:44,460 --> 02:10:47,150 The 4. 2855 02:10:47,150 --> 02:10:48,590 What comes after that? 2856 02:10:48,590 --> 02:10:50,720 The 5. 2857 02:10:50,720 --> 02:10:51,920 What comes after that? 2858 02:10:51,920 --> 02:10:52,700 The 6. 
2859 02:10:52,700 --> 02:10:56,760 And lastly-- this is when we run out of memory-- 2860 02:10:56,760 --> 02:11:01,010 the 7 over there is actually in place. 2861 02:11:01,010 --> 02:11:03,390 OK. 2862 02:11:03,390 --> 02:11:05,867 OK, so admittedly, a little harder to explain, 2863 02:11:05,867 --> 02:11:07,950 and honestly, it gets a little trippy because it's 2864 02:11:07,950 --> 02:11:10,980 so easy to forget about where you are in the story 2865 02:11:10,980 --> 02:11:13,680 because we're constantly diving into the algorithm 2866 02:11:13,680 --> 02:11:15,250 and then backing back out of it. 2867 02:11:15,250 --> 02:11:17,970 But in code, we could express this pretty correctly 2868 02:11:17,970 --> 02:11:21,000 and, it turns out, pretty efficiently because what 2869 02:11:21,000 --> 02:11:24,660 I was doing, even though it's longer when I do it verbally, 2870 02:11:24,660 --> 02:11:28,260 I was touching these elements a minimal amount of times, right? 2871 02:11:28,260 --> 02:11:31,530 I wasn't going back and forth, back and forth in front of the shelf 2872 02:11:31,530 --> 02:11:32,490 again and again. 2873 02:11:32,490 --> 02:11:37,240 I was deliberately only ever merging the smallest elements in each list. 2874 02:11:37,240 --> 02:11:40,200 So every time we merge, even though I was doing it quickly, 2875 02:11:40,200 --> 02:11:44,370 my fingers were only touching each of the elements once. 2876 02:11:44,370 --> 02:11:50,152 And how many times did we divide, divide, divide in half the list? 2877 02:11:50,152 --> 02:11:52,110 Well, we started with all of the elements here, 2878 02:11:52,110 --> 02:11:53,318 and there were eight of them. 2879 02:11:53,318 --> 02:11:56,860 And then we moved them 1, 2, 3 positions. 2880 02:11:56,860 --> 02:12:03,420 So the height of this visualization, if you will, is actually log n, right? 
2881 02:12:03,420 --> 02:12:06,280 If I started with 8, turns out if you do the arithmetic, 2882 02:12:06,280 --> 02:12:10,380 this is log n height because 2 to the 3 is 8. 2883 02:12:10,380 --> 02:12:12,750 But for now, just trust that this is a log n height. 2884 02:12:12,750 --> 02:12:14,280 And how wide is the shelf? 2885 02:12:14,280 --> 02:12:18,270 Well, it's of width n because there's n elements any time 2886 02:12:18,270 --> 02:12:19,470 they were on the shelf. 2887 02:12:19,470 --> 02:12:23,737 So technically, I was kind of cheating this algorithm because this 2888 02:12:23,737 --> 02:12:25,320 is the first time I've needed shelves. 2889 02:12:25,320 --> 02:12:29,010 With the human examples, we just had the humans, and that's it, and only eight 2890 02:12:29,010 --> 02:12:29,520 of them. 2891 02:12:29,520 --> 02:12:32,200 Here, I was sort of using more and more memory. 2892 02:12:32,200 --> 02:12:34,770 In fact, I was using like four times as much memory 2893 02:12:34,770 --> 02:12:36,930 even though that was just for visualization's sake. 2894 02:12:36,930 --> 02:12:41,040 Merge sort actually requires that you have some spare space, an empty array 2895 02:12:41,040 --> 02:12:44,045 to move the elements into when you're merging them together. 2896 02:12:44,045 --> 02:12:46,920 But if I really wanted and if I didn't have this shelf or this shelf, 2897 02:12:46,920 --> 02:12:49,590 honestly, I could have just gone back and forth between the two shelves. 2898 02:12:49,590 --> 02:12:51,250 That would have been sufficient. 2899 02:12:51,250 --> 02:12:56,190 So merge sort uses more memory for this merging process, 2900 02:12:56,190 --> 02:12:59,100 but the advantage of using more memory is 2901 02:12:59,100 --> 02:13:04,680 that the total running time, if you can perhaps infer from that math, is what? 2902 02:13:04,680 --> 02:13:07,560 The big O notation for merge sort, it turns out, 2903 02:13:07,560 --> 02:13:10,530 is actually going to be n times log n. 
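[Editor's sketch] The three steps walked through above (sort the left half, sort the right half, merge) could be written in C roughly as follows. This is a minimal sketch and not code shown in the lecture: the names `merge_sort`, `merge`, `values`, and `spare` are hypothetical, and `spare` plays the role of the extra shelf, an empty array of the same size that elements are moved into while merging.

```c
#include <string.h>

// Merge two adjacent sorted runs, values[lo..mid) and values[mid..hi),
// into the spare array, then copy the merged run back into values.
static void merge(int values[], int spare[], int lo, int mid, int hi)
{
    int left = lo;   // "finger" on the beginning of the left half
    int right = mid; // "finger" on the beginning of the right half
    int out = lo;    // next free slot in the spare array

    // Repeatedly take the smaller of the two elements under the fingers
    while (left < mid && right < hi)
    {
        if (values[left] <= values[right])
        {
            spare[out++] = values[left++];
        }
        else
        {
            spare[out++] = values[right++];
        }
    }

    // One half has run out; move whatever remains of the other half
    while (left < mid)
    {
        spare[out++] = values[left++];
    }
    while (right < hi)
    {
        spare[out++] = values[right++];
    }

    // Copy the merged run back so that values[lo..hi) is sorted
    memcpy(values + lo, spare + lo, (size_t) (hi - lo) * sizeof(int));
}

// Sort values[lo..hi) using spare as scratch space of at least hi elements.
void merge_sort(int values[], int spare[], int lo, int hi)
{
    // A list of size 0 or 1 is already sorted: "Done."
    if (hi - lo < 2)
    {
        return;
    }

    int mid = lo + (hi - lo) / 2;
    merge_sort(values, spare, lo, mid); // sort the left half
    merge_sort(values, spare, mid, hi); // sort the right half
    merge(values, spare, lo, mid, hi);  // merge the two sorted halves
}
```

To sort an array of eight ints, one would declare a second eight-element array and call `merge_sort(values, spare, 0, 8)`. Each merge touches every element in its range once, and the halving happens about log n times, which is where the n times log n total comes from. Copying back after every merge keeps the sketch simple; alternating between the two arrays, as suggested above with the two shelves, would avoid that copy.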
2904 02:13:10,530 --> 02:13:13,050 And even if you're a little rusty still on your logarithms, 2905 02:13:13,050 --> 02:13:18,390 we saw in week zero and again today that log n is smaller than n. 2906 02:13:18,390 --> 02:13:19,650 That's a good thing. 2907 02:13:19,650 --> 02:13:20,880 Binary search was log n. 2908 02:13:20,880 --> 02:13:23,250 That's faster than linear search, which was n. 2909 02:13:23,250 --> 02:13:28,830 So n times log n is, of course, smaller than n times n or n squared. 2910 02:13:28,830 --> 02:13:31,830 So it's sort of lower on this little cheat sheet that I've been drawing, 2911 02:13:31,830 --> 02:13:35,350 which is to suggest that its running time is indeed better or faster. 2912 02:13:35,350 --> 02:13:38,340 And in fact, if we consider the best case running time, 2913 02:13:38,340 --> 02:13:42,645 turns out it's not quite as good as bubble sort with omega of n, 2914 02:13:42,645 --> 02:13:45,270 where you can just sort of abort if you realize, wait a minute, 2915 02:13:45,270 --> 02:13:46,320 I've done no work. 2916 02:13:46,320 --> 02:13:50,850 Merge sort, you actually have to do that work to get to the finish line anyway. 2917 02:13:50,850 --> 02:13:56,170 So it's actually in omega and ultimately theta of n log n as well. 2918 02:13:56,170 --> 02:13:58,230 So again, a trade-off there because if you 2919 02:13:58,230 --> 02:14:00,540 happen to have a data set that is very often sorted, 2920 02:14:00,540 --> 02:14:02,665 honestly, you might want to stick with bubble sort. 2921 02:14:02,665 --> 02:14:05,490 But in the general case, where the data is unsorted, 2922 02:14:05,490 --> 02:14:08,820 n log n is sounding better than n squared. 2923 02:14:08,820 --> 02:14:10,830 Well, what does it actually look or feel like? 2924 02:14:10,830 --> 02:14:14,230 Give me a moment to just change over to our visualization here. 
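[Editor's sketch] The omega of n best case for bubble sort mentioned above depends on aborting as soon as a full pass makes no swaps. A minimal C sketch of that idea follows, assuming a hypothetical function name `bubble_sort` that returns the number of comparisons it made so the early exit can be observed. It is an illustration, not code from the lecture.

```c
#include <stdbool.h>

// Bubble sort with early exit. Returns how many comparisons were made,
// so the best case (already sorted) and worst case can be told apart.
long bubble_sort(int values[], int n)
{
    long comparisons = 0;

    for (int pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;

        // After each pass the largest remaining element has bubbled
        // to the end, so the next pass can stop one position earlier
        for (int i = 0; i < n - 1 - pass; i++)
        {
            comparisons++;
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
                swapped = true;
            }
        }

        // "Wait a minute, I've done no work": the list is sorted, so abort
        if (!swapped)
        {
            break;
        }
    }

    return comparisons;
}
```

On an already sorted array of 8 elements this makes a single pass of 7 comparisons, which is on the order of n. On a reversed array of 8 elements it makes 7 + 6 + ... + 1 = 28 comparisons, which is n(n - 1)/2 and so on the order of n squared. Merge sort has no comparable shortcut, which is why its best case stays at n log n.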
2925 02:14:14,230 --> 02:14:18,210 And we'll see with this example what merge sort looks like 2926 02:14:18,210 --> 02:14:20,170 depicted with now these vertical bars. 2927 02:14:20,170 --> 02:14:22,860 So same algorithm, but instead of my numbers on shelves, 2928 02:14:22,860 --> 02:14:27,970 here is a random array of numbers being sorted. 2929 02:14:27,970 --> 02:14:30,120 And you can see it being done half at a time. 2930 02:14:30,120 --> 02:14:33,720 And you see sort of remnants of the previous bars. 2931 02:14:33,720 --> 02:14:35,520 Actually, that was unfair. 2932 02:14:35,520 --> 02:14:37,500 Let me zoom out here. 2933 02:14:37,500 --> 02:14:42,030 Let me zoom out so you can actually see the height here. 2934 02:14:42,030 --> 02:14:44,730 Let me go ahead and randomize this again and run merge sort. 2935 02:14:44,730 --> 02:14:45,310 There we go. 2936 02:14:45,310 --> 02:14:50,100 Now you can see the second array and where the values are going temporarily. 2937 02:14:50,100 --> 02:14:54,130 And even though this one looks way more cryptic visualization-wise, 2938 02:14:54,130 --> 02:14:56,170 it does seem to be moving faster. 2939 02:14:56,170 --> 02:14:59,670 And it seems to be merging halves together, and boom, it's done. 2940 02:14:59,670 --> 02:15:03,990 So let's actually see, in conclusion, what these algorithms compare to 2941 02:15:03,990 --> 02:15:07,080 and consider that moving forward as we write more and more code, 2942 02:15:07,080 --> 02:15:10,350 the goal is, again, not just to be correct but to be well-designed. 2943 02:15:10,350 --> 02:15:13,820 And one measure of design is going to indeed be efficiency. 2944 02:15:13,820 --> 02:15:18,120 So here we have, in final, a visualization of three algorithms-- 2945 02:15:18,120 --> 02:15:21,000 selection sort, bubble sort, and merge sort-- 2946 02:15:21,000 --> 02:15:22,450 from top to bottom. 2947 02:15:22,450 --> 02:15:25,620 And let's see what these algorithms might look or sound like here. 
2948 02:15:25,620 --> 02:15:27,855 Oh, if we can dim the lights for dramatic effect-- 2949 02:15:27,855 --> 02:15:32,160 2950 02:15:32,160 --> 02:15:35,970 selection's on top, bubble on bottom, merge in the middle. 2951 02:15:35,970 --> 02:16:33,320 [MUSIC PLAYING] 2952 02:16:33,320 --> 02:17:11,000 [MUSIC PLAYING]
