All language subtitles for https___cdn.cs50.net_2022_fall_lectures_2_lang_en_lecture2.srt

[MUSIC PLAYING]
DAVID MALAN: All right. This is CS50, and this is week 2, wherein we're going to take a look at a lower level at how things work, and indeed, among the goals of the course is this bottom-up understanding so that in a couple of weeks' time, even a few years' time, when you encounter some new technology, you'll be able to think back hopefully on some of this week's basic building blocks and primitives and really just deduce how tomorrow's technologies work. But along the way, it's going to seem-- it's going to be a little hard, perhaps, to see the forest for the trees, so to speak. And so the goal at the end of the day still is going to be problem-solving. And so we thought we'd begin today with a look at some of the problems we'll talk about or solve this coming week, and for that, we have some brave volunteers who have already come up. If we could turn on some dramatic lighting and meet today's volunteers. So on my left here, we have--
ALEX: Hi. My name is Alex. I'm a first-year at the college and I'm from Chapel Hill, North Carolina.
DAVID MALAN: Welcome to Alex. And to Alex's right.
SARAH: I'm Sarah. I'm from Toronto, Canada, and I'm also a first-year student at the college.
DAVID MALAN: Wonderful. Well, welcome to both Alex and Sarah. So one of the problems you'll perhaps solve this week for problem set 2 is to analyze the reading level of a body of text, whether someone reads at a first grade level, second grade level, third grade level, all the way up to 12 or 13 or beyond. You've perhaps never quite thought about, certainly in terms of code, how you would analyze some text, some book, and figure out what reading level it is at. And yet, surely our teachers growing up knew or had an intuitive sense of this. So let's consider some sample text. For instance, Alex, what have you been reading lately?
ALEX: One fish, two fish, red fish, blue fish.
DAVID MALAN: Wonderful. So given that, what grade level would you say Alex is currently reading at? Feel free to just shout it out. First, first? So indeed, you'll see this week, if you run your code on Alex's text, it actually turns out he reads below a first grade reading level. But why might that be?
What might your intuition be for why we've accused Alex of reading at this level? Feel free to shout out. Yeah. So very few syllables, short words, short sentences. And so there's some heuristics, perhaps, we can infer from that short text, that that probably means that it's best for younger children. Now Sarah, by contrast, what have you been reading?
SARAH: Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you'd expect to be involved in anything strange or mysterious because they just didn't hold with much nonsense.
DAVID MALAN: All right. Now irrespective of what grade you were in when you might have read that text, what grade level does Sarah seem to be reading at? So eighth grade, second grade. OK. So hearing a bit of everything, so with that, at least according to code, it would actually be seventh grade. And what might the intuition there be? Why is that a higher grade level even though we might disagree exactly which grade it is?
AUDIENCE: Complicated sentences.
DAVID MALAN: Yeah.
So complicated sentences, longer sentences. So indeed a lot more words were being spoken by Sarah because there was so much more there on the page. So we'll translate these ideas this coming week in problem set 2, if you tackle this one, through code so that you can ultimately infer things like these quantitatively. But to do so, we're going to have to understand text. So let's first thank our volunteers and then we'll dive in to that lower level.
[APPLAUSE]
Sorry. You can keep those.
SARAH: Oh, OK.
DAVID MALAN: All right. So besides that, let's consider one other body of text perhaps that you might see this week, which is namely a little something like this. What I have here on the screen is what we'll start calling today ciphertext. It's the result of encrypting some piece of information. And encryption, or more generally, the art and science of cryptography is all around us. It's what you're using on the web, on your phones, with your banks. And anything that tries to keep data secure is using encryption. But there's going to be different levels of encryption-- strong encryption, weak encryption.
And what you see here on the screen isn't all that strong, but we'll see later today how we might decrypt this and actually reveal what the plaintext is that corresponds to that ciphertext. But in order to do so, we have to start taking off some training wheels, so to speak. And believe it or not, even though your time with C this past week, for the first time, probably, might have been rather in the weeds and much more complicated, seemingly, than Scratch, it turns out that along the way, we have been providing and we'll continue to provide certain training wheels. For instance, the CS50 Library is one of them, and even some of the explanations we give of topics for now in these early weeks will be somewhat simplified-- abstracted away, if you will. But the goal ultimately is for you to understand each and every one of those details so that after CS50, you really can stand on your own and understand and wrap your mind around any future technologies as well.
So let's consider first the very first program with which we began last week, which was this one. So "hello, world" in C.
At the end of the day, it was really the printf function that was doing the interesting part of the work, but there was a lot of technical stuff above and below it. The curly braces, the parentheses, words like void and include, and then of course, the angled brackets and more. But at the end of the day, we needed to convert that source code in C to machine code, the 0's and 1's in binary that the computer understood. And to do that, of course, we ran-- we compiled the code. We ran make and then we were able to actually run that code there.
So let me actually go over here to VS Code and really quickly recreate that hello.c pretty much by transcribing the same. So I might have here include stdio.h, int main void. And then in here, I had quite simply, hello, comma, world with my backslash n, end quotes, and more. Now last time, to compile this, I indeed ran make hello, followed by Enter. Hopefully you see no errors and that's a good thing. And if you do dot, slash, hello, you see, in fact, the results of that program. But it turns out that make is not actually a compiler, as I alluded to last week. It's a program that clearly makes your program, but it itself just automates the process of using an actual compiler.
And there's lots of different compilers out there, and the one that it's actually using underneath the hood is a little something called Clang, for C Language. And Clang is a pretty popular compiler nowadays. There's another one that's been around for ages called GCC, but these are just specific names for types of compilers that different people, different companies, different groups have actually created. But if you use, in week 1, a compiler yourself manually, you have to understand a little more about what's going on because it's even more cryptic than just make alone.
So in fact, let me go back to my terminal window here, let me go ahead and clear the screen a little bit and just run really the raw compiler command. So what make is automating for me, let me actually do this manually for just a moment. So if I want to compile hello.c into an executable program I can run, I can do this. clang, space, hello.c, and then Enter. And now there's no output, which is a good thing in this case, no errors, but notice this. If I go ahead and type ls, it turns out there's a file that's been created suddenly in my current folder weirdly called a.out.
That stands for Assembler Output. And long story short, that's actually the default name of a program that's created when you just run Clang by itself. Now that's a pretty bad name for a program because it doesn't describe what it does. So better would be here to perhaps do, well, instead of a.out, which, yes, still prints hello, world, but isn't really a clearly-named program, it'd be nice to name this hello. So what could I do? I could do like we learned last week-- well, I could rename a.out to hello by using Linux's mv command. So I'm going to move a.out to become hello. But that, too, seems kind of tedious. Now I have three steps. Like write my code, compile my code, and then rename it before I can even run it. We can do better than that.
And so it turns out that certain commands like clang support what we're going to start today calling command line arguments. A command line argument, unlike an argument to a function, is just an additional word or key phrase that you type after a command at your prompt in your terminal window that just modifies the behavior of that command. It configures it a little more specifically.
So what you're seeing here on the screen is somewhat of a better command with which to run clang so that now I can specify the output of this command per this -o. So what do I mean by that? Well, let me go ahead and clear my terminal window again and more explicitly type clang -o hello hello.c and then Enter. Nothing, again, appears to happen, but that's a good thing when you see no errors, and now the program I just created is indeed called hello. So it achieves really the same exact effect as make did, but what I don't have to do with make is type and remember something as long as this command.
And this, too, is a bit of a white lie. It turns out, we have preconfigured VS Code in the cloud for you to also use some other features of Clang that would be even more tedious for you to write yourselves. And so really, this is why we distill this as ultimately just running make. So let me pause here to see first if there's any questions on what I've done by taking my very first program in C and just now compiling it first with make, but then starting over and now manually compiling it with clang with what we'll call command line arguments. -o, space, hello, and then the name of the file. Yeah?
AUDIENCE: What is a.out?
DAVID MALAN: Yeah. So a.out is a historical name. It refers to assembler output-- more on that soon. And it's just the default file name that you get automatically if you just run the compiler on any file so that you have just a standard name for it. But it's not a very well-named program. Instead of running Microsoft Word on your Mac or PC, it would be like double-clicking on a.out. So instead with these command line arguments, you can customize the output of Clang and call it hello or anything you want. Other questions on what I've done here with Clang itself, the compiler? Yeah?
AUDIENCE: What is -o?
DAVID MALAN: So -o-- and you would only know this from reading the manual, taking a class-- means output. So -o means change Clang's output to be a file called hello instead of the default, which is a.out. And this, too, is, again, a detail you would have to look up on a web page, read the manual, hear someone like me tell you about it. And in fact, there's even more than these options, but we'll just scratch the surface here.
All right.
So if we now know this, what more is actually happening underneath the hood? Well, let's take a closer look at not just this version of my code, but my slightly more complicated version last week, which looked a little something like this, wherein I added in some dynamic input from the user so I could say not hello, world to everyone, but hello, David or hello to whoever actually runs this program. So in fact, let me go ahead and change my code here in VS Code just to match that same code from last week. So no new code yet. I'm just going to, in a moment, compile it in a slightly different way. So I did last week string, I think, answer equals get_string, quote-unquote, "What's your name?" Just like in Scratch. And then down here, instead of doing world, I initially wrote answer, but that didn't go well. What did I ultimately do instead to print out hello, David or hello, so-and-so? Yeah? Sorry, a little louder?
AUDIENCE: %s?
DAVID MALAN: Yeah, so %s, the so-called format code that printf just knows how to deal with. And I had to add one other thing. Someone else besides %s-- yeah?
AUDIENCE: The name of the variable.
DAVID MALAN: The name of the variable that I want to plug into that placeholder %s. And in this case, it's answer. Now let me make one refinement only because now we're in week 2 and we're going to start writing more lines of code, even though Scratch called the return value of the ask puzzle piece answer, always. In C, we have full control over what our variables are called. And now it's probably good not to just generically always call my variable answer if I'm using get_string. Let's call it what it is. So this is now just a matter of style, if you will. Let me change the variable to be name just so that it's a little clearer to me, to you, to a TF or TA exactly what that variable represents instead of more generically answer.
All right, so that said, let me go down to my terminal window, and last week again, I ran make to compile this exact same program. Now, though, let me go ahead and just use clang. So clang -o-- I'll still call this version hello-- space, hello.c. So exact same command as before.
The only thing that's different is I've added a couple of more lines of code to get the user's input. Let me hit Enter, and now, darn it, our first error. So output from clang and make is not a good thing, and here, we're seeing something particularly cryptic. So something in function 'main'-- undefined reference to 'get_string', and then linker command failed with exit code 1. So there's actually a lot of jargon in there that we'll tease apart today, but my hint is that clearly my problem's in main, although that's not surprising because there's nothing else going on here. get_string is an issue, and the issue is that it's an undefined reference.
And yet, notice, I was pretty good. I added the CS50 header file and I said last week that that's enough to teach the compiler that functions exist, but the problem is that even though this does, in fact, teach Clang that get_string exists, it is not sufficient information for Clang to go find on the hard drive of the computer the 0's and 1's that actually implement get_string itself. So in other words, this include line, per last week, is a little bit of a hint. It's a teaser to Clang that you're about to see and use this function somewhere.
But if you actually want to use the 0's and 1's that CS50 wrote some time ago and bake those into your program so your program actually knows how to get input from the user, well then, I'm going to have to go ahead and run a slightly different command. So let me do this. Let me clear my terminal window just to get rid of that distraction and let me propose now that we run this command instead. Almost the same as before, clang -o, space, hello, then hello.c, but with one additional command line argument at the end, and this is a -l-- not a number 1. So -lcs50 with no space in between those two. Now the l is going to result in all of those 0's and 1's that actually were written by CS50 being linked into your code, your few lines of code or mine here. But that's the second step that the compiler requires in order to know how to actually execute, or rather compile, your code and CS50's.
And CS50 is not the only one that does this. If you use any third party library in C that doesn't come with the language, you would do -l such and such, however they've named their own library. But you don't have to do it for built-in things like we've been using thus far. All right, so let me go ahead and try this.
I'll go back to VS Code here, and let me go ahead now and run clang -o hello, then hello.c. And now instead of just hitting Enter, -lcs50 with no space between the l and the cs50, Enter. Now nothing bad happens, and now I can do ./hello. What's your name? I'll type in David, Enter, and now we see hello, David.
Now honestly, this is where we're really getting into the weeds, and now this is taking-- this is really just adding nuisance to the process of compiling and running your code. And so the reality is, even though this is indeed what is happening, this is why we used last week, and we're going to continue using this week onward, make, because it just automates that whole process for you. But it's ideal to understand what's going wrong because any of the error messages you saw for problem set 1, any of the error messages you see for the next few weeks, probably aren't coming from make, they're coming from Clang underneath the hood because make is just automating the process. But with make, you literally just write make and then the name of the program, you don't have to worry about any of those command line arguments. Questions, then, on compiling with -lcs50 or anything else?
Yeah?
AUDIENCE: What is the benefit of [INAUDIBLE]?
DAVID MALAN: Sorry, what is the benefit of--
AUDIENCE: Using Clang manually.
DAVID MALAN: What is the benefit of using Clang manually? None, really. In fact, all make is doing is saving us some keystrokes. If you prefer, though, and you just like to be more in control, you can totally run Clang manually if you remember the various command line arguments. Yeah?
AUDIENCE: So why did you have to explain [INAUDIBLE]
DAVID MALAN: Exactly. Why did I have to explain-- that is, provide a hint to CS50 with the cs50.h header file, but I didn't have to do that with stdio.h? Just because. stdio.h comes with C, just like a few other libraries come with C that we'll start seeing today. CS50, though, is not built into C everywhere, and so you do have to explicitly add that one there. Yeah?
AUDIENCE: Can you define what command line argument [INAUDIBLE]?
377 00:18:11,970 --> 00:18:15,210 DAVID MALAN: A command line argument is a word or phrase 378 00:18:15,210 --> 00:18:17,740 that you type at the command line-- 379 00:18:17,740 --> 00:18:22,200 a.k.a., your terminal-- in order to influence the behavior of a program. 380 00:18:22,200 --> 00:18:22,742 AUDIENCE: OK. 381 00:18:22,742 --> 00:18:24,430 So it's a term for whatever you're giving it. 382 00:18:24,430 --> 00:18:24,565 DAVID MALAN: Yeah. 383 00:18:24,565 --> 00:18:25,660 It changes the defaults. 384 00:18:25,660 --> 00:18:27,790 In our GUI world, Graphical User Interface, 385 00:18:27,790 --> 00:18:29,680 you and I would probably click some boxes, 386 00:18:29,680 --> 00:18:32,350 we would select some menu options to configure a program 387 00:18:32,350 --> 00:18:33,460 to behave in the same way. 388 00:18:33,460 --> 00:18:36,850 At a command line interface, you have to just say everything all at once, 389 00:18:36,850 --> 00:18:39,600 and that's why we have command line arguments. 390 00:18:39,600 --> 00:18:40,605 Yeah? 391 00:18:40,605 --> 00:18:43,243 AUDIENCE: Is make [INAUDIBLE] 392 00:18:43,243 --> 00:18:43,910 DAVID MALAN: No. 393 00:18:43,910 --> 00:18:45,470 Make is not just for CS50. 394 00:18:45,470 --> 00:18:50,480 It's used globally in any project really nowadays using C, C++, 395 00:18:50,480 --> 00:18:52,020 even other languages as well. 396 00:18:52,020 --> 00:18:54,140 In fact, most every command you see in this class, 397 00:18:54,140 --> 00:18:57,530 unless it has 5-0 at the end of it, is globally used. 398 00:18:57,530 --> 00:19:00,758 Only those with a suffix of 50 are, indeed, course-specific. 399 00:19:00,758 --> 00:19:03,050 And even those we'll gradually take training wheels off 400 00:19:03,050 --> 00:19:06,890 of so that you know exactly what those commands are doing as well. 401 00:19:06,890 --> 00:19:09,053 All right, so what is it that we've just done? 
402 00:19:09,053 --> 00:19:11,720 Everything we've just done, of course, I keep calling compiling, 403 00:19:11,720 --> 00:19:13,580 but let's just go down one rabbit hole so 404 00:19:13,580 --> 00:19:15,967 that you understand that when you compile code, 405 00:19:15,967 --> 00:19:18,050 there's actually a whole bunch of steps happening, 406 00:19:18,050 --> 00:19:21,800 and this is going to enable a lot of features, like companies can 407 00:19:21,800 --> 00:19:26,060 write code and then convert it to run it on Macs and PCs alike 408 00:19:26,060 --> 00:19:27,240 or phones or the like. 409 00:19:27,240 --> 00:19:30,320 So it's not just a matter of converting source code to machine code, 410 00:19:30,320 --> 00:19:34,610 there's actually four steps involved in what you and I, as of last week, 411 00:19:34,610 --> 00:19:35,840 know as compiling. 412 00:19:35,840 --> 00:19:39,033 And these aren't terms that you'll have to keep in mind constantly 413 00:19:39,033 --> 00:19:41,450 because again, we're going to abstract a lot of this away. 414 00:19:41,450 --> 00:19:43,492 But just so we've gone down the rabbit hole once, 415 00:19:43,492 --> 00:19:45,890 let's consider each of these four steps that 416 00:19:45,890 --> 00:19:49,850 have been happening for you for a week automatically, the first of which 417 00:19:49,850 --> 00:19:51,080 is called preprocessing. 418 00:19:51,080 --> 00:19:52,260 So what does this mean? 419 00:19:52,260 --> 00:19:54,450 Well, let's consider that same program as before. 420 00:19:54,450 --> 00:19:57,830 So notice that two of the lines of code start with a hash mark. 421 00:19:57,830 --> 00:20:02,338 That is a special symbol in C, and it's a so-called preprocessor directive. 422 00:20:02,338 --> 00:20:04,130 You don't need to memorize terms like that, 423 00:20:04,130 --> 00:20:07,005 but it just means that it's a little different from every other line. 
424 00:20:07,005 --> 00:20:08,960 And anything with a hash symbol here should 425 00:20:08,960 --> 00:20:13,315 be preprocessed-- that is, analyzed initially before anything else happens. 426 00:20:13,315 --> 00:20:17,100 So let's consider these two lines up top, what exactly is happening. 427 00:20:17,100 --> 00:20:19,220 Well, it turns out with these two lines, you 428 00:20:19,220 --> 00:20:23,390 have two header files, of course, cs50.h and stdio.h. 429 00:20:23,390 --> 00:20:27,980 Where are those files, because they've never been in VS Code for you, 430 00:20:27,980 --> 00:20:28,550 seemingly. 431 00:20:28,550 --> 00:20:31,940 If you type ls, or if you open up the File Explorer in the GUI, 432 00:20:31,940 --> 00:20:35,900 you have never seen, probably, cs50.h or stdio.h. 433 00:20:35,900 --> 00:20:39,620 They just work, but that's because there's a folder somewhere 434 00:20:39,620 --> 00:20:43,340 on the hard drive that you're using on your Mac or PC 435 00:20:43,340 --> 00:20:45,690 or somewhere in the cloud, as in our case. 436 00:20:45,690 --> 00:20:50,210 And this folder is traditionally called /usr/include. 437 00:20:50,210 --> 00:20:51,857 And user is deliberately misspelled. 438 00:20:51,857 --> 00:20:54,440 It's just slightly more succinct, although it's a little weird 439 00:20:54,440 --> 00:20:55,760 why we drop that one letter. 440 00:20:55,760 --> 00:21:01,760 But /usr/include is just a folder on the server that contains cs50.h, stdio.h, 441 00:21:01,760 --> 00:21:03,990 and a bunch of other things as well. 442 00:21:03,990 --> 00:21:08,030 So in fact, if you go to your terminal window in VS Code, 443 00:21:08,030 --> 00:21:13,310 when you're using Codespaces in the cloud, and type ls /usr/include, 444 00:21:13,310 --> 00:21:15,470 you can see all of the files in that folder. 445 00:21:15,470 --> 00:21:17,580 But we've preinstalled all of that stuff for you. 
446 00:21:17,580 --> 00:21:20,390 So let's consider what's actually in those files here. 447 00:21:20,390 --> 00:21:25,370 If I highlight these two lines up top that start with hash include, well, 448 00:21:25,370 --> 00:21:30,530 I kind of hinted last week that what's in that first file is a hint as to what 449 00:21:30,530 --> 00:21:32,660 functions CS50 wrote for you. 450 00:21:32,660 --> 00:21:35,540 So you can kind of think of these include lines 451 00:21:35,540 --> 00:21:38,300 as being temporary placeholders for what's 452 00:21:38,300 --> 00:21:41,000 going to become like a global find and replace. 453 00:21:41,000 --> 00:21:44,270 That is, the first thing clang is going to do is preprocess this file. 454 00:21:44,270 --> 00:21:47,300 It's going to look for any line that starts with hash include. 455 00:21:47,300 --> 00:21:50,960 And if it sees that, it's going to essentially go into that file, 456 00:21:50,960 --> 00:21:55,190 like cs50.h, and then just copy and paste the contents of that file 457 00:21:55,190 --> 00:21:56,443 magically there for you. 458 00:21:56,443 --> 00:21:58,110 You don't see it visually on the screen. 459 00:21:58,110 --> 00:22:00,060 But it's happening behind the scenes. 460 00:22:00,060 --> 00:22:03,230 And so really, what's happening with this first line 461 00:22:03,230 --> 00:22:09,380 is that somewhere in cs50.h is the declaration of get_string 462 00:22:09,380 --> 00:22:11,690 like we talked about last week, and it probably 463 00:22:11,690 --> 00:22:13,215 looks a little something like this. 464 00:22:13,215 --> 00:22:15,590 And we didn't spend much time on this yet this past week, 465 00:22:15,590 --> 00:22:17,030 but we will more in time. 466 00:22:17,030 --> 00:22:21,470 Notice that this is how a function is declared. 467 00:22:21,470 --> 00:22:23,677 That is, it is decreed to exist. 468 00:22:23,677 --> 00:22:25,760 The name of the function, of course, is get_string. 
469 00:22:25,760 --> 00:22:28,310 Inside of the parentheses are its arguments. 470 00:22:28,310 --> 00:22:31,580 In this case, there's one argument to get_string, I claim today, 471 00:22:31,580 --> 00:22:33,080 but you've known this implicitly. 472 00:22:33,080 --> 00:22:34,160 And it's a prompt. 473 00:22:34,160 --> 00:22:36,860 It's the prompt that the human sees when you use get_string. 474 00:22:36,860 --> 00:22:37,790 What is that prompt? 475 00:22:37,790 --> 00:22:41,060 Well, it's a string of text, like quote unquote, "what's your name?" 476 00:22:41,060 --> 00:22:43,080 or anything else that I asked last week. 477 00:22:43,080 --> 00:22:46,610 Meanwhile, get_string, as we know from last week, has a return value. 478 00:22:46,610 --> 00:22:48,140 It returns something to you. 479 00:22:48,140 --> 00:22:49,610 And that, too, is a string. 480 00:22:49,610 --> 00:22:52,120 So again, this is also called a function's prototype. 481 00:22:52,120 --> 00:22:53,870 It's the thing toward the end of last week 482 00:22:53,870 --> 00:22:57,560 that I just copied and pasted from the bottom of my file to the top, 483 00:22:57,560 --> 00:23:02,030 just so that it was like this teaser for clang as to what would exist later. 484 00:23:02,030 --> 00:23:07,670 So you can think, then, of these include lines as just kind of combining all 485 00:23:07,670 --> 00:23:11,360 of those function declarations in some separate file called cs50.h, 486 00:23:11,360 --> 00:23:14,780 so that you yourself don't have to type them every time you use the library-- 487 00:23:14,780 --> 00:23:18,470 or worse, so that you, yourself, don't have to copy and paste those lines. 488 00:23:18,470 --> 00:23:22,520 This is what clang is doing for you in its first step of preprocessing. 489 00:23:22,520 --> 00:23:27,470 Second, and last in this example, what happens when clang preprocesses 490 00:23:27,470 --> 00:23:29,175 this second include line? 
491 00:23:29,175 --> 00:23:31,550 Well, the only other function we care about in this story 492 00:23:31,550 --> 00:23:33,650 is printf, of course, which comes with C. 493 00:23:33,650 --> 00:23:39,440 So essentially, you can think of printf's prototype or declaration 494 00:23:39,440 --> 00:23:40,820 as just being this. 495 00:23:40,820 --> 00:23:42,870 Printf is the name of the function. 496 00:23:42,870 --> 00:23:47,370 It takes a string that you want to format like, Hello comma world, 497 00:23:47,370 --> 00:23:49,110 or Hello comma %s. 498 00:23:49,110 --> 00:23:52,120 And then with dot, dot, dot, this actually has technical meaning. 499 00:23:52,120 --> 00:23:55,770 It means, of course, that you can plug in 0 variables, 1 variable, 2 500 00:23:55,770 --> 00:23:56,340 or 10. 501 00:23:56,340 --> 00:23:58,530 So dot, dot, dot means some number of variables. 502 00:23:58,530 --> 00:24:00,072 Now we haven't talked about this yet. 503 00:24:00,072 --> 00:24:01,410 And we won't really, in general. 504 00:24:01,410 --> 00:24:05,490 printf actually returns a value, a number, that is an integer. 505 00:24:05,490 --> 00:24:07,420 But more on that perhaps another time. 506 00:24:07,420 --> 00:24:10,920 It's generally not something the programmer tends to look at. 507 00:24:10,920 --> 00:24:14,250 But that's all we mean by preprocessing, so that at the end of this process, 508 00:24:14,250 --> 00:24:18,030 even though there's more lines of code in cs50.h and stdio.h, 509 00:24:18,030 --> 00:24:21,330 what's really just happening is that clang, in preprocessing 510 00:24:21,330 --> 00:24:25,380 the file, copies and pastes the contents of those files into your code 511 00:24:25,380 --> 00:24:29,160 so that now your code knows about everything-- get_string, printf, 512 00:24:29,160 --> 00:24:31,060 and anything else. 513 00:24:31,060 --> 00:24:35,230 Any questions, then, on that first step, preprocessing? 514 00:24:35,230 --> 00:24:35,920 Yes? 
515 00:24:35,920 --> 00:24:49,195 AUDIENCE: [INAUDIBLE] 516 00:24:49,195 --> 00:24:50,320 DAVID MALAN: Good question. 517 00:24:50,320 --> 00:24:52,720 When you include a file, does it only include what 518 00:24:52,720 --> 00:24:54,880 you need or does it include everything? 519 00:24:54,880 --> 00:24:56,420 Think of it as including everything. 520 00:24:56,420 --> 00:24:59,020 So if it's a big file, that's a lot of code at the very top. 521 00:24:59,020 --> 00:25:01,880 And that's why, if you think back to all of the zeros and ones 522 00:25:01,880 --> 00:25:03,880 I showed a little bit ago, as well as last week, 523 00:25:03,880 --> 00:25:06,130 there's a lot of zeros and ones that end up 524 00:25:06,130 --> 00:25:08,892 on the screen as a result of just writing, Hello, world. 525 00:25:08,892 --> 00:25:10,600 A lot of those zeros and ones are perhaps 526 00:25:10,600 --> 00:25:13,390 coming from code that you didn't actually, necessarily need. 527 00:25:13,390 --> 00:25:15,340 But some of it is perhaps there, but there 528 00:25:15,340 --> 00:25:17,740 are ways to optimize that as well. 529 00:25:17,740 --> 00:25:22,395 All right, so step two of compiling is, confusingly, called compiling. 530 00:25:22,395 --> 00:25:24,520 It's just, this is the term that most everyone uses 531 00:25:24,520 --> 00:25:27,940 to describe the whole process, instead of just this one step. 532 00:25:27,940 --> 00:25:32,140 But once a program has been preprocessed behind the scenes 533 00:25:32,140 --> 00:25:35,865 by the compiler for you, it looks now a little something like this. 534 00:25:35,865 --> 00:25:38,740 And I've put dot, dot, dot just to imply that, yes, to your question, 535 00:25:38,740 --> 00:25:39,820 there's more stuff above it. 536 00:25:39,820 --> 00:25:40,987 There's more stuff below it. 537 00:25:40,987 --> 00:25:43,070 It's just not interesting right now for us. 538 00:25:43,070 --> 00:25:44,860 So now we have just C code. 
539 00:25:44,860 --> 00:25:46,960 There's no more preprocessor directives. 540 00:25:46,960 --> 00:25:49,840 At this point, all of the hash symbols and those lines of code 541 00:25:49,840 --> 00:25:52,670 have been preprocessed and converted to something else. 542 00:25:52,670 --> 00:25:56,380 And so now-- and this is where things get a little spooky looking. 543 00:25:56,380 --> 00:26:00,370 Here now is what happens when clang, or any compiler, 544 00:26:00,370 --> 00:26:03,310 literally compiles code like this. 545 00:26:03,310 --> 00:26:08,720 It converts it from this in C to this in assembly code. 546 00:26:08,720 --> 00:26:10,720 So this is among the scarier languages. 547 00:26:10,720 --> 00:26:12,580 I, myself, don't really have fond memories. 548 00:26:12,580 --> 00:26:14,805 This is not a language that many people program in. 549 00:26:14,805 --> 00:26:16,930 If you take a subsequent class in computer science, 550 00:26:16,930 --> 00:26:19,600 in systems, a higher level class, you might actually 551 00:26:19,600 --> 00:26:21,430 learn this or some variant thereof. 552 00:26:21,430 --> 00:26:23,232 But there's at least a few people out there 553 00:26:23,232 --> 00:26:24,940 that need to know this stuff because this 554 00:26:24,940 --> 00:26:29,320 is closer to what the computers themselves, nowadays, understand. 555 00:26:29,320 --> 00:26:34,600 The Intel CPUs or the AMD CPUs, the brains of today's computers and phones 556 00:26:34,600 --> 00:26:37,960 understand stuff that looks more like this and less like C. 557 00:26:37,960 --> 00:26:42,430 Now it's completely esoteric, but let me just highlight a few phrases. 558 00:26:42,430 --> 00:26:44,630 There's some stuff that's a little familiar. 559 00:26:44,630 --> 00:26:47,620 There is mention of main at the top there in yellow. 560 00:26:47,620 --> 00:26:49,750 There is mention of get_string toward the bottom. 561 00:26:49,750 --> 00:26:52,070 There is mention of printf down below. 
562 00:26:52,070 --> 00:26:55,600 So this is just another programming language called assembly language, 563 00:26:55,600 --> 00:26:57,010 that decades ago, humans-- 564 00:26:57,010 --> 00:26:58,450 myself included in school-- 565 00:26:58,450 --> 00:27:00,130 did write code in. 566 00:27:00,130 --> 00:27:02,630 And absolutely, some people still write this code, 567 00:27:02,630 --> 00:27:06,070 especially since you can write very, very efficient code. 568 00:27:06,070 --> 00:27:08,590 But it's a lot more arcane. 569 00:27:08,590 --> 00:27:11,380 It's a lot less user friendly. 570 00:27:11,380 --> 00:27:14,650 So you'll see in yellow now, these are the so-called instructions 571 00:27:14,650 --> 00:27:18,460 that a computer's brain or CPU understands, pushing values 572 00:27:18,460 --> 00:27:23,630 around, moving them, subtracting values, calling functions, and move, move, 573 00:27:23,630 --> 00:27:24,130 move. 574 00:27:24,130 --> 00:27:27,400 So really, the low-level operations that computers understand 575 00:27:27,400 --> 00:27:31,030 tend to be arithmetic operations-- subtraction, addition, 576 00:27:31,030 --> 00:27:34,120 and the like-- moving things in and out of memory. 577 00:27:34,120 --> 00:27:37,510 It's just a lot more tedious for folks like us to write code like this. 578 00:27:37,510 --> 00:27:40,450 This is why you and I tend to write stuff like this. 579 00:27:40,450 --> 00:27:44,080 And ideally, still, people like you and I tend to drag and drop puzzle pieces 580 00:27:44,080 --> 00:27:46,520 that sort of abstract all of that away further. 581 00:27:46,520 --> 00:27:49,420 But for now, this is, again, called assembly language. 582 00:27:49,420 --> 00:27:54,310 It is what happens when the compiler literally compiles your code. 583 00:27:54,310 --> 00:27:57,010 But of course, this, still not zeros and ones. 584 00:27:57,010 --> 00:27:58,580 So we got two steps to go. 
585 00:27:58,580 --> 00:28:02,270 So when a compiler proceeds to step three, 586 00:28:02,270 --> 00:28:05,530 this is where things get converted to machine code. 587 00:28:05,530 --> 00:28:08,500 And when a compiler assembles your code for you, 588 00:28:08,500 --> 00:28:14,260 it converts what we just saw on the screen here to actual zeros and ones-- 589 00:28:14,260 --> 00:28:18,550 the so-called machine code that your phone or your computer understands. 590 00:28:18,550 --> 00:28:22,120 But it's worth noting that these are not necessarily all 591 00:28:22,120 --> 00:28:24,280 of the zeros and ones of your program. 592 00:28:24,280 --> 00:28:29,980 Yes, they are the zeros and ones that correspond to your Hello program 593 00:28:29,980 --> 00:28:33,250 or printf and get_string and the like, but notice 594 00:28:33,250 --> 00:28:36,940 that here, we need one final step. 595 00:28:36,940 --> 00:28:40,100 In those zeros and ones are only your lines of code. 596 00:28:40,100 --> 00:28:43,540 But what about CS50's lines of code that we wrote to implement get_string? 597 00:28:43,540 --> 00:28:46,990 What about the lines of code that humans wrote decades ago to implement printf? 598 00:28:46,990 --> 00:28:50,020 Those are somewhere on this hard drive, like on my Mac, my PC, 599 00:28:50,020 --> 00:28:54,460 or somewhere in the cloud, but we need to combine all of those zeros and ones 600 00:28:54,460 --> 00:29:01,390 together and link my code with CS50's code with standard I/O's code, 601 00:29:01,390 --> 00:29:02,420 all together. 
602 00:29:02,420 --> 00:29:05,110 And so what happens in the last step, ultimately, 603 00:29:05,110 --> 00:29:07,960 is that if we have my code here in yellow, 604 00:29:07,960 --> 00:29:11,440 and then the code that CS50 wrote, and the code that the authors of C 605 00:29:11,440 --> 00:29:15,940 itself wrote, what really is happening is that somewhere, we have not only 606 00:29:15,940 --> 00:29:19,960 hello.c, which, obviously, I wrote, and wrote with us live here, 607 00:29:19,960 --> 00:29:24,550 there's also, let's assume, somewhere on the computer, a cs50.c file 608 00:29:24,550 --> 00:29:28,210 that, coincidentally, I and CS50 staff wrote years ago. 609 00:29:28,210 --> 00:29:30,790 And also, somewhere on the computer, there's another file. 610 00:29:30,790 --> 00:29:34,120 Let me oversimplify by just calling it stdio.c. 611 00:29:34,120 --> 00:29:36,850 In practice, it's probably specifically called printf.c. 612 00:29:36,850 --> 00:29:39,460 But they're somewhere, these two other files. 613 00:29:39,460 --> 00:29:44,110 And so this last step called linking takes my zeros and ones 614 00:29:44,110 --> 00:29:48,100 from the code I just wrote, namely this code on the screen here. 615 00:29:48,100 --> 00:29:50,810 It then grabs the zeros and ones that CS50 wrote. 616 00:29:50,810 --> 00:29:53,480 And it grabs the zeros and ones that the authors of C wrote, 617 00:29:53,480 --> 00:29:56,240 in order to implement the standard I/O library. 618 00:29:56,240 --> 00:30:00,750 And lastly, voila, links them all together. 619 00:30:00,750 --> 00:30:03,980 And this is the same blob of zeros and ones that we saw earlier. 
620 00:30:03,980 --> 00:30:08,090 It's just now the result of preprocessing your code, 621 00:30:08,090 --> 00:30:12,620 compiling your code, assembling your code, linking your code, and my God, 622 00:30:12,620 --> 00:30:15,830 at this point, like if there were any fun in programming for you yet, 623 00:30:15,830 --> 00:30:19,620 we've just taken it all away. We just call this whole process compiling. 624 00:30:19,620 --> 00:30:20,120 Why? 625 00:30:20,120 --> 00:30:22,490 Because now that we know those steps exist-- 626 00:30:22,490 --> 00:30:25,370 and smart people solve that problem for us-- 627 00:30:25,370 --> 00:30:27,890 you and I can kind of operate at this level of abstraction 628 00:30:27,890 --> 00:30:32,420 and just assume that compiling converts source code to machine code. 629 00:30:32,420 --> 00:30:36,350 Questions, though, on any of these intermediate steps? 630 00:30:36,350 --> 00:30:37,360 Yeah? 631 00:30:37,360 --> 00:30:41,958 AUDIENCE: For linking, are different parts, like [INAUDIBLE]? 632 00:30:41,958 --> 00:30:50,072 633 00:30:50,072 --> 00:30:51,280 DAVID MALAN: A good question. 634 00:30:51,280 --> 00:30:53,238 So where are all of these zeros and ones stored? 635 00:30:53,238 --> 00:30:56,400 Because you and I, we've been using a browser, right? code.cs50.io, 636 00:30:56,400 --> 00:30:58,330 of course, is this web-based user interface. 637 00:30:58,330 --> 00:31:00,497 But again, recall from last week, even though you're 638 00:31:00,497 --> 00:31:05,640 using a web browser to access VS Code, that web-based version of VS Code 639 00:31:05,640 --> 00:31:09,000 is connected to an actual server somewhere in the cloud. 640 00:31:09,000 --> 00:31:13,170 And on that server, you have your own account and your own files, and really, 641 00:31:13,170 --> 00:31:15,360 your own hard drive, virtually in the cloud. 
642 00:31:15,360 --> 00:31:18,872 Think of it a little like Dropbox or Box or Google Drive or OneDrive 643 00:31:18,872 --> 00:31:19,830 or something like that. 644 00:31:19,830 --> 00:31:23,310 So you have a hard drive somewhere out there that we've provisioned for you. 645 00:31:23,310 --> 00:31:27,930 And it's on that hard drive that you have your code that you just wrote, 646 00:31:27,930 --> 00:31:32,700 or I just wrote, cs50.c, stdio.c, and all of the other code 647 00:31:32,700 --> 00:31:36,967 that implements the math functions and everything else that C supports. 648 00:31:36,967 --> 00:31:37,550 Good question. 649 00:31:37,550 --> 00:31:38,964 Yeah? 650 00:31:38,964 --> 00:31:45,425 AUDIENCE: So, say in the CS50 library, the line [INAUDIBLE] 651 00:31:45,425 --> 00:31:49,401 do we do the same exact thing [INAUDIBLE] 652 00:31:49,401 --> 00:31:51,935 copy paste them all the way over? 653 00:31:51,935 --> 00:31:53,060 DAVID MALAN: Good question. 654 00:31:53,060 --> 00:31:57,110 That hash include cs50.h line at the top of my code. 655 00:31:57,110 --> 00:32:01,310 If I just replace that with the contents of cs50.c, would that work? 656 00:32:01,310 --> 00:32:03,590 Short answer, yes, that would work. 657 00:32:03,590 --> 00:32:05,400 You could copy all of the code there. 658 00:32:05,400 --> 00:32:08,577 However, there's some order of operations that might come into play. 659 00:32:08,577 --> 00:32:10,910 And so it's probably not quite as simple as copy, paste. 660 00:32:10,910 --> 00:32:13,190 But conceptually, yes, that's what's happening. 661 00:32:13,190 --> 00:32:19,370 Now with that said, in cs50.h, are only the prototypes of the functions, 662 00:32:19,370 --> 00:32:23,628 the hints as to how the functions look, what their return type is, 663 00:32:23,628 --> 00:32:25,670 what their name is, and what their arguments are. 664 00:32:25,670 --> 00:32:29,867 It's in the dot c file that actual code tends to be written. 
665 00:32:29,867 --> 00:32:32,450 And this is a little confusing now because you and I have only 666 00:32:32,450 --> 00:32:33,920 written code in dot c files. 667 00:32:33,920 --> 00:32:35,690 But in the next few weeks, you'll actually 668 00:32:35,690 --> 00:32:37,940 start writing some of your own dot h files 669 00:32:37,940 --> 00:32:40,460 as well, just like CS50, just like standard I/O. 670 00:32:40,460 --> 00:32:44,150 But in essence, that line of code just makes it easier to use and reuse 671 00:32:44,150 --> 00:32:46,020 code that's already been written. 672 00:32:46,020 --> 00:32:47,750 And that's the whole point of a library. 673 00:32:47,750 --> 00:32:50,327 AUDIENCE: Does linking them [INAUDIBLE]? 674 00:32:50,327 --> 00:32:51,910 DAVID MALAN: Say that a little louder. 675 00:32:51,910 --> 00:32:54,472 AUDIENCE: Does linking happen when you use the compiler? 676 00:32:54,472 --> 00:32:55,180 DAVID MALAN: Yes. 677 00:32:55,180 --> 00:32:56,980 Does linking happen when you compile your code? 678 00:32:56,980 --> 00:32:57,480 Yes. 679 00:32:57,480 --> 00:33:02,320 When you run make, as we have been doing the past week now, 680 00:33:02,320 --> 00:33:04,570 all four of these steps are happening. 681 00:33:04,570 --> 00:33:07,780 Preprocessing converts the hash include lines to something else. 682 00:33:07,780 --> 00:33:10,600 Compiling technically converts it to assembly 683 00:33:10,600 --> 00:33:14,290 code, which the Mac, the PC, the server more closely understands. 684 00:33:14,290 --> 00:33:18,850 Assembling converts that language to binary machine code that this computer 685 00:33:18,850 --> 00:33:20,080 actually understands. 686 00:33:20,080 --> 00:33:22,540 And then linking combines everything together. 
687 00:33:22,540 --> 00:33:27,550 And in fact, if you think back a few minutes ago to when I did this -lcs50, 688 00:33:27,550 --> 00:33:30,070 the reason I had to add that, and the reason 689 00:33:30,070 --> 00:33:32,860 my code did not compile at first, was because I 690 00:33:32,860 --> 00:33:38,650 forgot to tell clang to link in CS50's zeros and ones per that last step. 691 00:33:38,650 --> 00:33:42,147 I don't need to do -lstdio because it comes with C, 692 00:33:42,147 --> 00:33:44,480 so that would just be tedious for everyone in the world. 693 00:33:44,480 --> 00:33:47,140 But CS50 does not come with C, so we link that in. 694 00:33:47,140 --> 00:33:49,780 And to be clear, too, we won't always use CS50's library. 695 00:33:49,780 --> 00:33:53,072 That'll be yet another pair of training wheels we take off in the coming weeks. 696 00:33:53,072 --> 00:33:55,000 But for now, it makes a few things simpler. 697 00:33:55,000 --> 00:33:57,284 Yeah? 698 00:33:57,284 --> 00:33:59,750 AUDIENCE: What is the [INAUDIBLE]? 699 00:33:59,750 --> 00:34:08,878 700 00:34:08,878 --> 00:34:10,170 DAVID MALAN: Short answer, yes. 701 00:34:10,170 --> 00:34:12,870 So what do the zeros and ones, the machine code, translate to? 702 00:34:12,870 --> 00:34:15,690 Yes, there is a one-to-one relationship between the machine 703 00:34:15,690 --> 00:34:17,340 code and the assembly code. 704 00:34:17,340 --> 00:34:21,510 Assembly code, it's not really English, but at least it's symbols I recognize. 705 00:34:21,510 --> 00:34:22,800 It's not zeros and ones. 706 00:34:22,800 --> 00:34:24,810 Machine code, of course, is just zeros and ones. 707 00:34:24,810 --> 00:34:27,960 So back in the day, before C existed, people 708 00:34:27,960 --> 00:34:30,630 were programming only in assembly code. 709 00:34:30,630 --> 00:34:34,469 Before assembly code existed, people were coding in zeros and ones. 
710 00:34:34,469 --> 00:34:36,719 And you can imagine just how painful that was, 711 00:34:36,719 --> 00:34:39,027 and so each of these languages makes life, for us, 712 00:34:39,027 --> 00:34:40,110 sort of easier and easier. 713 00:34:40,110 --> 00:34:42,330 In a few weeks, we'll transition to Python, which 714 00:34:42,330 --> 00:34:45,300 will, in turn, make C even simpler-- 715 00:34:45,300 --> 00:34:48,090 or coding, in general, simpler to do too. 716 00:34:48,090 --> 00:34:53,346 All right, so with that said, what now can we-- 717 00:34:53,346 --> 00:34:55,060 what could go wrong with this? 718 00:34:55,060 --> 00:34:58,140 Well, it turns out that besides compiling, technically speaking, 719 00:34:58,140 --> 00:34:59,233 there's decompiling. 720 00:34:59,233 --> 00:35:01,150 And we've not done this, and we won't do this. 721 00:35:01,150 --> 00:35:04,080 But it's worth considering for just a moment. 722 00:35:04,080 --> 00:35:07,560 If you were to not compile your code, but decompile it-- 723 00:35:07,560 --> 00:35:11,340 as the word suggests, this just means reversing the process, converting it, 724 00:35:11,340 --> 00:35:14,580 ideally, from machine code-- zeros and ones-- 725 00:35:14,580 --> 00:35:19,870 maybe back to C. Now this would be cool, perhaps, if all you have is a program, 726 00:35:19,870 --> 00:35:22,080 you can convert it and see the actual source code. 727 00:35:22,080 --> 00:35:25,320 What might a downside be, if anyone on the internet 728 00:35:25,320 --> 00:35:28,650 is able to decompile code on their machine? 729 00:35:28,650 --> 00:35:29,160 Yeah? 730 00:35:29,160 --> 00:35:30,270 AUDIENCE: [INAUDIBLE] 731 00:35:30,270 --> 00:35:34,130 DAVID MALAN: OK, so it's easier to find bugs in the code that-- 732 00:35:34,130 --> 00:35:35,430 oh, to exploit. 
733 00:35:35,430 --> 00:35:38,417 So it might be easier to hack into the software 734 00:35:38,417 --> 00:35:41,000 by finding mistakes you and I made because, literally, they're 735 00:35:41,000 --> 00:35:43,370 staring at you in code, whereas the zeros and ones make 736 00:35:43,370 --> 00:35:45,080 it way less obvious. 737 00:35:45,080 --> 00:35:48,140 Other downsides of what I called decompiling? 738 00:35:48,140 --> 00:35:49,970 Yeah? 739 00:35:49,970 --> 00:35:53,690 AUDIENCE: If stuff is copyrighted or you don't even know how to get it-- 740 00:35:53,690 --> 00:35:54,440 DAVID MALAN: Yeah. 741 00:35:54,440 --> 00:35:55,948 AUDIENCE: [INAUDIBLE] 742 00:35:55,948 --> 00:35:57,740 DAVID MALAN: Yeah, if your code, your work, 743 00:35:57,740 --> 00:36:00,950 is your intellectual property, copyrighted or otherwise, that's 744 00:36:00,950 --> 00:36:03,660 kind of obnoxious that someone can just run a command, and boom, 745 00:36:03,660 --> 00:36:05,577 they can see the original code that you wrote. 746 00:36:05,577 --> 00:36:08,490 Now, it turns out it's not quite as simple as that. 747 00:36:08,490 --> 00:36:11,720 And so even though, yes, you could take a program like Hello, 748 00:36:11,720 --> 00:36:15,080 or even Microsoft Word, and convert it from zeros and ones 749 00:36:15,080 --> 00:36:19,400 back to some form of source code-- be it in C or Java 750 00:36:19,400 --> 00:36:22,820 or Python or something else, whatever it was originally written in-- odds 751 00:36:22,820 --> 00:36:25,800 are it's going to be an utter mess to look at. 752 00:36:25,800 --> 00:36:26,300 Why? 753 00:36:26,300 --> 00:36:30,390 Because things like variable names are not retained in the zeros and ones, 754 00:36:30,390 --> 00:36:30,890 typically. 755 00:36:30,890 --> 00:36:33,980 Function names might not be retained in the zeros and ones. 
756 00:36:33,980 --> 00:36:36,350 The code is, the logic is, but the computer 757 00:36:36,350 --> 00:36:38,510 doesn't care what pretty variables you chose 758 00:36:38,510 --> 00:36:41,060 and how nicely named your functions were, it just 759 00:36:41,060 --> 00:36:42,890 needs to know them as zeros and ones. 760 00:36:42,890 --> 00:36:46,370 Moreover, if you think about last week, we introduced things like loops in C. 761 00:36:46,370 --> 00:36:49,745 And besides for loops, there's what other kind of loop, for instance? 762 00:36:49,745 --> 00:36:50,620 AUDIENCE: [INAUDIBLE] 763 00:36:50,620 --> 00:36:53,412 DAVID MALAN: So, a while loop-- and even though they look different 764 00:36:53,412 --> 00:36:55,920 and you have to write different code, they achieve exactly 765 00:36:55,920 --> 00:36:59,910 the same functionality, which is to say, when you compile a for loop 766 00:36:59,910 --> 00:37:04,140 or you compile a while loop, if they logically do the same thing, 767 00:37:04,140 --> 00:37:07,420 they might end up looking identical as zeros and ones. 768 00:37:07,420 --> 00:37:09,780 And so, therefore, it's not necessarily predictable 769 00:37:09,780 --> 00:37:11,820 that you'll get back the original code, why? 770 00:37:11,820 --> 00:37:15,110 Because the zeros and ones might not know, so to speak, 771 00:37:15,110 --> 00:37:16,860 whether it was a for loop or a while loop, 772 00:37:16,860 --> 00:37:19,350 so maybe compiling will show you one or the other. 773 00:37:19,350 --> 00:37:21,870 And honestly, decompiling, while possible-- and it's 774 00:37:21,870 --> 00:37:24,570 one way of reverse engineering someone's product. 
775 00:37:24,570 --> 00:37:28,662 Odds are, if you're good enough to start reading code that's been decompiled 776 00:37:28,662 --> 00:37:30,870 and reading through the messiness of it, odds are you 777 00:37:30,870 --> 00:37:34,020 have the talent probably to just write that same program from scratch 778 00:37:34,020 --> 00:37:34,650 yourself. 779 00:37:34,650 --> 00:37:36,870 Now, that's an overstatement, perhaps, but it's not 780 00:37:36,870 --> 00:37:40,410 quite as easy or threatening as you might first think. 781 00:37:40,410 --> 00:37:43,290 So in general, once code is compiled, it's 782 00:37:43,290 --> 00:37:48,290 pretty challenging, time consuming, costly to reverse engineer it, much 783 00:37:48,290 --> 00:37:50,040 like it would be in the real world, right? 784 00:37:50,040 --> 00:37:52,860 Like all of us have some kind of phone, probably, nowadays in our pocket. 785 00:37:52,860 --> 00:37:55,193 There's nothing stopping you from opening it up somehow, 786 00:37:55,193 --> 00:37:57,060 poking around, recreating what's there. 787 00:37:57,060 --> 00:37:59,130 That's a huge amount of effort, most likely. 788 00:37:59,130 --> 00:38:01,880 And at that point, maybe you should just invent the phone, instead 789 00:38:01,880 --> 00:38:03,310 of trying to reverse engineer it. 790 00:38:03,310 --> 00:38:06,330 So same kind of idea in the physical world. 791 00:38:06,330 --> 00:38:13,050 Any questions, then, on compiling, or even decompiling in these forms? 792 00:38:13,050 --> 00:38:17,160 All right, so odds are, at this point, not only I, but you have made mistakes. 793 00:38:17,160 --> 00:38:19,050 And you've written buggy code-- 794 00:38:19,050 --> 00:38:22,350 a bug in a code is just a mistake, a logical error 795 00:38:22,350 --> 00:38:26,490 or otherwise, where the code just does not behave correctly as you intend. 
796 00:38:26,490 --> 00:38:29,880 And up until now, odds are, your debugging techniques 797 00:38:29,880 --> 00:38:32,910 have been to maybe look back at what I did in class, maybe 798 00:38:32,910 --> 00:38:35,320 ask a question online or in-person. 799 00:38:35,320 --> 00:38:38,190 But ultimately, it'd be nice if you had some tools of your own 800 00:38:38,190 --> 00:38:39,570 with which to debug code. 801 00:38:39,570 --> 00:38:41,587 And this, honestly, is a lifelong skill. 802 00:38:41,587 --> 00:38:43,170 You're not going to emerge from CS50-- 803 00:38:43,170 --> 00:38:44,490 and even 20 years from now, you're not going 804 00:38:44,490 --> 00:38:47,910 to be writing-- if you're writing code at all-- correct code all of the time. 805 00:38:47,910 --> 00:38:50,820 Like, all of us on the staff continue to write bugs. 806 00:38:50,820 --> 00:38:54,120 Hopefully, they get a little more sophisticated, and not sort of like, 807 00:38:54,120 --> 00:38:55,540 oops, I missed a semicolon. 808 00:38:55,540 --> 00:38:57,660 But even those kinds of mistakes, we make too. 809 00:38:57,660 --> 00:39:00,150 But there are tools out there and techniques 810 00:39:00,150 --> 00:39:03,550 that can make your life easier when it comes to solving those problems. 811 00:39:03,550 --> 00:39:06,360 Now, the term bug has actually been around for decades. 812 00:39:06,360 --> 00:39:11,790 But a fun story to tell is that the first documented actual bug was 813 00:39:11,790 --> 00:39:13,650 actually somehow connected to Harvard. 814 00:39:13,650 --> 00:39:18,870 In fact, this is the logbook relating to the Harvard Mark II computer 815 00:39:18,870 --> 00:39:22,890 from 1947, whereby if you read the notes here-- and I'll zoom in-- this 816 00:39:22,890 --> 00:39:27,630 was an actual moth discovered inside of this big mainframe computer that 817 00:39:27,630 --> 00:39:29,160 was causing some kind of problems.
818 00:39:29,160 --> 00:39:30,450 And the engineers there at the time actually 819 00:39:30,450 --> 00:39:33,610 thought it was funny that, wow, physical bug actually explains the issue. 820 00:39:33,610 --> 00:39:36,450 And it's been forever taped to this sheet of paper, which I believe 821 00:39:36,450 --> 00:39:39,090 now is on display in the Smithsonian. 822 00:39:39,090 --> 00:39:43,260 With that said, this is just representative, too, of a logical bug. 823 00:39:43,260 --> 00:39:45,390 And that story is actually-- 824 00:39:45,390 --> 00:39:49,170 that story was often retold by a famous mathematician, then computer scientist 825 00:39:49,170 --> 00:39:53,640 really, Dr. Grace Hopper, who actually worked not only on the Harvard Mark II 826 00:39:53,640 --> 00:39:57,210 computer, but its predecessor, the Harvard Mark I. 827 00:39:57,210 --> 00:40:01,020 And if you ever spent time, yet, in the engineering building across the river 828 00:40:01,020 --> 00:40:04,103 here, you can actually see much of this computer, which 829 00:40:04,103 --> 00:40:07,020 is along the wall when you first walk into the Science and Engineering 830 00:40:07,020 --> 00:40:07,530 Complex. 831 00:40:07,530 --> 00:40:09,530 And indeed, as you've probably heard growing up, 832 00:40:09,530 --> 00:40:11,070 this is a mainframe computer. 833 00:40:11,070 --> 00:40:15,210 This is what Macs and PCs, so to speak, looked like back in the day, 834 00:40:15,210 --> 00:40:18,240 with very physical things that essentially implemented the zeros 835 00:40:18,240 --> 00:40:21,900 and ones that you and I take for granted now being miniaturized in our laptops 836 00:40:21,900 --> 00:40:22,410 and phones. 837 00:40:22,410 --> 00:40:23,910 So there's a piece of history there. 838 00:40:23,910 --> 00:40:27,390 If you visit that side of campus sometime, do take a look. 
839 00:40:27,390 --> 00:40:30,480 But let's consider, then, how we solve not, of course, physical bugs, 840 00:40:30,480 --> 00:40:31,350 but logical bugs. 841 00:40:31,350 --> 00:40:33,600 And let's consider something like this from last week, 842 00:40:33,600 --> 00:40:38,820 whereby, we were trying very simply to print like this column of three bricks 843 00:40:38,820 --> 00:40:40,320 using hashtags of sorts. 844 00:40:40,320 --> 00:40:44,400 So let me go over here in just a moment to VS Code. 845 00:40:44,400 --> 00:40:47,080 And I'm going to go ahead and open a program I wrote in advance. 846 00:40:47,080 --> 00:40:49,455 And I'm bringing it to class because there's a bug in it, 847 00:40:49,455 --> 00:40:51,510 and I'd like to figure out how to solve this bug. 848 00:40:51,510 --> 00:40:56,160 So let me open up a buggy0.c, which is version 0 of my code. 849 00:40:56,160 --> 00:40:58,200 And let's just take a quick peek at what's here. 850 00:40:58,200 --> 00:40:58,950 It's pretty short. 851 00:40:58,950 --> 00:41:03,750 It includes only stdio.h, it uses printf, it uses a for loop, 852 00:41:03,750 --> 00:41:07,797 and the goal, quite simply, is to print out that column of three bricks. 853 00:41:07,797 --> 00:41:11,130 Now, it's short enough that some of you, if you're getting comfy already with C, 854 00:41:11,130 --> 00:41:13,360 you might already see the logical bug. 855 00:41:13,360 --> 00:41:16,200 It's not a syntax error, like it will compile and run. 856 00:41:16,200 --> 00:41:17,280 But there's a bug there. 857 00:41:17,280 --> 00:41:22,320 And suppose that I'm very new to C, I'm very uncomfortable with C, it's 2:00 AM 858 00:41:22,320 --> 00:41:26,130 and I just can't see the bug, what are my recourses here for actually 859 00:41:26,130 --> 00:41:27,745 finding a mistake like this? 860 00:41:27,745 --> 00:41:29,370 Well, first, let's look at the symptom. 861 00:41:29,370 --> 00:41:31,740 Let me go down to my terminal window. 
862 00:41:31,740 --> 00:41:36,120 I'm going to use make buggy0 because, again, the file is called buggy0.c. 863 00:41:36,120 --> 00:41:37,260 I'm not going to use clang. 864 00:41:37,260 --> 00:41:39,880 In fact, I'm never really going to use clang manually here on out. 865 00:41:39,880 --> 00:41:42,430 I'm just going to use make because it makes our lives easier. 866 00:41:42,430 --> 00:41:43,560 It does compile. 867 00:41:43,560 --> 00:41:45,390 No errors, so it's not syntax. 868 00:41:45,390 --> 00:41:47,670 It's not something silly like a missing semicolon. 869 00:41:47,670 --> 00:41:53,190 But when I run ./buggy0, I, of course, see one, two, three, four-- 870 00:41:53,190 --> 00:41:57,990 and this, of course, does not match the one, two, three bricks that I actually 871 00:41:57,990 --> 00:41:59,610 intended for that column. 872 00:41:59,610 --> 00:42:02,970 And yet, I'm starting counting at 0, as I usually do. 873 00:42:02,970 --> 00:42:03,930 I've got three. 874 00:42:03,930 --> 00:42:05,280 I'm going up to three. 875 00:42:05,280 --> 00:42:06,780 So where is my logical error? 876 00:42:06,780 --> 00:42:10,150 If it hasn't obviously jumped out at you already, well, how can I solve this? 877 00:42:10,150 --> 00:42:13,080 Well, first and foremost, perhaps the best technique 878 00:42:13,080 --> 00:42:16,080 for solving bugs, at least early on, is just use printf. 879 00:42:16,080 --> 00:42:20,020 Like thus far, we've used printf to say, Hello, and other things on the screen. 880 00:42:20,020 --> 00:42:22,530 But printf is just a function for printing anything. 881 00:42:22,530 --> 00:42:24,570 And there's no reason you can't temporarily 882 00:42:24,570 --> 00:42:27,900 use printf to print out the contents of variables, 883 00:42:27,900 --> 00:42:29,850 what's going on inside of your program, just 884 00:42:29,850 --> 00:42:31,350 to figure out where your mistake is. 885 00:42:31,350 --> 00:42:32,940 And then you can delete that line of code later.
886 00:42:32,940 --> 00:42:34,600 It doesn't have to stay there forever. 887 00:42:34,600 --> 00:42:35,740 So let me do this. 888 00:42:35,740 --> 00:42:39,450 Instead of just printing out in VS Code the hash symbol, 889 00:42:39,450 --> 00:42:45,690 let me do a little safety check here and print out the value of i. 890 00:42:45,690 --> 00:42:49,170 So let me go ahead and say something like, i is-- 891 00:42:49,170 --> 00:42:51,610 now I want to say i is this. 892 00:42:51,610 --> 00:42:54,540 But, of course, this is not how I print out the value of i. 893 00:42:54,540 --> 00:42:58,930 If I want to print out the value of i, what should I put here? 894 00:42:58,930 --> 00:43:02,160 So %i for integer, instead of %s for string. 895 00:43:02,160 --> 00:43:03,410 So they're still placeholders. 896 00:43:03,410 --> 00:43:04,930 But we use %i for integers. 897 00:43:04,930 --> 00:43:08,450 And now if I want to print out i, I just need the comma as the second argument, 898 00:43:08,450 --> 00:43:09,250 and then i. 899 00:43:09,250 --> 00:43:13,000 All right, let me go ahead and back to my terminal window. 900 00:43:13,000 --> 00:43:15,760 Let me recompile the program because I've changed it. 901 00:43:15,760 --> 00:43:18,880 That still works fine, ./buggy0. 902 00:43:18,880 --> 00:43:22,540 And now, let me increase the size of my terminal window here. 903 00:43:22,540 --> 00:43:25,510 You just see some diagnostic information, if you will. 904 00:43:25,510 --> 00:43:26,560 This is not the goal. 905 00:43:26,560 --> 00:43:29,393 This is not what you should be submitting for this homework problem, 906 00:43:29,393 --> 00:43:30,070 were it one. 907 00:43:30,070 --> 00:43:33,730 But it is helping us diagnostically know that, OK, when i is zero, 908 00:43:33,730 --> 00:43:34,450 here's a hash. 909 00:43:34,450 --> 00:43:36,182 When i is 1, here's a hash. 910 00:43:36,182 --> 00:43:37,390 When i is two, here's a hash. 911 00:43:37,390 --> 00:43:39,017 When i is 3, here's a hash.
912 00:43:39,017 --> 00:43:39,850 Well, wait a minute. 913 00:43:39,850 --> 00:43:41,530 That's one, two, three, four. 914 00:43:41,530 --> 00:43:44,360 So clearly, I'm printing it one too many times. 915 00:43:44,360 --> 00:43:48,130 So let me look back at the code here by shrinking my terminal window. 916 00:43:48,130 --> 00:43:53,080 And let me just ask the group, where is, in fact, the mistake? 917 00:43:53,080 --> 00:43:56,080 Or what, equivalently, would be the solution? 918 00:43:56,080 --> 00:43:57,561 Yeah, in the middle. 919 00:43:57,561 --> 00:44:00,020 AUDIENCE: [INAUDIBLE] 920 00:44:00,020 --> 00:44:03,550 DAVID MALAN: Yeah, instead of less than or equal to, use just less than. 921 00:44:03,550 --> 00:44:05,300 So you've got to kind of pick a lane here. 922 00:44:05,300 --> 00:44:08,630 If you're going to start counting from 0, you generally use less than, 923 00:44:08,630 --> 00:44:10,880 and go up to, but not through the value. 924 00:44:10,880 --> 00:44:13,970 Or if you prefer, like in the human world, counting from 1 on up, 925 00:44:13,970 --> 00:44:17,300 you can use less than or equal to, but you have to be consistent. 926 00:44:17,300 --> 00:44:19,790 And in general, as a programmer, just always start 927 00:44:19,790 --> 00:44:22,610 counting from 0 if you're doing something canonical like this. 928 00:44:22,610 --> 00:44:25,160 But the solution is, indeed, just to change this 929 00:44:25,160 --> 00:44:27,860 by changing the less than or equal to to just less than. 930 00:44:27,860 --> 00:44:34,340 If I recompile this program with make buggy0, and then do ./buggy0 again-- 931 00:44:34,340 --> 00:44:36,500 and let me increase the size of my terminal window. 932 00:44:36,500 --> 00:44:39,050 Now, you see, OK, almost the same output. 933 00:44:39,050 --> 00:44:44,330 But indeed, i starts at 0 and goes up to, but not through, three. 934 00:44:44,330 --> 00:44:48,920 All right, so printf, in short, can be your first diagnostic tool.
935 00:44:48,920 --> 00:44:51,500 Instead of just staring at the screen or raising your hand-- 936 00:44:51,500 --> 00:44:55,490 I mean, use printf to see, literally, what's going on inside of your program 937 00:44:55,490 --> 00:44:57,287 by just printing out things of interest. 938 00:44:57,287 --> 00:44:59,120 And then once you've solved the problem, you 939 00:44:59,120 --> 00:45:02,840 can go back into your code, as I'll do here, by shrinking my terminal window. 940 00:45:02,840 --> 00:45:04,610 I'll delete the printf line. 941 00:45:04,610 --> 00:45:07,100 And now I'm ready to share this program with the world 942 00:45:07,100 --> 00:45:08,870 or submit it as homework or the like. 943 00:45:08,870 --> 00:45:11,390 It's just meant there to be temporary. 944 00:45:11,390 --> 00:45:15,440 Any questions on printf as a debugging tool? 945 00:45:15,440 --> 00:45:18,010 946 00:45:18,010 --> 00:45:18,510 No? 947 00:45:18,510 --> 00:45:20,970 All right, well, that only gets us so far. 948 00:45:20,970 --> 00:45:23,430 And honestly, as your programs grow and grow and grow, 949 00:45:23,430 --> 00:45:25,180 it's going to actually get really annoying 950 00:45:25,180 --> 00:45:28,860 to start going in and adding printf's, then removing them, and figuring out, 951 00:45:28,860 --> 00:45:31,860 if you've got multiple printf's, well, which one printed what? 952 00:45:31,860 --> 00:45:34,560 It just gets messy, eventually, to rely on printf alone. 953 00:45:34,560 --> 00:45:37,740 So being a computer scientist, computer scientists 954 00:45:37,740 --> 00:45:41,040 have written software to make it easier to debug code. 955 00:45:41,040 --> 00:45:44,040 That software is what we would generally call a debugger, which 956 00:45:44,040 --> 00:45:47,040 would be the second tool of the trade that you can use to actually solve 957 00:45:47,040 --> 00:45:48,610 problems in your code. 958 00:45:48,610 --> 00:45:52,690 Now, in the world of VS Code, there's actually a debugger built in.
959 00:45:52,690 --> 00:45:54,840 So the graphical user interface you're about to see 960 00:45:54,840 --> 00:45:58,260 in VS Code isn't specific to CS50, it actually comes with VS Code. 961 00:45:58,260 --> 00:46:01,230 And it supports C, and C++, and Java, and Python, 962 00:46:01,230 --> 00:46:03,030 and lots of other languages too. 963 00:46:03,030 --> 00:46:05,640 But it's, admittedly, a little complicated 964 00:46:05,640 --> 00:46:07,650 to just start using the debugger. 965 00:46:07,650 --> 00:46:10,200 You have to create a configuration file and do 966 00:46:10,200 --> 00:46:13,480 some annoying steps that just get in the way of solving real problems. 967 00:46:13,480 --> 00:46:17,070 So we have automated the process for you of just starting the debugger. 968 00:46:17,070 --> 00:46:19,680 And thereafter, it's sort of industry standard how you use it. 969 00:46:19,680 --> 00:46:23,380 But we save you the headache of having to create those configuration files. 970 00:46:23,380 --> 00:46:25,330 So, suppose I want to do this. 971 00:46:25,330 --> 00:46:27,600 Suppose I want to try to debug this program 972 00:46:27,600 --> 00:46:30,330 step by step using special software. 973 00:46:30,330 --> 00:46:31,810 Well, how can I do that? 974 00:46:31,810 --> 00:46:36,240 Well, let me propose that if I revert this back to the original version 975 00:46:36,240 --> 00:46:40,530 where i was less than or equal to 3, I'm pretty sure that I 976 00:46:40,530 --> 00:46:41,790 was printing too many hashes. 977 00:46:41,790 --> 00:46:43,350 So I'm going to do this-- and you might have done this 978 00:46:43,350 --> 00:46:45,160 accidentally or never at all. 979 00:46:45,160 --> 00:46:49,500 But notice if you hover over the gutter, so to speak, in VS Code, the part of it 980 00:46:49,500 --> 00:46:52,590 all the way to the left of the editor, you see this sort of grayed 981 00:46:52,590 --> 00:46:54,390 out red dot. 
982 00:46:54,390 --> 00:46:57,240 If you click there, it becomes a brighter red dot. 983 00:46:57,240 --> 00:46:59,670 And this represents what we're going to call a breakpoint. 984 00:46:59,670 --> 00:47:03,090 And this is just a visual indicator that you've put like a stop sign equivalent 985 00:47:03,090 --> 00:47:06,270 there, and you're telling the debugger in a moment, stop 986 00:47:06,270 --> 00:47:07,350 running my code there. 987 00:47:07,350 --> 00:47:07,920 Why? 988 00:47:07,920 --> 00:47:11,610 Because I prefer to step through my code at sort of a human speed, 989 00:47:11,610 --> 00:47:14,380 and not at computer speed where it runs all at once. 990 00:47:14,380 --> 00:47:16,750 So I've set my breakpoint, which is step one. 991 00:47:16,750 --> 00:47:18,580 And then step two is quite simply this. 992 00:47:18,580 --> 00:47:23,190 Instead of running the program itself, run the command called debug50, 993 00:47:23,190 --> 00:47:26,010 and then ./buggy0. 994 00:47:26,010 --> 00:47:29,220 And now this will start your program, but inside 995 00:47:29,220 --> 00:47:31,200 of the debugger, which is a special program 996 00:47:31,200 --> 00:47:33,060 that smart people wrote that will empower 997 00:47:33,060 --> 00:47:38,190 you to now step through your code line by line, and again, at your own comfortable 998 00:47:38,190 --> 00:47:38,970 pace. 999 00:47:38,970 --> 00:47:43,080 I'm going to hit Enter, some stuff's going to happen on the screen-- whoops. 1000 00:47:43,080 --> 00:47:45,767 Notice, this is a common mistake that I made accidentally here. 1001 00:47:45,767 --> 00:47:47,100 Looks like I've changed my code. 1002 00:47:47,100 --> 00:47:49,892 I did because I went in and changed the less than or equal to sign. 1003 00:47:49,892 --> 00:47:52,860 So let me go ahead and rerun make buggy0-- 1004 00:47:52,860 --> 00:47:53,520 Enter. 1005 00:47:53,520 --> 00:47:55,590 Good, now let me rerun debug50-- 1006 00:47:55,590 --> 00:47:57,810 Enter.
1007 00:47:57,810 --> 00:47:59,760 And now some stuff just happened on the screen, 1008 00:47:59,760 --> 00:48:03,270 and it takes a moment to get started, but once it's started, you'll 1009 00:48:03,270 --> 00:48:06,010 see this-- you'll still see your code. 1010 00:48:06,010 --> 00:48:09,410 But you'll see this yellow highlight, which you've probably not seen before. 1011 00:48:09,410 --> 00:48:11,910 And notice that it's specifically highlighting the same line 1012 00:48:11,910 --> 00:48:13,440 that I set a breakpoint on. 1013 00:48:13,440 --> 00:48:13,950 Why? 1014 00:48:13,950 --> 00:48:18,870 That just means the debugger has executed all of these lines, 1015 00:48:18,870 --> 00:48:20,670 except for line 7. 1016 00:48:20,670 --> 00:48:23,340 It has broken at-- not in a bad way. 1017 00:48:23,340 --> 00:48:27,580 But it has paused execution on line 7, so it hasn't yet printed any hashes. 1018 00:48:27,580 --> 00:48:30,450 And you can see that-- no hashes in the terminal window yet. 1019 00:48:30,450 --> 00:48:31,980 It's paused execution. 1020 00:48:31,980 --> 00:48:35,190 But what's interesting with the debugger is the stuff 1021 00:48:35,190 --> 00:48:37,410 over here on the left-hand side. 1022 00:48:37,410 --> 00:48:39,960 In the debugger here, you'll see, under variables, 1023 00:48:39,960 --> 00:48:41,910 all of your so-called local variables. 1024 00:48:41,910 --> 00:48:44,160 And we haven't really made a distinction between local 1025 00:48:44,160 --> 00:48:45,327 and something called global. 1026 00:48:45,327 --> 00:48:48,000 But for now, local variables just means all of the variables 1027 00:48:48,000 --> 00:48:49,390 that exist in your function. 1028 00:48:49,390 --> 00:48:52,110 So i currently has a value of 0. 1029 00:48:52,110 --> 00:48:53,410 OK, and that makes sense. 1030 00:48:53,410 --> 00:48:57,360 So now, how do I step through my code and see what it's doing?
1031 00:48:57,360 --> 00:48:59,610 Well, at the top of the screen here, you'll 1032 00:48:59,610 --> 00:49:02,250 see some playback icons, kind of like a video player, 1033 00:49:02,250 --> 00:49:03,630 but they have special meaning. 1034 00:49:03,630 --> 00:49:07,892 This first one will just play the rest of your program all the way to the end. 1035 00:49:07,892 --> 00:49:10,350 So you only click that if you've sort of solved the problem 1036 00:49:10,350 --> 00:49:13,110 and you just want to run it to completion like before. 1037 00:49:13,110 --> 00:49:14,370 But the next three-- 1038 00:49:14,370 --> 00:49:16,920 or next two, really, are really the juiciest. 1039 00:49:16,920 --> 00:49:19,710 The second one here, if you hover over it, eventually, 1040 00:49:19,710 --> 00:49:21,930 you'll see that it's called Step Over. 1041 00:49:21,930 --> 00:49:25,170 Step Over means that the debugger will run 1042 00:49:25,170 --> 00:49:28,630 this currently highlighted line of code, but it's not going to dive into it. 1043 00:49:28,630 --> 00:49:30,660 So if it's a function like printf, it's not 1044 00:49:30,660 --> 00:49:32,827 going to start stepping through printf line by line. 1045 00:49:32,827 --> 00:49:33,327 Why? 1046 00:49:33,327 --> 00:49:36,420 Because I can pretty much assume printf, written decades ago, is correct. 1047 00:49:36,420 --> 00:49:38,050 Problem's probably with me. 1048 00:49:38,050 --> 00:49:42,690 But this next line, if I did really want to step into the printf code 1049 00:49:42,690 --> 00:49:46,110 to figure out how it works or find some problem in it all these years later, 1050 00:49:46,110 --> 00:49:48,810 you can step into printf, and then the screen would change, 1051 00:49:48,810 --> 00:49:50,910 and you'd see each of the lines for printf, 1052 00:49:50,910 --> 00:49:54,250 line by line-- at least if you have the source code for printf installed. 1053 00:49:54,250 --> 00:49:56,490 All right, I'm going to use the first one, Step Over. 
1054 00:49:56,490 --> 00:49:59,130 And watch as the yellow highlight moves. 1055 00:49:59,130 --> 00:50:03,060 And watch as, in the terminal window, there's a hash symbol. 1056 00:50:03,060 --> 00:50:03,780 Here we go. 1057 00:50:03,780 --> 00:50:05,130 There's one hash. 1058 00:50:05,130 --> 00:50:07,230 Now, notice line 5 is highlighted. 1059 00:50:07,230 --> 00:50:09,480 That means it has paused on line 5. 1060 00:50:09,480 --> 00:50:11,350 Line 5 has not yet been executed. 1061 00:50:11,350 --> 00:50:12,600 So what does that mean? 1062 00:50:12,600 --> 00:50:16,320 The value of i, per the top left-hand corner, is still 0. 1063 00:50:16,320 --> 00:50:18,920 But as soon as I click Step Over again, watch 1064 00:50:18,920 --> 00:50:24,470 what happens at the top left, where i is a variable on the screen. 1065 00:50:24,470 --> 00:50:26,420 Now i-- and it flashed briefly-- 1066 00:50:26,420 --> 00:50:27,920 has a value of 1. 1067 00:50:27,920 --> 00:50:30,650 And now if I step over again, watch the terminal window. 1068 00:50:30,650 --> 00:50:32,120 There's my second hash. 1069 00:50:32,120 --> 00:50:36,380 Now, let me click Step Over on for loop, watch the variable at top left. 1070 00:50:36,380 --> 00:50:38,567 Now 1 goes to 2. 1071 00:50:38,567 --> 00:50:39,650 Now let me click it again. 1072 00:50:39,650 --> 00:50:43,220 Third hash-- and here's where the logical error is perhaps revealed. 1073 00:50:43,220 --> 00:50:45,210 Let me go ahead and step over the loop. 1074 00:50:45,210 --> 00:50:46,520 Now i is 3. 1075 00:50:46,520 --> 00:50:49,280 Wait a minute, I'm still going to print out a hash. 1076 00:50:49,280 --> 00:50:49,810 There it is. 1077 00:50:49,810 --> 00:50:50,810 There's the fourth hash. 1078 00:50:50,810 --> 00:50:53,852 And at this point, hopefully, the light bulb, proverbially, has gone off. 1079 00:50:53,852 --> 00:50:55,020 I realize, oh, I screwed up. 
1080 00:50:55,020 --> 00:50:58,580 I can either stop the program altogether with the red square, 1081 00:50:58,580 --> 00:51:01,100 or I can just let it run all the way to the end, which 1082 00:51:01,100 --> 00:51:02,493 just terminates everything. 1083 00:51:02,493 --> 00:51:05,660 At this point, I just want to get back into my code and start fixing things. 1084 00:51:05,660 --> 00:51:07,700 And you can close, for instance, as I will here, 1085 00:51:07,700 --> 00:51:10,670 the File Explorer, just to hide the panel that opened. 1086 00:51:10,670 --> 00:51:12,320 So that's debug50. 1087 00:51:12,320 --> 00:51:15,920 But it's not a CS50 thing, that just starts the debugger for you, which 1088 00:51:15,920 --> 00:51:19,520 is something you'd find in most any programming environment nowadays. 1089 00:51:19,520 --> 00:51:23,670 Questions on debugging? 1090 00:51:23,670 --> 00:51:24,170 Questions? 1091 00:51:24,170 --> 00:51:24,670 Yeah? 1092 00:51:24,670 --> 00:51:27,295 AUDIENCE: Where does it tell you where it went wrong? 1093 00:51:27,295 --> 00:51:28,420 DAVID MALAN: Good question. 1094 00:51:28,420 --> 00:51:30,310 Where does it tell you where it went wrong? 1095 00:51:30,310 --> 00:51:33,190 So, sadly, it does not tell you any of that. 1096 00:51:33,190 --> 00:51:37,570 The onus is still on you, the human, to use this tool productively to walk 1097 00:51:37,570 --> 00:51:39,580 through your code at a saner pace. 1098 00:51:39,580 --> 00:51:42,070 But your brain is the one that still needs to solve it. 1099 00:51:42,070 --> 00:51:45,190 And I don't doubt, down the line, with artificial intelligence and more, 1100 00:51:45,190 --> 00:51:47,350 programs like this will get all the more helpful, 1101 00:51:47,350 --> 00:51:49,160 and start answering questions like that for us. 1102 00:51:49,160 --> 00:51:51,340 And there are other tools we'll introduce you this semester 1103 00:51:51,340 --> 00:51:52,990 that are even more powerful than this. 
1104 00:51:52,990 --> 00:51:56,770 But for now, it's just a tool, really, to slow things down and not 1105 00:51:56,770 --> 00:51:57,820 have to change your code. 1106 00:51:57,820 --> 00:52:01,420 The fact that I had that panel on the left that just showed me i's changing 1107 00:52:01,420 --> 00:52:04,150 value is just an alternative to printf, and I can 1108 00:52:04,150 --> 00:52:06,820 step through it a little more slowly. 1109 00:52:06,820 --> 00:52:10,580 Other questions on debugging? 1110 00:52:10,580 --> 00:52:11,080 No? 1111 00:52:11,080 --> 00:52:14,950 Let me show you one final example with this debugger here. 1112 00:52:14,950 --> 00:52:16,750 And this one, too, I wrote in advance. 1113 00:52:16,750 --> 00:52:18,730 Let me close buggy0.c. 1114 00:52:18,730 --> 00:52:22,327 And let me open up buggy1.c, my second version thereof. 1115 00:52:22,327 --> 00:52:24,160 Let me close my terminal window for a second 1116 00:52:24,160 --> 00:52:26,350 and give you a quick tour of this program, which 1117 00:52:26,350 --> 00:52:28,030 similarly, has a mistake. 1118 00:52:28,030 --> 00:52:32,830 Now, at the top of this program, some familiar includes, cs50.h and stdio.h. 1119 00:52:32,830 --> 00:52:34,730 This is not something we've seen before. 1120 00:52:34,730 --> 00:52:36,190 It's specific to this example-- 1121 00:52:36,190 --> 00:52:38,830 a function called getNegativeInt. 1122 00:52:38,830 --> 00:52:41,043 Takes no arguments, and it returns an integer. 1123 00:52:41,043 --> 00:52:41,710 What does it do? 1124 00:52:41,710 --> 00:52:45,040 It literally gets a negative integer, ideally, from the user. 1125 00:52:45,040 --> 00:52:47,200 Fun fact, though, it doesn't do so correctly. 1126 00:52:47,200 --> 00:52:50,090 That's the bug. getNegativeInt is broken at the moment. 1127 00:52:50,090 --> 00:52:51,470 So what does main do?
1128 00:52:51,470 --> 00:52:54,130 Well, main just calls this function, passing in nothing 1129 00:52:54,130 --> 00:52:55,690 in parentheses, no inputs. 1130 00:52:55,690 --> 00:52:58,240 And it stores the return value in i. 1131 00:52:58,240 --> 00:53:00,260 And then it just prints out i on the screen. 1132 00:53:00,260 --> 00:53:03,910 So honestly, just by eyeballing this, I feel comfortable enough 1133 00:53:03,910 --> 00:53:06,365 with programming in C, I think main is correct. 1134 00:53:06,365 --> 00:53:07,990 Let me just stipulate, main is correct. 1135 00:53:07,990 --> 00:53:09,698 But there is going to be a bug down here. 1136 00:53:09,698 --> 00:53:11,210 Now, what's the bug down here? 1137 00:53:11,210 --> 00:53:14,830 Well, let me look at getNegativeInt's implementation. 1138 00:53:14,830 --> 00:53:18,970 Notice, this first line, 12, is identical to the prototype up here. 1139 00:53:18,970 --> 00:53:22,690 The prototype is sort of stupidly required up here 1140 00:53:22,690 --> 00:53:25,300 because C reads things top to bottom, left to right-- 1141 00:53:25,300 --> 00:53:26,690 the compiler technically does. 1142 00:53:26,690 --> 00:53:29,680 So if you reference getNegativeInt here, but you 1143 00:53:29,680 --> 00:53:33,490 don't implement it until down here, and you haven't told C in advance 1144 00:53:33,490 --> 00:53:36,820 that it will exist, again, you get the error we saw last week. 1145 00:53:36,820 --> 00:53:39,010 All right, so how does getNegativeInt work? 1146 00:53:39,010 --> 00:53:40,960 We declare a variable called n. 1147 00:53:40,960 --> 00:53:43,540 We've got a do while loop that does what? 1148 00:53:43,540 --> 00:53:47,110 It uses getInt, which comes with the cs50 library, per last week. 1149 00:53:47,110 --> 00:53:49,480 It prompts the user for negative integer, quote unquote, 1150 00:53:49,480 --> 00:53:51,670 and stores the value in n. 1151 00:53:51,670 --> 00:53:56,800 I then do all of this while n is less than 0, right?
1152 00:53:56,800 --> 00:54:00,400 Remember, we used a do while loop last week to make sure the human cooperates 1153 00:54:00,400 --> 00:54:03,970 and doesn't give us the wrong type of value, be it positive or negative 1154 00:54:03,970 --> 00:54:04,970 or something else. 1155 00:54:04,970 --> 00:54:06,400 And then we return n. 1156 00:54:06,400 --> 00:54:07,570 And there's some subtleties. 1157 00:54:07,570 --> 00:54:12,970 Anyone recall-- or have an intuition for why I've declared n on line 14, 1158 00:54:12,970 --> 00:54:15,790 instead of line 17? 1159 00:54:15,790 --> 00:54:17,620 This is a C specific thing. 1160 00:54:17,620 --> 00:54:23,465 AUDIENCE: [INAUDIBLE] 1161 00:54:23,465 --> 00:54:24,340 DAVID MALAN: Exactly. 1162 00:54:24,340 --> 00:54:27,610 There's this notion of scope in C. And we'll continue to see this over time, 1163 00:54:27,610 --> 00:54:32,590 whereby, a variable only exists inside of the most recent curly braces 1164 00:54:32,590 --> 00:54:33,560 that you've opened. 1165 00:54:33,560 --> 00:54:36,910 So if I've declared n here on line 14, I can use it 1166 00:54:36,910 --> 00:54:40,900 anywhere between lines 13 and 21 because those are the nearest curly braces. 1167 00:54:40,900 --> 00:54:43,540 If by contrast, as you note, if I instead said this, 1168 00:54:43,540 --> 00:54:49,180 int n equals getInt and so forth, and didn't have the current line 14, 1169 00:54:49,180 --> 00:54:53,470 well, n would exist inside of these curly braces, but not here, which 1170 00:54:53,470 --> 00:54:55,340 is too late, and definitely not here. 1171 00:54:55,340 --> 00:54:59,480 So you just have to declare it first, and then use and reuse it as such. 1172 00:54:59,480 --> 00:55:01,545 Now, let me just show you how I can debug this. 1173 00:55:01,545 --> 00:55:03,170 But let me show you the symptoms first. 1174 00:55:03,170 --> 00:55:04,930 Let me open my terminal window. 1175 00:55:04,930 --> 00:55:06,970 Let me run make buggy1.
1176 00:55:06,970 --> 00:55:11,710 Compiles OK, so it's not something silly like a semicolon. ./buggy1, 1177 00:55:11,710 --> 00:55:13,660 and I'm asked for a negative integer. 1178 00:55:13,660 --> 00:55:15,280 All right, let me give it negative 1-- 1179 00:55:15,280 --> 00:55:16,710 Enter. 1180 00:55:16,710 --> 00:55:19,920 Well, the main function is supposed to print out what I typed, 1181 00:55:19,920 --> 00:55:20,880 but it clearly didn't. 1182 00:55:20,880 --> 00:55:21,880 It's prompting me again. 1183 00:55:21,880 --> 00:55:23,830 All right, so maybe it'll like negative 2. 1184 00:55:23,830 --> 00:55:24,330 No? 1185 00:55:24,330 --> 00:55:26,380 Maybe negative 3. 1186 00:55:26,380 --> 00:55:27,570 50? 1187 00:55:27,570 --> 00:55:29,160 OK, so it's definitely broken, right? 1188 00:55:29,160 --> 00:55:31,528 It kind of seems logically to be doing the opposite. 1189 00:55:31,528 --> 00:55:33,820 Now, you can perhaps see why this is happening already. 1190 00:55:33,820 --> 00:55:37,170 These are deliberately simple programs for demonstration's sake. 1191 00:55:37,170 --> 00:55:38,470 But let's do this. 1192 00:55:38,470 --> 00:55:41,037 Let me go ahead and set a breakpoint in main, 1193 00:55:41,037 --> 00:55:42,870 even though I'm pretty sure main is correct. 1194 00:55:42,870 --> 00:55:45,810 But it just helps me start my thought process-- start with main, 1195 00:55:45,810 --> 00:55:47,010 and then take it from there. 1196 00:55:47,010 --> 00:55:51,840 Let me run now, debug50 ./buggy1-- 1197 00:55:51,840 --> 00:55:52,920 Enter. 1198 00:55:52,920 --> 00:55:53,700 And let's see. 1199 00:55:53,700 --> 00:55:56,880 With that breakpoint now, the GUI is going to reconfigure itself. 1200 00:55:56,880 --> 00:56:00,360 It's going to pause on line 8 because that's the first interesting line 1201 00:56:00,360 --> 00:56:01,260 inside of main. 1202 00:56:01,260 --> 00:56:03,780 So I could have just put the breakpoint on line 8 too.
1203 00:56:03,780 --> 00:56:06,480 It's smart enough to know that if I set it on 6, 1204 00:56:06,480 --> 00:56:09,570 you really mean line 8 because that's the first actual line of code. 1205 00:56:09,570 --> 00:56:11,280 And watch, now, what happens. 1206 00:56:11,280 --> 00:56:15,780 If I step over this line, notice that i, which at the moment 1207 00:56:15,780 --> 00:56:18,090 seems to have a default value of 0-- 1208 00:56:18,090 --> 00:56:19,470 more on that another time. 1209 00:56:19,470 --> 00:56:24,750 But if I click Step Over like before, I'm prompted for a negative integer. 1210 00:56:24,750 --> 00:56:25,750 Let me type negative 1-- 1211 00:56:25,750 --> 00:56:27,300 Enter. 1212 00:56:27,300 --> 00:56:32,470 And now, notice, there's no additional yellow highlight. 1213 00:56:32,470 --> 00:56:32,970 Why? 1214 00:56:32,970 --> 00:56:35,160 Where am I currently stuck, logically? 1215 00:56:35,160 --> 00:56:37,937 AUDIENCE: [INAUDIBLE] 1216 00:56:37,937 --> 00:56:40,770 DAVID MALAN: Yeah, just logically, I must be in that do, while loop. 1217 00:56:40,770 --> 00:56:43,560 And even if you don't understand it, like that's the only explanation. 1218 00:56:43,560 --> 00:56:46,143 If you keep getting prompted, surely, there's a loop going on. 1219 00:56:46,143 --> 00:56:49,270 There's only one loop in my code, so there's probably a problem there. 1220 00:56:49,270 --> 00:56:52,900 So I can't just set a breakpoint in main, and then wait for this to work. 1221 00:56:52,900 --> 00:56:53,610 So let me just-- 1222 00:56:53,610 --> 00:56:56,280 let me stop this with the red square. 1223 00:56:56,280 --> 00:56:58,860 And let me think, all right, instead of-- 1224 00:56:58,860 --> 00:57:02,770 I can still set my breakpoint in main, but let me rerun the debugger instead. 1225 00:57:02,770 --> 00:57:05,470 And this time, not step over that line of code, 1226 00:57:05,470 --> 00:57:07,930 let me step into that line of code. 
1227 00:57:07,930 --> 00:57:09,270 So watch what happens now. 1228 00:57:09,270 --> 00:57:11,430 Instead of clicking the second icon here, 1229 00:57:11,430 --> 00:57:14,610 let me click the third, whose name is, indeed, Step Into. 1230 00:57:14,610 --> 00:57:17,880 And watch as the yellow highlight does not move to line 9. 1231 00:57:17,880 --> 00:57:21,930 It dives into line 8-- the function on line 8, 1232 00:57:21,930 --> 00:57:25,170 thereby, bringing me down to line 17. 1233 00:57:25,170 --> 00:57:28,270 It's kind of going down into that next function. 1234 00:57:28,270 --> 00:57:31,422 Now, it didn't bother pausing on line 12 or 13 or 14 1235 00:57:31,422 --> 00:57:34,380 because there's nothing intellectually interesting there happening yet. 1236 00:57:34,380 --> 00:57:37,080 The juicy part really starts, it would seem, in line 17. 1237 00:57:37,080 --> 00:57:40,980 So, now notice, n is my variable at the top left. 1238 00:57:40,980 --> 00:57:42,270 If I click-- 1239 00:57:42,270 --> 00:57:45,420 I don't want to click Step Into now, though. 1240 00:57:45,420 --> 00:57:48,090 What would go wrong if I click on Step Into-- 1241 00:57:48,090 --> 00:57:52,480 or what would it do that I don't think I want to do? 1242 00:57:52,480 --> 00:57:52,990 Yeah? 1243 00:57:52,990 --> 00:57:54,755 AUDIENCE: [INAUDIBLE] 1244 00:57:54,755 --> 00:57:56,630 DAVID MALAN: Yeah, it would step into get_int. 1245 00:57:56,630 --> 00:57:59,620 But I'd like to think that the staff's version of get_int is correct, 1246 00:57:59,620 --> 00:58:02,120 and that's not our problem today, so I want to step over it. 1247 00:58:02,120 --> 00:58:06,710 And watch now at top left that nothing happens yet to the value of n 1248 00:58:06,710 --> 00:58:09,530 until I go to the terminal window now, and I type in something 1249 00:58:09,530 --> 00:58:10,670 like negative 1. 1250 00:58:10,670 --> 00:58:14,600 Now notice, it jumps to line 19, which is the next interesting line.
1251 00:58:14,600 --> 00:58:17,240 Top left, n, indeed, is negative 1. 1252 00:58:17,240 --> 00:58:19,160 And here's where I can now pause as a human 1253 00:58:19,160 --> 00:58:22,760 and think, all right, so while n is less than 0. 1254 00:58:22,760 --> 00:58:25,280 All right, n, per the top left corner, is negative 1. 1255 00:58:25,280 --> 00:58:27,830 So all right, while negative 1 is less than 0, 1256 00:58:27,830 --> 00:58:29,780 well, obviously that's true mathematically. 1257 00:58:29,780 --> 00:58:30,930 So what's going to happen? 1258 00:58:30,930 --> 00:58:32,130 It's a do while loop. 1259 00:58:32,130 --> 00:58:37,285 So when I click on Step Over again, it's going to go to this line 1260 00:58:37,285 --> 00:58:39,410 because it's at the end of the inside of that loop. 1261 00:58:39,410 --> 00:58:42,710 And now here, it's looping through again and again. 1262 00:58:42,710 --> 00:58:44,240 All right, let me do this once more. 1263 00:58:44,240 --> 00:58:45,980 I'm going to step over, all right? 1264 00:58:45,980 --> 00:58:48,777 I'm going to type in negative 2, and it's the exact same thing. 1265 00:58:48,777 --> 00:58:50,360 Now is my chance, on the yellow line-- 1266 00:58:50,360 --> 00:58:51,260 OK, wait a minute. 1267 00:58:51,260 --> 00:58:53,450 Negative 2 is obviously less than 0. 1268 00:58:53,450 --> 00:58:56,080 Let me try this one more time. 1269 00:58:56,080 --> 00:58:57,570 Click it once here. 1270 00:58:57,570 --> 00:58:59,040 All right, let me give it 50. 1271 00:58:59,040 --> 00:59:05,020 And now, OK, while 50 is less than 0, that's not true, 1272 00:59:05,020 --> 00:59:08,970 so the loop is over because it's not going to do it while 50 is less than 0. 1273 00:59:08,970 --> 00:59:09,730 That's not true. 1274 00:59:09,730 --> 00:59:12,240 So now watch, when I click Step Over once more, 1275 00:59:12,240 --> 00:59:15,810 it then finishes the loop, even though there's nothing more to do. 
1276 00:59:15,810 --> 00:59:17,610 It's now about to return n. 1277 00:59:17,610 --> 00:59:21,360 It jumps back up to main, where I left off on line 9. 1278 00:59:21,360 --> 00:59:23,778 It now prints, in my terminal window, the number 50. 1279 00:59:23,778 --> 00:59:26,070 And hopefully, at this point, to your question earlier, 1280 00:59:26,070 --> 00:59:30,700 my human brain has realized, oh, I'm an idiot, like I flipped my sign there. 1281 00:59:30,700 --> 00:59:32,460 So I probably-- let me stop this. 1282 00:59:32,460 --> 00:59:34,780 I probably want to do something like this. 1283 00:59:34,780 --> 00:59:38,860 If the goal is to get a negative integer, I probably want to say, 1284 00:59:38,860 --> 00:59:45,070 while n is, for instance, greater than or equal to 0 would work. 1285 00:59:45,070 --> 00:59:48,630 So while n is greater than or equal to 0, keep doing this. 1286 00:59:48,630 --> 00:59:50,430 And that's the logic I wanted to express. 1287 00:59:50,430 --> 00:59:53,733 So the debugger just saves me from staring at the screen, raising a hand, 1288 00:59:53,733 --> 00:59:54,900 sort of asking someone else. 1289 00:59:54,900 --> 00:59:58,650 At least in this case, it allows me to go through it at a healthier pace. 1290 00:59:58,650 --> 01:00:03,000 Questions now on debug50, which should be your new friend, even if it's not 1291 01:00:03,000 --> 01:00:04,940 your first instinct after printf? 1292 01:00:04,940 --> 01:00:07,690 1293 01:00:07,690 --> 01:00:09,190 Any questions on debug50? 1294 01:00:09,190 --> 01:00:09,730 No? 1295 01:00:09,730 --> 01:00:13,960 All right, well, there's one last technique we can equip you with here. 1296 01:00:13,960 --> 01:00:17,470 And that is, in addition to printf and a debugger, no joke, 1297 01:00:17,470 --> 01:00:21,400 a rubber duck is actually a reasonably recommended solution 1298 01:00:21,400 --> 01:00:22,720 to finding bugs in your code. 
1299 01:00:22,720 --> 01:00:24,640 To your question earlier, the duck, too, is not 1300 01:00:24,640 --> 01:00:26,390 going to solve the problem for you. 1301 01:00:26,390 --> 01:00:29,710 But if you've wondered why this little guy has been here for so long, 1302 01:00:29,710 --> 01:00:32,080 there's this technique, which has its own Wikipedia article, 1303 01:00:32,080 --> 01:00:33,760 called rubber duck debugging. 1304 01:00:33,760 --> 01:00:37,390 The idea of which is that if you're home in your dorm room, 1305 01:00:37,390 --> 01:00:39,520 wrestling with some bug in your code, printf 1306 01:00:39,520 --> 01:00:42,820 didn't quite reveal the source to you, debugger isn't really helping, 1307 01:00:42,820 --> 01:00:46,960 honestly, maybe it would help to just sound out what problem you're having. 1308 01:00:46,960 --> 01:00:50,260 Similar to going to office hours, talking to a TA or a professor, 1309 01:00:50,260 --> 01:00:52,030 just walking through your problems because 1310 01:00:52,030 --> 01:00:54,730 in sort of talking to the duck about the fact 1311 01:00:54,730 --> 01:01:00,550 that you're doing this while n is less than 0, and then if it is-- 1312 01:01:00,550 --> 01:01:01,180 wait a minute. 1313 01:01:01,180 --> 01:01:03,820 I'm an idiot, not just for talking to the rubber duck. 1314 01:01:03,820 --> 01:01:05,980 You realize, hopefully, in expressing yourself, 1315 01:01:05,980 --> 01:01:09,910 literally verbally, you probably will hear with non-zero probability, 1316 01:01:09,910 --> 01:01:11,860 like some illogic in your statement. 1317 01:01:11,860 --> 01:01:16,430 And just by sounding things out, you'll realize like, oh, that's my problem. 1318 01:01:16,430 --> 01:01:19,720 And so, frankly, if you have roommates, you can also use a roommate for this.
1319 01:01:19,720 --> 01:01:21,700 But the rubber duck is just sort of a go-to 1320 01:01:21,700 --> 01:01:24,700 when your roommates have no interest in your C problem set, 1321 01:01:24,700 --> 01:01:28,150 talking something through that as such. 1322 01:01:28,150 --> 01:01:29,933 And this is an invaluable technique. 1323 01:01:29,933 --> 01:01:32,350 I admittedly tend not to do it so much with a rubber duck, 1324 01:01:32,350 --> 01:01:34,510 but ideally with colleagues, human colleagues. 1325 01:01:34,510 --> 01:01:38,260 But just talking through things often will help you just realize, 1326 01:01:38,260 --> 01:01:40,360 oh, I said something illogical. 1327 01:01:40,360 --> 01:01:41,860 Now I can go back to the code. 1328 01:01:41,860 --> 01:01:44,650 So don't solve problems by staring at your screen 1329 01:01:44,650 --> 01:01:46,240 endlessly for minutes, for hours. 1330 01:01:46,240 --> 01:01:48,100 At that point, it's time for a break, time 1331 01:01:48,100 --> 01:01:50,475 to walk away, time to talk to the duck, if you've already 1332 01:01:50,475 --> 01:01:52,900 exhausted some of those other tools. 1333 01:01:52,900 --> 01:01:55,330 As an aside, on your way out today at the end of class, 1334 01:01:55,330 --> 01:01:59,020 we have, clearly, plenty of rubber ducks for you. 1335 01:01:59,020 --> 01:02:01,600 And it's become a thing over the years, at least 1336 01:02:01,600 --> 01:02:05,770 among some, to bring the duck with them when they travel and send us photos. 1337 01:02:05,770 --> 01:02:10,480 Here, for instance, is CS50's rubber duck debugger, A.K.A. DDB, 1338 01:02:10,480 --> 01:02:15,940 for Duck Debugger, which is a pun on a geekier program called GDB, the GNU 1339 01:02:15,940 --> 01:02:18,740 Debugger, which is an actual piece of software for debugging. 1340 01:02:18,740 --> 01:02:25,270 This is CS50's debugger in the hills of Puerto Rico, also, here on the sea. 1341 01:02:25,270 --> 01:02:28,310 He made its way to San Francisco here. 
1342 01:02:28,310 --> 01:02:30,640 Also, down by Fisherman's Wharf by the sea lions. 1343 01:02:30,640 --> 01:02:31,660 Familiar? 1344 01:02:31,660 --> 01:02:34,570 Here at Stanford, where there's a William Gates Computer Science 1345 01:02:34,570 --> 01:02:38,950 building for computer science, down the road in SF at Google. 1346 01:02:38,950 --> 01:02:41,650 And this is the Trevi Fountain in Rome. 1347 01:02:41,650 --> 01:02:43,810 And lastly, the Colosseum. 1348 01:02:43,810 --> 01:02:46,990 So we'll be curious to see in the coming years where your duck, too, travels. 1349 01:02:46,990 --> 01:02:49,120 So that, then, was quite a bit. 1350 01:02:49,120 --> 01:02:51,850 Why don't we go ahead here and take a short 5-minute break? 1351 01:02:51,850 --> 01:02:52,760 No snacks yet. 1352 01:02:52,760 --> 01:02:54,400 You're welcome to get up or sit down. 1353 01:02:54,400 --> 01:02:56,620 We'll return in about five. 1354 01:02:56,620 --> 01:03:00,020 All right, so we are back. 1355 01:03:00,020 --> 01:03:04,000 And if the goal, ultimately, today is to have a better understanding of things 1356 01:03:04,000 --> 01:03:06,940 like strings so that we can solve problems with text, 1357 01:03:06,940 --> 01:03:09,190 let's consider some simpler types of data 1358 01:03:09,190 --> 01:03:11,290 first, how we might represent those, and then 1359 01:03:11,290 --> 01:03:14,290 see if that doesn't lead us to a discovery as to how strings, 1360 01:03:14,290 --> 01:03:17,330 and just today's modern software is using things like that. 1361 01:03:17,330 --> 01:03:21,850 So when we talked on week zero about representation of data, 1362 01:03:21,850 --> 01:03:25,930 we had different ways of doing it, in terms of binary and decimal, 1363 01:03:25,930 --> 01:03:27,640 and unary even. 1364 01:03:27,640 --> 01:03:30,520 When we started talking about the same last week in code, 1365 01:03:30,520 --> 01:03:33,980 we started talking about data types instead.
1366 01:03:33,980 --> 01:03:36,820 And these data types were a way of telling 1367 01:03:36,820 --> 01:03:40,000 the computer, like do you want an integer, do you want a character, 1368 01:03:40,000 --> 01:03:44,260 do you want a floating point value, like a real number, or even a string, 1369 01:03:44,260 --> 01:03:45,070 as we've seen? 1370 01:03:45,070 --> 01:03:47,350 But it turns out that computers, of course, 1371 01:03:47,350 --> 01:03:49,930 only have finite amounts of resources. 1372 01:03:49,930 --> 01:03:53,740 Your computer only has a fixed amount of memory or RAM. 1373 01:03:53,740 --> 01:03:55,910 And that actually has very real world implications. 1374 01:03:55,910 --> 01:03:59,630 So for instance, here are some of the data types we've seen thus far. 1375 01:03:59,630 --> 01:04:04,090 And it turns out that each of these in C has a specific number 1376 01:04:04,090 --> 01:04:05,650 of bits allocated to it. 1377 01:04:05,650 --> 01:04:08,350 Now, admittedly, this can vary by system. 1378 01:04:08,350 --> 01:04:10,850 It's not so much the case nowadays, but for many years, 1379 01:04:10,850 --> 01:04:13,100 for decades, computers were getting better and better. 1380 01:04:13,100 --> 01:04:15,392 The earliest computers might have used fewer bits 1381 01:04:15,392 --> 01:04:16,600 for some of these data types. 1382 01:04:16,600 --> 01:04:18,663 More modern computers might use more bits. 1383 01:04:18,663 --> 01:04:21,830 So the numbers you're about to see are pretty much where we are present day. 1384 01:04:21,830 --> 01:04:25,030 So when it comes to these data types, a bool, 1385 01:04:25,030 --> 01:04:29,020 which is true or false, somewhat curiously, uses a whole byte, 1386 01:04:29,020 --> 01:04:32,380 even though that's way overkill because for a bool, true or false, 1387 01:04:32,380 --> 01:04:33,940 you, of course, only need one bit. 
1388 01:04:33,940 --> 01:04:36,520 But it turns out, even though it's wasteful to use 1389 01:04:36,520 --> 01:04:39,938 eight bits, or one byte, just to represent true or false, 1390 01:04:39,938 --> 01:04:41,230 it's just easier for computers. 1391 01:04:41,230 --> 01:04:42,820 So a bool tends to be one byte. 1392 01:04:42,820 --> 01:04:47,590 An int, which we've been using a lot, uses 4 bytes, typically, or 32 bits. 1393 01:04:47,590 --> 01:04:50,590 And if I do some quick math from week zero, with 32 bits, 1394 01:04:50,590 --> 01:04:54,040 you have 4 billion possible values, roughly. 1395 01:04:54,040 --> 01:04:56,290 But if you want to represent positive and negative, 1396 01:04:56,290 --> 01:04:59,710 that means you can represent roughly negative 2 billion, all the way up 1397 01:04:59,710 --> 01:05:01,020 to positive 2 billion. 1398 01:05:01,020 --> 01:05:02,770 So that's the range, typically, with ints. 1399 01:05:02,770 --> 01:05:06,820 If that's too few numbers for you, turns out there's things called longs. 1400 01:05:06,820 --> 01:05:10,120 And longs use 64 bits, which allow you to have 1401 01:05:10,120 --> 01:05:13,220 like a quintillion number of possibilities, 1402 01:05:13,220 --> 01:05:15,730 which is a lot, certainly, a lot more than 4 billion. 1403 01:05:15,730 --> 01:05:17,410 So sometimes you might use a long. 1404 01:05:17,410 --> 01:05:18,670 But even that's finite. 1405 01:05:18,670 --> 01:05:21,640 And so as we discussed at the end of last week, 1406 01:05:21,640 --> 01:05:23,980 bad things can happen if you make certain assumptions 1407 01:05:23,980 --> 01:05:27,220 as to the data because of things like integer overflow or the like, 1408 01:05:27,220 --> 01:05:28,330 where things wrap around. 1409 01:05:28,330 --> 01:05:31,538 Then there's a float, which is a real number, something with a decimal point. 
1410 01:05:31,538 --> 01:05:36,040 By convention, it's 4 bytes or 32 bits, which gives you, in short, 1411 01:05:36,040 --> 01:05:37,810 only a specific amount of precision. 1412 01:05:37,810 --> 01:05:41,620 It doesn't necessarily dictate how many numbers to the left or to the right. 1413 01:05:41,620 --> 01:05:45,250 In the aggregate, ultimately, you have though, 1414 01:05:45,250 --> 01:05:47,650 4 billion possible permutations still. 1415 01:05:47,650 --> 01:05:50,110 If you need more precision for scientific, for medical, 1416 01:05:50,110 --> 01:05:54,790 for financial applications, you might use 8 bytes, A.K.A. a double, 1417 01:05:54,790 --> 01:05:57,700 which just gives you more digits of precision. 1418 01:05:57,700 --> 01:06:01,360 They eventually get imprecise per the example we looked at last week, 1419 01:06:01,360 --> 01:06:03,610 but it at least gets you further down the line. 1420 01:06:03,610 --> 01:06:07,930 As an aside, in really, really important applications, in finance, 1421 01:06:07,930 --> 01:06:10,030 in medicine, in military operations, and the 1422 01:06:10,030 --> 01:06:12,640 like where you really can't have rounding errors-- 1423 01:06:12,640 --> 01:06:17,470 long story short, humans have developed libraries in C and other languages 1424 01:06:17,470 --> 01:06:19,317 that use more, even, than 8 bytes. 1425 01:06:19,317 --> 01:06:22,150 So there are solutions to these problems, but they're always finite. 1426 01:06:22,150 --> 01:06:24,070 You have to pick an upper bound. 1427 01:06:24,070 --> 01:06:27,070 Then there's char, which we saw briefly last week when I asked 1428 01:06:27,070 --> 01:06:29,470 the user for y or n, for yes or no. 1429 01:06:29,470 --> 01:06:32,470 And then there's a string, which I'm going to propose as a question mark 1430 01:06:32,470 --> 01:06:34,360 because a string totally depends. 1431 01:06:34,360 --> 01:06:35,380 Like, Hi! 
1432 01:06:35,380 --> 01:06:38,890 H-I, exclamation point, would seem to be three bytes. 1433 01:06:38,890 --> 01:06:41,140 D-A-V-I-D, would seem to be five. 1434 01:06:41,140 --> 01:06:45,400 So the strings, clearly, are variable based on what you or the human type in. 1435 01:06:45,400 --> 01:06:48,140 So we'll see what this means, though, in just a bit. 1436 01:06:48,140 --> 01:06:51,580 This though, is the thing inside of your Mac, your PC, your phone. 1437 01:06:51,580 --> 01:06:53,680 It might not look exactly like this, but this is 1438 01:06:53,680 --> 01:06:56,187 a memory module for a modern computer. 1439 01:06:56,187 --> 01:06:57,520 And let's go ahead and use this. 1440 01:06:57,520 --> 01:06:59,920 Really, it's just representative of the finite amount of memory 1441 01:06:59,920 --> 01:07:01,360 that any computer, indeed, has. 1442 01:07:01,360 --> 01:07:06,160 Let's zoom in on one of these little black chips on the circuit board here. 1443 01:07:06,160 --> 01:07:10,180 Zoom in, and let me propose that this rectangle really represents 1444 01:07:10,180 --> 01:07:14,380 some number of bytes, like tucked inside of this little black circuit 1445 01:07:14,380 --> 01:07:16,750 on the board is maybe, I don't know, a gigabyte, 1446 01:07:16,750 --> 01:07:19,300 a billion bytes, maybe it's 100 bytes-- some number of bytes. 1447 01:07:19,300 --> 01:07:21,258 It totally depends on the computer and how much 1448 01:07:21,258 --> 01:07:22,700 you paid for the stick of memory. 1449 01:07:22,700 --> 01:07:27,850 But if there's a finite number of bytes physically implemented somehow 1450 01:07:27,850 --> 01:07:30,327 digitally inside of this hardware, well, then it 1451 01:07:30,327 --> 01:07:32,410 stands to reason that we could number those bytes. 1452 01:07:32,410 --> 01:07:36,940 We can just arbitrarily decide that the top left corner is byte number 1453 01:07:36,940 --> 01:07:38,800 one, or really byte number zero. 
1454 01:07:38,800 --> 01:07:41,170 The one next to it is number one, then number two, 1455 01:07:41,170 --> 01:07:43,450 number 3, dot, dot, dot, number 2 billion 1456 01:07:43,450 --> 01:07:46,090 or whatever it is, however big this memory is. 1457 01:07:46,090 --> 01:07:50,530 So if you use a variable in a C program, that's only one byte. 1458 01:07:50,530 --> 01:07:54,190 Like a char, it might literally be stored in that top left-hand corner 1459 01:07:54,190 --> 01:07:55,120 of the memory. 1460 01:07:55,120 --> 01:07:57,760 In practice, you don't care where, physically, it is. 1461 01:07:57,760 --> 01:07:59,830 But really, the artist's rendition would be 1462 01:07:59,830 --> 01:08:02,872 this-- a char might use one of those single bytes 1463 01:08:02,872 --> 01:08:04,330 somewhere in the computer's memory. 1464 01:08:04,330 --> 01:08:07,450 If you use an int, which is 4 bytes, it would give you 1465 01:08:07,450 --> 01:08:10,840 4 bytes, contiguous-- that is left to right, top to bottom. 1466 01:08:10,840 --> 01:08:13,274 But all 32 bits would be next to each other 1467 01:08:13,274 --> 01:08:16,149 so the computer knows that those, indeed, all belong to the same int. 1468 01:08:16,149 --> 01:08:18,680 If you need a long, or a double for that matter, 1469 01:08:18,680 --> 01:08:21,140 then you might use a full 8 bytes in this case. 1470 01:08:21,140 --> 01:08:23,439 And you just keep using and using this memory, 1471 01:08:23,439 --> 01:08:26,170 kind of like a canvas, almost in Photoshop 1472 01:08:26,170 --> 01:08:29,845 or a spreadsheet where you can just move pixels or you can move data around, 1473 01:08:29,845 --> 01:08:31,720 that's really what your computer's memory is, 1474 01:08:31,720 --> 01:08:36,702 a canvas for storing information in units of bytes or 8 bits. 1475 01:08:36,702 --> 01:08:39,160 Now, we don't need to keep looking at these circuit boards. 1476 01:08:39,160 --> 01:08:41,287 We can abstract it away, as we often do. 
1477 01:08:41,287 --> 01:08:43,120 And let's go ahead and zoom in on this grid, 1478 01:08:43,120 --> 01:08:45,740 just to consider some very specific variables. 1479 01:08:45,740 --> 01:08:49,180 So let me zoom in, and now I see fewer, but larger boxes 1480 01:08:49,180 --> 01:08:51,580 on the screen, each of which, again, represents a byte. 1481 01:08:51,580 --> 01:08:55,130 And now let me propose that we play with some actual code. 1482 01:08:55,130 --> 01:08:58,029 So here in C, albeit without a full program, 1483 01:08:58,029 --> 01:09:01,060 are three ints-- score1, score2, score3. 1484 01:09:01,060 --> 01:09:07,359 I have, coincidentally, given myself two scores around 72 and 73, 1485 01:09:07,359 --> 01:09:09,040 and then a pretty low score at 33. 1486 01:09:09,040 --> 01:09:12,048 Of course, last week or two weeks ago, this would have been high. 1487 01:09:12,048 --> 01:09:13,840 But now we're dealing with actual integers. 1488 01:09:13,840 --> 01:09:17,750 So these are three so-so scores on my quizzes or tests or the like. 1489 01:09:17,750 --> 01:09:19,250 So let me go to VS Code here. 1490 01:09:19,250 --> 01:09:22,210 And let's make a program called scores.c. 1491 01:09:22,210 --> 01:09:24,399 So I'm going to write, code scores.c. 1492 01:09:24,399 --> 01:09:26,149 That's going to give me my new file. 1493 01:09:26,149 --> 01:09:28,420 And let me go ahead and implement something like this. 1494 01:09:28,420 --> 01:09:34,149 Include stdio.h, int main(void), and then inside of here, 1495 01:09:34,149 --> 01:09:37,689 let me do int score1 will be 72. 1496 01:09:37,689 --> 01:09:40,029 Int score2 will be 73. 1497 01:09:40,029 --> 01:09:43,149 And int score3 will be 33. 1498 01:09:43,149 --> 01:09:45,460 And then let me just do something like write a program 1499 01:09:45,460 --> 01:09:48,043 to average my three test scores together, something like that. 
1500 01:09:48,043 --> 01:09:52,240 So let me do printf, quote unquote, my average is-- 1501 01:09:52,240 --> 01:09:56,470 and I'm going to go ahead and do, say, %i, \n. 1502 01:09:56,470 --> 01:09:58,290 And now, let me plug in the results. 1503 01:09:58,290 --> 01:10:00,040 And this is kind of grade school math now. 1504 01:10:00,040 --> 01:10:02,210 How do I compute the average of three values? 1505 01:10:02,210 --> 01:10:09,110 Well, just like on paper, I can do score1 plus score2 plus score3 1506 01:10:09,110 --> 01:10:12,830 in parentheses, because of order of operations, divided by 3, 1507 01:10:12,830 --> 01:10:14,457 since there's three total scores. 1508 01:10:14,457 --> 01:10:16,040 All right, so I think this checks out. 1509 01:10:16,040 --> 01:10:19,040 And indeed, you can use parentheses and operators like plus in your code 1510 01:10:19,040 --> 01:10:23,180 like this in C. Let me go ahead now and do make scores. 1511 01:10:23,180 --> 01:10:24,327 No syntax error. 1512 01:10:24,327 --> 01:10:25,910 So that's good, nothing missing there. 1513 01:10:25,910 --> 01:10:28,850 And now let me do ./scores and see what my test average is. 1514 01:10:28,850 --> 01:10:32,270 All right, it's not great, but I think I still passed. 1515 01:10:32,270 --> 01:10:36,050 And indeed, my average here is 59. 1516 01:10:36,050 --> 01:10:38,360 Is it precisely 59 though? 1517 01:10:38,360 --> 01:10:39,140 Well, let's see. 1518 01:10:39,140 --> 01:10:42,110 Let's actually, instead of using an int, how about we go ahead 1519 01:10:42,110 --> 01:10:44,870 and use something like a floating point value here? 1520 01:10:44,870 --> 01:10:46,250 And let me go ahead and do this. 1521 01:10:46,250 --> 01:10:48,710 So let me recompile my code, make scores. 1522 01:10:48,710 --> 01:10:50,600 Huh, all right, I've got an issue. 1523 01:10:50,600 --> 01:10:52,340 Let me zoom in on my terminal window. 1524 01:10:52,340 --> 01:10:54,710 We've not seen this one, necessarily, before.
1525 01:10:54,710 --> 01:10:56,510 But error on line 9. 1526 01:10:56,510 --> 01:11:00,410 Format specifies type double, which is a lot of precision, 1527 01:11:00,410 --> 01:11:02,180 but the argument has type int. 1528 01:11:02,180 --> 01:11:03,300 So what does this mean? 1529 01:11:03,300 --> 01:11:06,508 Well, it's showing me with these green squiggles that something's bad between 1530 01:11:06,508 --> 01:11:09,060 the %f and this thing over here. 1531 01:11:09,060 --> 01:11:13,020 Well, on the left, I'm implying a float, or a double for that matter. 1532 01:11:13,020 --> 01:11:16,835 On the right, though, what data type are score1, score2, score3? 1533 01:11:16,835 --> 01:11:17,960 All right, so they're ints. 1534 01:11:17,960 --> 01:11:19,583 So clang does not like this. 1535 01:11:19,583 --> 01:11:22,250 The compiler just doesn't like that I'm using ints on the right, 1536 01:11:22,250 --> 01:11:24,170 but I want floats on the left. 1537 01:11:24,170 --> 01:11:26,670 So there's going to be different ways of solving this. 1538 01:11:26,670 --> 01:11:29,870 One way would be to just ignore the problem like I originally did, 1539 01:11:29,870 --> 01:11:32,450 and just go back to %i. 1540 01:11:32,450 --> 01:11:38,330 Or as an aside, %d is often an alternative to %i for a decimal number. 1541 01:11:38,330 --> 01:11:42,358 But we use %i because it sounds like int, so %i is fine here too. 1542 01:11:42,358 --> 01:11:44,150 But I don't want to just avoid the problem. 1543 01:11:44,150 --> 01:11:46,500 I want to actually display a floating point value. 1544 01:11:46,500 --> 01:11:47,730 So how can I fix this? 1545 01:11:47,730 --> 01:11:50,272 Well, it turns out, I can solve this in a few different ways. 1546 01:11:50,272 --> 01:11:53,990 The simplest is just to make sure that at least one number on the right 1547 01:11:53,990 --> 01:11:59,330 is a floating point value, like 3.0 instead of just 3. 1548 01:11:59,330 --> 01:12:01,700 Now I think clang will be happier. 
1549 01:12:01,700 --> 01:12:03,320 Let me do make scores-- 1550 01:12:03,320 --> 01:12:04,400 Enter. 1551 01:12:04,400 --> 01:12:05,330 And indeed, it's OK. 1552 01:12:05,330 --> 01:12:05,930 Why? 1553 01:12:05,930 --> 01:12:10,050 As soon as you have at least one more precise data type on the right, 1554 01:12:10,050 --> 01:12:13,170 it just treats everything, at that point, as floating point value 1555 01:12:13,170 --> 01:12:14,330 so that the math works out. 1556 01:12:14,330 --> 01:12:17,720 So ./scores, Enter-- and now, there we go, right? 1557 01:12:17,720 --> 01:12:20,390 Some of us might really want that 1/3 of a point. 1558 01:12:20,390 --> 01:12:21,980 Our average was not 59. 1559 01:12:21,980 --> 01:12:25,010 It's 59 1/3, as in this case here. 1560 01:12:25,010 --> 01:12:26,750 All right, so we've solved that there. 1561 01:12:26,750 --> 01:12:30,890 As an aside, though, there's one other technique to show here. 1562 01:12:30,890 --> 01:12:33,320 If you didn't want to change it to 3.0 because that's 1563 01:12:33,320 --> 01:12:36,410 a little weird, because there were literally three scores, 1564 01:12:36,410 --> 01:12:38,760 it's not like that needs to have a decimal point, 1565 01:12:38,760 --> 01:12:43,970 you could also explicitly convert the 3 to a float 1566 01:12:43,970 --> 01:12:46,230 by saying, in parentheses, float. 1567 01:12:46,230 --> 01:12:48,050 This is what's called typecasting. 1568 01:12:48,050 --> 01:12:51,840 And this will just convert the thing right after it to that data type, 1569 01:12:51,840 --> 01:12:52,560 if it's possible. 1570 01:12:52,560 --> 01:12:56,970 So if I do this again, make scores, no errors now. ./scores, and I get, 1571 01:12:56,970 --> 01:12:59,960 in fact, the same result. There's a bit of a rounding issue here, 1572 01:12:59,960 --> 01:13:03,650 but we know the rounding relates to the imprecision from last week. 1573 01:13:03,650 --> 01:13:06,980 For now, let me just be happy with my 59.3 something. 
1574 01:13:06,980 --> 01:13:08,360 I'll take that for now. 1575 01:13:08,360 --> 01:13:14,660 But this is as close to a good enough correct answer for me now. 1576 01:13:14,660 --> 01:13:15,942 But how do I-- 1577 01:13:15,942 --> 01:13:18,650 think about now, what's going on inside of the computer's memory? 1578 01:13:18,650 --> 01:13:19,310 Well, let's consider. 1579 01:13:19,310 --> 01:13:20,643 Here's that same grid of memory. 1580 01:13:20,643 --> 01:13:22,490 Each box represents a byte. 1581 01:13:22,490 --> 01:13:25,790 Where are score1, score2, and score3 in my memory? 1582 01:13:25,790 --> 01:13:28,790 Well, score1, let me just propose, is at the top left. 1583 01:13:28,790 --> 01:13:32,060 But it's taking up four boxes for 4 bytes. 1584 01:13:32,060 --> 01:13:34,842 Score2 probably ends up right next to it in memory, 1585 01:13:34,842 --> 01:13:36,800 though, this isn't always going to be the case, 1586 01:13:36,800 --> 01:13:38,180 but I've chosen simple examples. 1587 01:13:38,180 --> 01:13:40,910 73 is next to it, also taking up 4 bytes. 1588 01:13:40,910 --> 01:13:45,320 And then lastly, 33 is in score3, down there underneath. 1589 01:13:45,320 --> 01:13:48,343 Now, if we really look at the computer's memory, 1590 01:13:48,343 --> 01:13:50,510 look at it with some kind of microscope or the like, 1591 01:13:50,510 --> 01:13:54,110 there's actually 32 bits, 32 bits, 32 bits 1592 01:13:54,110 --> 01:13:59,308 in each of those four groups of four bytes representing those values. 1593 01:13:59,308 --> 01:14:01,100 But again, for today's purposes onwards, we 1594 01:14:01,100 --> 01:14:03,308 don't really need to think again and again in binary. 1595 01:14:03,308 --> 01:14:05,940 It's just, indeed, these decimal numbers being stored there. 1596 01:14:05,940 --> 01:14:08,240 But I claim now, this isn't the best design. 
1597 01:14:08,240 --> 01:14:11,300 Even if you have never programmed before CS50, 1598 01:14:11,300 --> 01:14:13,220 what you're looking at here on the screen, 1599 01:14:13,220 --> 01:14:16,970 as an excerpt, in what sense is this perhaps bad design, even though it's 1600 01:14:16,970 --> 01:14:19,960 a correct way of storing three test scores? 1601 01:14:19,960 --> 01:14:20,960 What's kind of bad here? 1602 01:14:20,960 --> 01:14:21,882 Yeah? 1603 01:14:21,882 --> 01:14:26,220 AUDIENCE: The more scores you have, the more you [INAUDIBLE]. 1604 01:14:26,220 --> 01:14:28,950 DAVID MALAN: Yeah, always do exactly what you did-- extrapolate 1605 01:14:28,950 --> 01:14:31,740 to 4 scores, 5 scores, 50 scores. 1606 01:14:31,740 --> 01:14:34,020 This can't be that well-designed because now you're 1607 01:14:34,020 --> 01:14:36,300 going to have 4 lines of code, 5 lines of code, 1608 01:14:36,300 --> 01:14:38,550 50 lines of code that are almost identical, 1609 01:14:38,550 --> 01:14:40,770 except for this, like, arbitrary number that we're 1610 01:14:40,770 --> 01:14:42,430 updating at the end of the variable's name. 1611 01:14:42,430 --> 01:14:44,940 So indeed, there's probably going to be a better 1612 01:14:44,940 --> 01:14:48,690 way, even though, at least in C, we haven't yet seen that technique. 1613 01:14:48,690 --> 01:14:52,440 But the solution, today onward, is going to be something called an array. 1614 01:14:52,440 --> 01:14:57,180 An array is a way of storing your data back 1615 01:14:57,180 --> 01:15:00,630 to back to back in the computer's memory in such a way 1616 01:15:00,630 --> 01:15:03,960 that you can access each individual member easily. 1617 01:15:03,960 --> 01:15:08,530 Put another way, with an array, you can instead do something like this.
1618 01:15:08,530 --> 01:15:12,300 Instead of saying int score1, int score2, int score3, 1619 01:15:12,300 --> 01:15:15,790 giving each a value, you can first tell the computer, 1620 01:15:15,790 --> 01:15:18,330 please give me a variable called scores-- 1621 01:15:18,330 --> 01:15:20,700 plural, though you can call it anything you want-- 1622 01:15:20,700 --> 01:15:24,090 of size three, each of which will be an integer. 1623 01:15:24,090 --> 01:15:28,680 That is to say, this is how you declare an array in C that will have 1624 01:15:28,680 --> 01:15:30,930 enough room to store three integers. 1625 01:15:30,930 --> 01:15:34,540 Put another way, this is the technical way of telling the computer, 1626 01:15:34,540 --> 01:15:38,880 please give me 12 bytes in total-- 1627 01:15:38,880 --> 01:15:42,660 3 times 4 each for an int, so give me 12 bytes in total. 1628 01:15:42,660 --> 01:15:44,640 And what the computer will do is guarantee 1629 01:15:44,640 --> 01:15:47,350 that they're back to back to back in the computer's memory. 1630 01:15:47,350 --> 01:15:49,360 And that'll be useful in just a moment. 1631 01:15:49,360 --> 01:15:51,820 So let me go ahead and do something useful with this. 1632 01:15:51,820 --> 01:15:53,640 Let me store three actual scores. 1633 01:15:53,640 --> 01:15:58,500 Here's how I could now store those same numeric scores in this array. 1634 01:15:58,500 --> 01:16:03,040 Syntax is a little different, but there's one variable called scores. 1635 01:16:03,040 --> 01:16:05,010 But if you want to go to its first location, 1636 01:16:05,010 --> 01:16:08,520 starting today, you use square brackets and go to location 0 1637 01:16:08,520 --> 01:16:13,080 first, which because things in C are 0 indexed, so to speak, 1638 01:16:13,080 --> 01:16:14,280 you start counting at 0. 1639 01:16:14,280 --> 01:16:16,410 The first int is at [0]. 1640 01:16:16,410 --> 01:16:18,030 Second int is at [1]. 1641 01:16:18,030 --> 01:16:19,530 Third int is at [2]. 
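The declaration and the three assignments just described can be sketched as follows. The wrapper functions are hypothetical and exist only so the snippet can be compiled and checked; the 12-byte figure assumes a 4-byte int, as in the lecture.

```c
#include <stddef.h>

// Declares the three-element array from the lecture and fills it by index.
int sum_three_scores(void)
{
    int scores[3];   // room for three ints, stored back to back

    scores[0] = 72;  // first element
    scores[1] = 73;  // second element
    scores[2] = 33;  // third and last element; scores[3] would be out of bounds

    return scores[0] + scores[1] + scores[2];
}

// Size of the whole array in bytes: 3 * sizeof(int), which is 12 when an int is 4 bytes.
size_t scores_size_in_bytes(void)
{
    int scores[3];
    return sizeof(scores);
}
```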
1642 01:16:19,530 --> 01:16:20,730 So it's not one, two, three. 1643 01:16:20,730 --> 01:16:22,090 It's literally 0, 1, 2. 1644 01:16:22,090 --> 01:16:24,090 And this is not something you have control over. 1645 01:16:24,090 --> 01:16:26,250 You must start at 0. 1646 01:16:26,250 --> 01:16:29,940 So these lines now create an array of size three, 1647 01:16:29,940 --> 01:16:33,510 and then insert one, two, three values into that array. 1648 01:16:33,510 --> 01:16:37,770 But the upside now is that you only have one name of the variable to remember. 1649 01:16:37,770 --> 01:16:39,240 It's just called scores. 1650 01:16:39,240 --> 01:16:43,380 Yes, you need to go into the array to get individual values. 1651 01:16:43,380 --> 01:16:46,618 You need to index into it using those square brackets. 1652 01:16:46,618 --> 01:16:48,660 But at least you don't have this hackish approach 1653 01:16:48,660 --> 01:16:53,050 of declaring a separate variable for each and every one of these values. 1654 01:16:53,050 --> 01:16:56,070 So let me go back to scores.c here. 1655 01:16:56,070 --> 01:16:57,580 And let me propose that I do this. 1656 01:16:57,580 --> 01:17:00,580 Let me just use that same idea to do the following. 1657 01:17:00,580 --> 01:17:02,580 Let me get rid of these three separate integers. 1658 01:17:02,580 --> 01:17:06,210 Let me give myself an int scores array of size 3. 1659 01:17:06,210 --> 01:17:10,470 And then scores[0] will, as before, be 72. 1660 01:17:10,470 --> 01:17:14,070 Scores[1] will be 73. 1661 01:17:14,070 --> 01:17:16,830 And scores[2] will be 33. 1662 01:17:16,830 --> 01:17:18,780 And let me get rid of the little dot there. 1663 01:17:18,780 --> 01:17:23,490 All right, so now, if I go ahead and run this again with make scores-- 1664 01:17:23,490 --> 01:17:24,642 Enter. 1665 01:17:24,642 --> 01:17:29,060 Huh, what did I do wrong here? 1666 01:17:29,060 --> 01:17:31,680 I think I got a little too ahead of myself. 
1667 01:17:31,680 --> 01:17:36,100 Let me increase my terminal window. 1668 01:17:36,100 --> 01:17:38,830 Let's focus on line 10 here, first. 1669 01:17:38,830 --> 01:17:42,310 Error, use of undeclared identifier, score1. 1670 01:17:42,310 --> 01:17:44,170 What did I do here that was dumb? 1671 01:17:44,170 --> 01:17:45,430 Yeah? 1672 01:17:45,430 --> 01:17:47,440 AUDIENCE: You didn't declare it a variable. 1673 01:17:47,440 --> 01:17:49,420 DAVID MALAN: Right, so I didn't declare score1. 1674 01:17:49,420 --> 01:17:50,530 I've got old code. 1675 01:17:50,530 --> 01:17:53,798 So I just kind of, honestly, got ahead of myself here, not even intentionally. 1676 01:17:53,798 --> 01:17:56,090 So let me go ahead and shrink my terminal window again. 1677 01:17:56,090 --> 01:17:57,740 I need to finish my thought here. 1678 01:17:57,740 --> 01:17:58,960 So let me clear my terminal. 1679 01:17:58,960 --> 01:18:04,960 And let me change this now to be scores[0] plus scores[1] plus 1680 01:18:04,960 --> 01:18:05,610 scores[2]. 1681 01:18:05,610 --> 01:18:07,360 So it's a little more verbose because I've 1682 01:18:07,360 --> 01:18:10,040 got these square brackets, so to speak. 1683 01:18:10,040 --> 01:18:12,220 But I think now my code is consistent. 1684 01:18:12,220 --> 01:18:13,870 So let me make scores now. 1685 01:18:13,870 --> 01:18:14,950 It now compiles. 1686 01:18:14,950 --> 01:18:19,870 ./scores gives me, indeed, the same rough average with those same values. 1687 01:18:19,870 --> 01:18:24,280 All right, so let me go ahead and maybe enhance this a little bit. 1688 01:18:24,280 --> 01:18:26,920 It's a little silly to have to write a special program just 1689 01:18:26,920 --> 01:18:31,610 to check your average of three test scores like 72, 73, 33. 1690 01:18:31,610 --> 01:18:33,550 Why don't I actually make the program dynamic 1691 01:18:33,550 --> 01:18:37,250 and ask the human for those scores? 1692 01:18:37,250 --> 01:18:39,140 So instead, let me do this. 
1693 01:18:39,140 --> 01:18:43,480 How about we get rid of the 72, and change this to get_int. 1694 01:18:43,480 --> 01:18:46,300 And I'll just prompt the user for a score. 1695 01:18:46,300 --> 01:18:52,510 Let me get rid of the 73 and get this to be get_int score, quote unquote. 1696 01:18:52,510 --> 01:18:56,560 And then lastly, get rid of the 33, and replace it with get_int, quote unquote, 1697 01:18:56,560 --> 01:18:57,670 score. 1698 01:18:57,670 --> 01:19:03,680 get_int is a CS50 thing for now, so I need to include cs50.h, as always. 1699 01:19:03,680 --> 01:19:05,650 But I think now, it's sort of a better program 1700 01:19:05,650 --> 01:19:08,680 because now I can compile it once, I can even share it with my friends. 1701 01:19:08,680 --> 01:19:12,490 And now any of us can average three scores on some class's test. 1702 01:19:12,490 --> 01:19:15,190 They don't need to know the code or rewrite the code just 1703 01:19:15,190 --> 01:19:16,910 to type in their scores. 1704 01:19:16,910 --> 01:19:19,150 So make scores worked. 1705 01:19:19,150 --> 01:19:25,120 ./scores, now I can type anything I want-- maybe it's a 72, 73, 33, 1706 01:19:25,120 --> 01:19:26,320 still get the same answer. 1707 01:19:26,320 --> 01:19:31,210 Or maybe I'm having a better semester, 100, 100, maybe 99, 1708 01:19:31,210 --> 01:19:33,520 and now we get still a pretty high score there. 1709 01:19:33,520 --> 01:19:34,600 But now it's dynamic. 1710 01:19:34,600 --> 01:19:36,080 Now you don't need the source code. 1711 01:19:36,080 --> 01:19:37,747 You don't need to recompile the program. 1712 01:19:37,747 --> 01:19:39,670 It's just going to work again and again. 1713 01:19:39,670 --> 01:19:41,090 But this, too. 1714 01:19:41,090 --> 01:19:43,660 Let me propose that this code is correct if I 1715 01:19:43,660 --> 01:19:45,910 want to get three scores from the user.
1716 01:19:45,910 --> 01:19:50,950 But these highlighted lines now, 6 through 9, are they well-designed, 1717 01:19:50,950 --> 01:19:53,170 would you say? 1718 01:19:53,170 --> 01:19:53,680 Yeah? 1719 01:19:53,680 --> 01:19:54,898 AUDIENCE: Can you loop? 1720 01:19:54,898 --> 01:19:55,940 DAVID MALAN: Yeah, right? 1721 01:19:55,940 --> 01:19:58,220 This is-- we can use a loop, is the spoiler here. 1722 01:19:58,220 --> 01:19:58,820 Why? 1723 01:19:58,820 --> 01:20:01,590 I mean, my God, it's like the same code again and again and again. 1724 01:20:01,590 --> 01:20:03,465 The only thing that's changing is the number. 1725 01:20:03,465 --> 01:20:06,170 And this should have kind of had some code smell again, 1726 01:20:06,170 --> 01:20:09,080 because if I keep typing the same thing again and again, 1727 01:20:09,080 --> 01:20:11,810 that's clearly an opportunity to better design something. 1728 01:20:11,810 --> 01:20:13,650 So let me do this. 1729 01:20:13,650 --> 01:20:18,590 Let me go ahead and still create my array of size three. 1730 01:20:18,590 --> 01:20:23,270 But let me use our old friend, the for loop, for int i equals 0, 1731 01:20:23,270 --> 01:20:26,610 i less than 3, i++. 1732 01:20:26,610 --> 01:20:29,510 And then in here, let me do scores bracket-- 1733 01:20:29,510 --> 01:20:32,920 we haven't seen this before, but any intuition? 1734 01:20:32,920 --> 01:20:34,220 Scores bracket-- 1735 01:20:34,220 --> 01:20:34,720 AUDIENCE: i. 1736 01:20:34,720 --> 01:20:39,730 DAVID MALAN: i, because that will use whatever i is, be it 0 or 1 or 2 1737 01:20:39,730 --> 01:20:40,720 in iteration. 1738 01:20:40,720 --> 01:20:43,780 And then I can get an int, asking the user for score, 1739 01:20:43,780 --> 01:20:47,000 without having to repeat myself again and again. 1740 01:20:47,000 --> 01:20:50,560 So hopefully, if I didn't make any typos, make scores, all good. 1741 01:20:50,560 --> 01:20:54,665 ./scores, 72, 73, 33, and we're back in business. 
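The loop version can be sketched like this. To keep the snippet independent of the CS50 library, next_score is a hypothetical stand-in that hands back canned values; in the lecture's program that call would be get_int with a prompt for the score.

```c
// Hypothetical stand-in for CS50's get_int: returns canned "user input"
// (72, 73, 33, then repeats) so the sketch runs without cs50.h.
int next_score(void)
{
    static const int canned[] = {72, 73, 33};
    static int position = 0;

    int value = canned[position];
    position = (position + 1) % 3;
    return value;
}

// Fills all three slots with one loop instead of three near-identical lines.
float average_from_input(void)
{
    int scores[3];
    for (int i = 0; i < 3; i++)
    {
        scores[i] = next_score();  // i is 0, then 1, then 2
    }
    return (scores[0] + scores[1] + scores[2]) / 3.0;
}
```

The number 3 still appears in several places here, which is the design problem the lecture turns to next.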
1742 01:20:54,665 --> 01:20:56,540 But the code is arguably now better designed, 1743 01:20:56,540 --> 01:21:01,240 because now, I haven't actually hardcoded the scores, 1744 01:21:01,240 --> 01:21:04,940 and I haven't actually copied and pasted any of that code. 1745 01:21:04,940 --> 01:21:08,230 Well, if we consider now what's going on inside of the computer's memory, 1746 01:21:08,230 --> 01:21:10,510 it's pretty much the same in terms of the values. 1747 01:21:10,510 --> 01:21:15,490 But instead of the variables being, literally, score1, score2, score3, 1748 01:21:15,490 --> 01:21:17,210 there's just one variable. 1749 01:21:17,210 --> 01:21:19,030 It's an array called scores. 1750 01:21:19,030 --> 01:21:24,550 But you can index into its three locations by using scores[0] to get 1751 01:21:24,550 --> 01:21:28,810 the first, scores[1] to get the second, scores[2] to get the third. 1752 01:21:28,810 --> 01:21:29,990 But this is key. 1753 01:21:29,990 --> 01:21:33,040 The memory is contiguous. 1754 01:21:33,040 --> 01:21:35,380 The screen is only so large, so it wraps around. 1755 01:21:35,380 --> 01:21:38,950 But physically, digitally, the memory is contiguous-- top 1756 01:21:38,950 --> 01:21:40,270 to bottom, left to right. 1757 01:21:40,270 --> 01:21:41,530 And that's important, why? 1758 01:21:41,530 --> 01:21:46,060 Because the brackets indicate 0, 1, 2, that each of these integers 1759 01:21:46,060 --> 01:21:48,790 is just one integer away from the next. 1760 01:21:48,790 --> 01:21:51,220 It can't be randomly down here all of a sudden. 1761 01:21:51,220 --> 01:21:54,070 It's got to be back to back to back. 1762 01:21:54,070 --> 01:21:57,130 All right, now equipped with that paradigm, 1763 01:21:57,130 --> 01:22:00,710 what more could we actually do here? 1764 01:22:00,710 --> 01:22:04,270 Well, it turns out, it's worth knowing that it's possible in code 1765 01:22:04,270 --> 01:22:06,850 to even pass arrays around as arguments. 
1766 01:22:06,850 --> 01:22:09,100 And let me just whip this program up somewhat quickly, 1767 01:22:09,100 --> 01:22:11,320 just so you've seen it before long. 1768 01:22:11,320 --> 01:22:13,190 But let me go ahead and do this. 1769 01:22:13,190 --> 01:22:18,130 Let me propose that I create a function that does this averaging for me. 1770 01:22:18,130 --> 01:22:22,510 So I'm going to create a function called average that returns a float. 1771 01:22:22,510 --> 01:22:26,860 And the arguments this thing is going to take-- 1772 01:22:26,860 --> 01:22:28,640 let's see, it's going to be the array. 1773 01:22:28,640 --> 01:22:31,480 So it turns out, if you want to take in an array of numbers-- 1774 01:22:31,480 --> 01:22:33,050 you can call it anything you want. 1775 01:22:33,050 --> 01:22:36,970 This is how you tell C that a function takes, not 1776 01:22:36,970 --> 01:22:39,790 an integer, but an array of integers. 1777 01:22:39,790 --> 01:22:41,290 And you don't have to call it array. 1778 01:22:41,290 --> 01:22:42,790 I'm doing that just for the sake of discussion. 1779 01:22:42,790 --> 01:22:43,660 It can be called x. 1780 01:22:43,660 --> 01:22:44,490 It can be numbers. 1781 01:22:44,490 --> 01:22:45,490 It can be anything else. 1782 01:22:45,490 --> 01:22:49,060 I'm just calling it array to be super explicit as to what it is there. 1783 01:22:49,060 --> 01:22:51,730 Now, how do I change my code down here? 1784 01:22:51,730 --> 01:22:55,130 What I think I'm going to do for the moment is just this. 1785 01:22:55,130 --> 01:22:59,110 I'm going to get rid of this code here, where I manually computed the average. 1786 01:22:59,110 --> 01:23:01,480 And let me just call the average function here 1787 01:23:01,480 --> 01:23:05,000 by passing in the whole array of scores. 1788 01:23:05,000 --> 01:23:07,030 So this is just an example of abstraction, 1789 01:23:07,030 --> 01:23:08,890 like now I have a function called average. 1790 01:23:08,890 --> 01:23:09,670 I don't care.
1791 01:23:09,670 --> 01:23:12,490 I don't have to remember how it works once I implement it. 1792 01:23:12,490 --> 01:23:15,010 It just kind of tightens up my main code a little bit. 1793 01:23:15,010 --> 01:23:17,030 But I do still have to implement this. 1794 01:23:17,030 --> 01:23:19,360 So later in my file-- let me repeat myself before, 1795 01:23:19,360 --> 01:23:22,270 the only time it's OK in C to repeat yourself again and again, 1796 01:23:22,270 --> 01:23:27,010 by typing out again, average, and then int array open bracket-- 1797 01:23:27,010 --> 01:23:28,580 but now not a semicolon. 1798 01:23:28,580 --> 01:23:30,250 Now I have to implement this thing. 1799 01:23:30,250 --> 01:23:33,400 And I can implement this in a bunch of different ways, 1800 01:23:33,400 --> 01:23:37,630 but I don't know in advance-- 1801 01:23:37,630 --> 01:23:39,040 I can't just do this. 1802 01:23:39,040 --> 01:23:48,400 I can't just do array[0] plus array[1] plus array[2], 1803 01:23:48,400 --> 01:23:52,130 unless this program's only ever going to work on three numbers. 1804 01:23:52,130 --> 01:23:55,460 So let me go ahead and do this. 1805 01:23:55,460 --> 01:23:58,570 Let me first propose that there's a poor design here. 1806 01:23:58,570 --> 01:24:01,930 In my main function, what value have I repeated twice? 1807 01:24:01,930 --> 01:24:05,050 1808 01:24:05,050 --> 01:24:07,550 Among the highlighted lines, what jumps out at you as twice? 1809 01:24:07,550 --> 01:24:09,020 AUDIENCE: The length of the array? 1810 01:24:09,020 --> 01:24:11,520 DAVID MALAN: Yeah, the length of the array, it's just three. 1811 01:24:11,520 --> 01:24:14,720 Now it's not a huge deal that I typed the number three on line 8 and line 9, 1812 01:24:14,720 --> 01:24:17,120 but this is exactly the kind of like shortcut 1813 01:24:17,120 --> 01:24:18,440 that's going to get you in trouble eventually. 1814 01:24:18,440 --> 01:24:18,860 Why? 
1815 01:24:18,860 --> 01:24:20,240 Because, eventually, you or someone else is 1816 01:24:20,240 --> 01:24:22,407 going to go in and make the array bigger or smaller, 1817 01:24:22,407 --> 01:24:24,410 and you're not going to realize that magically, 1818 01:24:24,410 --> 01:24:26,270 that same number is in two places. 1819 01:24:26,270 --> 01:24:29,270 And indeed, this is what a programmer would often call a magic number. 1820 01:24:29,270 --> 01:24:31,940 A magic number is one that just kind of appears magically. 1821 01:24:31,940 --> 01:24:35,210 And you're on the honor system to change it here, if you change it here, 1822 01:24:35,210 --> 01:24:36,688 and then you change it over here. 1823 01:24:36,688 --> 01:24:39,230 That's not going to end well if the onus is on the programmer 1824 01:24:39,230 --> 01:24:43,190 to remember where they hardcoded-- that is, wrote out three explicitly. 1825 01:24:43,190 --> 01:24:46,250 So any time you reuse a value like this, you know what? 1826 01:24:46,250 --> 01:24:50,690 We should probably do what we did last week, which was to declare a variable, 1827 01:24:50,690 --> 01:24:53,510 perhaps at the very top of my program, so it's super obvious 1828 01:24:53,510 --> 01:24:56,990 what it is, called, maybe n, and set that equal to 3. 1829 01:24:56,990 --> 01:24:59,030 Better yet, what did I do last week to make sure 1830 01:24:59,030 --> 01:25:02,390 that I can't screw up and accidentally change that value? 1831 01:25:02,390 --> 01:25:03,440 Yeah, constant. 1832 01:25:03,440 --> 01:25:05,810 And the keyword there was just const for short. 1833 01:25:05,810 --> 01:25:09,110 And now I have a global variable-- global in the sense that I can 1834 01:25:09,110 --> 01:25:11,870 access it anywhere-- that is called n. 1835 01:25:11,870 --> 01:25:12,680 It's an int. 1836 01:25:12,680 --> 01:25:14,450 And it's always going to be 3. 
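A minimal sketch of replacing the magic number with a constant follows. The function sum_of_repeated is hypothetical and only serves to show N used both as the array's size and as the loop bound.

```c
// One named constant replaces every hardcoded 3.
const int N = 3;

// Fills an N-element array with the same value and sums it. Changing N
// updates the array's size and both loop bounds together.
int sum_of_repeated(int value)
{
    int scores[N];                // size comes from N, not a magic number
    for (int i = 0; i < N; i++)   // loop bound comes from N too
    {
        scores[i] = value;
    }

    int sum = 0;
    for (int i = 0; i < N; i++)
    {
        sum += scores[i];
    }
    return sum;
}
```

Because N is a const variable rather than a literal, the array is sized when the function runs, which is why it is filled by a loop and not by an initializer list.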
1837 01:25:14,450 --> 01:25:18,500 And now I can improve my main function a little bit by just changing 1838 01:25:18,500 --> 01:25:22,662 the 3's to n, so now if I, or if a colleague, realized, oh, wait a minute, 1839 01:25:22,662 --> 01:25:23,870 there's four tests this year. 1840 01:25:23,870 --> 01:25:25,610 You change n to four, recompile the code, 1841 01:25:25,610 --> 01:25:31,190 and it just works everywhere else, except in my average function. 1842 01:25:31,190 --> 01:25:33,830 Let me change it back to 3, just for consistency. 1843 01:25:33,830 --> 01:25:39,770 This is not going to fly now, to just sum up things like this, for instance, 1844 01:25:39,770 --> 01:25:43,610 and then return this divided by 3. 1845 01:25:43,610 --> 01:25:51,130 Why will this not work now as I've defined it? 1846 01:25:51,130 --> 01:25:52,159 Yeah? 1847 01:25:52,159 --> 01:25:58,030 AUDIENCE: [INAUDIBLE] 1848 01:25:58,030 --> 01:26:00,980 DAVID MALAN: OK, I might be returning an integer value when 1849 01:26:00,980 --> 01:26:02,870 I intend to return a float per this. 1850 01:26:02,870 --> 01:26:05,870 But I think I'm OK because I used that little trick where I made sure 1851 01:26:05,870 --> 01:26:08,810 that at least one of the numbers in my arithmetic expression 1852 01:26:08,810 --> 01:26:11,010 is, in fact, a floating point value. 1853 01:26:11,010 --> 01:26:14,180 And just adding the point 0 makes sure that everything 1854 01:26:14,180 --> 01:26:15,650 gets treated as a float. 1855 01:26:15,650 --> 01:26:17,864 So I think that's OK. 1856 01:26:17,864 --> 01:26:19,034 AUDIENCE: [INAUDIBLE] 1857 01:26:19,034 --> 01:26:20,701 DAVID MALAN: I'm sorry, a little louder. 1858 01:26:20,701 --> 01:26:24,385 AUDIENCE: It just seems like you're [INAUDIBLE]. 1859 01:26:24,385 --> 01:26:25,260 DAVID MALAN: Exactly.
1860 01:26:25,260 --> 01:26:27,093 So left hand's not talking to the right hand 1861 01:26:27,093 --> 01:26:30,210 here, in that my current implementation of average 1862 01:26:30,210 --> 01:26:33,510 is still assuming that there's only going to be three tests or whatever. 1863 01:26:33,510 --> 01:26:35,670 But wait a minute, I just went through the trouble 1864 01:26:35,670 --> 01:26:39,480 of modifying this to be n, generically. 1865 01:26:39,480 --> 01:26:43,205 And if I change this to 4, I'm not going to be happy, perhaps, 1866 01:26:43,205 --> 01:26:46,080 with my average because now I'm going to ignore one of my test scores 1867 01:26:46,080 --> 01:26:46,690 altogether. 1868 01:26:46,690 --> 01:26:48,450 So let me change this back to 3. 1869 01:26:48,450 --> 01:26:51,180 And unfortunately, if it's a variable now, 1870 01:26:51,180 --> 01:26:55,500 n, and therefore, I have literally a variable number of scores, 1871 01:26:55,500 --> 01:27:00,920 how do I take the average of a variable number of things? 1872 01:27:00,920 --> 01:27:02,630 I mean, what's my building block there? 1873 01:27:02,630 --> 01:27:03,170 Yeah? 1874 01:27:03,170 --> 01:27:10,100 AUDIENCE: [INAUDIBLE] 1875 01:27:10,100 --> 01:27:10,850 DAVID MALAN: Yeah. 1876 01:27:10,850 --> 01:27:14,880 Why don't I use a loop that goes through the array and adds things up as you go? 1877 01:27:14,880 --> 01:27:17,360 I mean, kind of like grade school, as you take the average on your calculator 1878 01:27:17,360 --> 01:27:19,730 or paper and pencil, you just keep adding the numbers together, 1879 01:27:19,730 --> 01:27:22,380 and then you divide at the end by the total number of things. 1880 01:27:22,380 --> 01:27:23,520 So how can I do this? 1881 01:27:23,520 --> 01:27:25,730 Well, let me change my implementation of average 1882 01:27:25,730 --> 01:27:30,515 to first declare a variable called sum, or whatever, set it equal to 0. 
1883 01:27:30,515 --> 01:27:33,140 So this is like me on my piece of paper getting ready to count, 1884 01:27:33,140 --> 01:27:36,590 or my calculator, of course, when you turn it on, typically defaults to zero. 1885 01:27:36,590 --> 01:27:41,570 And now, let me do for, int i equals 0. i is less than a-- 1886 01:27:41,570 --> 01:27:43,700 well, no, I didn't do that. 1887 01:27:43,700 --> 01:27:46,730 i is less than n, i++. 1888 01:27:46,730 --> 01:27:52,640 And now in here, let me go ahead and add to the current sum, whatever 1889 01:27:52,640 --> 01:27:55,910 is in the array's location, i. 1890 01:27:55,910 --> 01:28:00,740 And then down here, I think I can just return sum divided by 3.0-- 1891 01:28:00,740 --> 01:28:04,560 not 3.0, n, perhaps here. 1892 01:28:04,560 --> 01:28:08,492 And actually, I think I'm going to get-- let's make sure it's a float. 1893 01:28:08,492 --> 01:28:11,450 Let's use the type casting trick just to make sure I don't accidentally 1894 01:28:11,450 --> 01:28:15,540 shortchange someone and throw away everything after the decimal point. 1895 01:28:15,540 --> 01:28:17,300 So it just escalated quickly, right? 1896 01:28:17,300 --> 01:28:18,990 Average just got a lot more involved. 1897 01:28:18,990 --> 01:28:22,130 It's not just a single one line of code, but now it's dynamic. 1898 01:28:22,130 --> 01:28:25,070 I initialize a variable called sum to 0. 1899 01:28:25,070 --> 01:28:30,920 In this loop, I go through and just keep adding to sum, which is initially 0, 1900 01:28:30,920 --> 01:28:33,200 whatever's in array[i]-- 1901 01:28:33,200 --> 01:28:36,740 or specifically array[0], array[1], array[2]. 1902 01:28:36,740 --> 01:28:40,970 That gives me a total sum that I return, divided by the total number of things. 1903 01:28:40,970 --> 01:28:42,560 Now, this I can tighten slightly. 1904 01:28:42,560 --> 01:28:45,650 Recall that this is syntactic sugar for just adding things.
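At this point in the walkthrough, average might look like the sketch below: a running sum, a loop bounded by the global N, and a cast so the division is not done in integers. This is a reconstruction from the description, not the lecture's file verbatim.

```c
const int N = 3;

// Prototype, so that code placed above the definition can call average.
float average(int array[]);

// Adds up the first N elements, then divides in floating point.
float average(int array[])
{
    int sum = 0;                  // like clearing a calculator before adding
    for (int i = 0; i < N; i++)
    {
        sum = sum + array[i];     // array[0], then array[1], then array[2]
    }
    return sum / (float) N;       // the cast avoids int / int truncation
}
```

The function still depends on the global N, so it only works for arrays of exactly that many elements.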
1905 01:28:45,650 --> 01:28:48,620 I can't use plus plus because that only literally adds one. 1906 01:28:48,620 --> 01:28:52,630 But I can use here, plus equals. 1907 01:28:52,630 --> 01:28:54,880 Questions on this implementation here? 1908 01:28:54,880 --> 01:28:58,000 Really the only takeaway-- or the most important takeaway 1909 01:28:58,000 --> 01:29:00,730 is that this is the syntax for how you tell 1910 01:29:00,730 --> 01:29:04,210 a function that it expects a whole array, not 1911 01:29:04,210 --> 01:29:06,450 a single variable like an int or the like. 1912 01:29:06,450 --> 01:29:08,200 You literally use square brackets, but you 1913 01:29:08,200 --> 01:29:11,530 don't specify the length inside there. 1914 01:29:11,530 --> 01:29:12,748 Yeah? 1915 01:29:12,748 --> 01:29:16,410 AUDIENCE: What variable [INAUDIBLE] at the top? 1916 01:29:16,410 --> 01:29:18,410 DAVID MALAN: What about the variable at the top? 1917 01:29:18,410 --> 01:29:22,205 AUDIENCE: [INAUDIBLE] 1918 01:29:22,205 --> 01:29:23,330 DAVID MALAN: Good question. 1919 01:29:23,330 --> 01:29:25,220 What do I have it defined as at the top? 1920 01:29:25,220 --> 01:29:31,280 This variable, N, it must be an integer if you're going to use it inside 1921 01:29:31,280 --> 01:29:33,840 of an array's square brackets here. 1922 01:29:33,840 --> 01:29:38,360 So this line 10, notice, no longer says 3, it says N. 1923 01:29:38,360 --> 01:29:42,350 And so whatever N is, 3 or 4 or something else, that's how many 1924 01:29:42,350 --> 01:29:43,970 integers I will get in that array. 1925 01:29:43,970 --> 01:29:47,070 And it must be, by definition of an array, an integer that 1926 01:29:47,070 --> 01:29:48,320 goes in those square brackets. 1927 01:29:48,320 --> 01:29:50,000 And here's a common source of confusion.
1928 01:29:50,000 --> 01:29:52,350 When you create the array, that is declare it, 1929 01:29:52,350 --> 01:29:54,350 you use square brackets like this, where you put 1930 01:29:54,350 --> 01:29:56,210 the total number of elements you want. 1931 01:29:56,210 --> 01:29:59,820 When you subsequently use the array, like I'm doing here, 1932 01:29:59,820 --> 01:30:02,690 you don't mention int again-- just like you don't mention int 1933 01:30:02,690 --> 01:30:04,610 again and again once a variable exists. 1934 01:30:04,610 --> 01:30:10,220 You use the square brackets still, but you don't use N. You use 0 or 1 or 2 1935 01:30:10,220 --> 01:30:11,990 or, generically here, i. 1936 01:30:11,990 --> 01:30:14,810 So when C was designed, they sometimes used the same syntax 1937 01:30:14,810 --> 01:30:17,060 for two different ideas or contexts. 1938 01:30:17,060 --> 01:30:17,984 Yeah? 1939 01:30:17,984 --> 01:30:22,645 AUDIENCE: Do you have to include line 6 [INAUDIBLE]? 1940 01:30:22,645 --> 01:30:23,770 DAVID MALAN: Good question. 1941 01:30:23,770 --> 01:30:25,900 Do I have to include line 6? 1942 01:30:25,900 --> 01:30:29,290 Short answer, yes, because of the reason we ran into last week. 1943 01:30:29,290 --> 01:30:32,750 C, or clang really, reads your code top to bottom, left to right. 1944 01:30:32,750 --> 01:30:38,890 And so if the compiler sees some mention of this function average on line 16, 1945 01:30:38,890 --> 01:30:41,800 but you haven't told the compiler that average exists, 1946 01:30:41,800 --> 01:30:43,610 you're going to get an error on the screen. 1947 01:30:43,610 --> 01:30:45,490 So the conventional way to do that is you 1948 01:30:45,490 --> 01:30:48,670 just copy-paste the first line of code from the function, 1949 01:30:48,670 --> 01:30:51,260 its so-called prototype or declaration. 1950 01:30:51,260 --> 01:30:51,760 Yeah? 1951 01:30:51,760 --> 01:30:55,662 AUDIENCE: Is there a library if you don't know the size of the array?
1952 01:30:55,662 --> 01:30:58,120 DAVID MALAN: Really good question, and a perfect segue. 1953 01:30:58,120 --> 01:31:01,078 Is there a library you can use if you don't know the size of the array? 1954 01:31:01,078 --> 01:31:01,720 No. 1955 01:31:01,720 --> 01:31:07,660 And so if any of you have programmed in Java or Python or other languages, 1956 01:31:07,660 --> 01:31:11,020 you can actually just ask the array, how big is it? 1957 01:31:11,020 --> 01:31:13,778 In C, you and I, the programmers, have to remember it. 1958 01:31:13,778 --> 01:31:15,820 And so short answer, no, there's no function that 1959 01:31:15,820 --> 01:31:17,445 will just automatically do this for us. 1960 01:31:17,445 --> 01:31:20,230 And in fact, let me make a more subtle claim 1961 01:31:20,230 --> 01:31:23,950 that it's fine to use global variables like this if they're really 1962 01:31:23,950 --> 01:31:25,160 for configuration options. 1963 01:31:25,160 --> 01:31:25,660 Why? 1964 01:31:25,660 --> 01:31:28,160 It's just convenient to put them at the very top of the file 1965 01:31:28,160 --> 01:31:30,565 because everyone, you, your colleagues, your TAs 1966 01:31:30,565 --> 01:31:32,440 are going to see them at the top of the code. 1967 01:31:32,440 --> 01:31:36,130 But you really shouldn't be using them everywhere throughout your code. 1968 01:31:36,130 --> 01:31:38,380 It'd be better if the average function, itself, were 1969 01:31:38,380 --> 01:31:40,610 independent of that special variable. 1970 01:31:40,610 --> 01:31:42,025 So by that, I mean this. 1971 01:31:42,025 --> 01:31:46,240 You know what I should really do, if I really want to be well-designed? 1972 01:31:46,240 --> 01:31:51,400 I should pass in the length of the array to the average function. 1973 01:31:51,400 --> 01:31:54,310 I should give the average function a second argument-- 1974 01:31:54,310 --> 01:31:57,800 I'll call it length, for instance, but I could call it anything I want.
1975 01:31:57,800 --> 01:32:02,500 And so rather than putting N all the way down here at the bottom of my file, 1976 01:32:02,500 --> 01:32:05,745 let me just dynamically say length instead. 1977 01:32:05,745 --> 01:32:08,620 And this is a subtlety-- and no need to get too tripped up over this. 1978 01:32:08,620 --> 01:32:11,830 But this, now, is just an example of how the same function can 1979 01:32:11,830 --> 01:32:13,690 take not one, but two arguments. 1980 01:32:13,690 --> 01:32:19,400 But indeed, in C, you must remember, yourself, what the length of an array 1981 01:32:19,400 --> 01:32:19,900 is. 1982 01:32:19,900 --> 01:32:22,810 You can't just ask the array via some syntax 1983 01:32:22,810 --> 01:32:26,560 like you can, those of you who've programmed before in Java or Python. 1984 01:32:26,560 --> 01:32:27,070 Yeah? 1985 01:32:27,070 --> 01:32:35,115 AUDIENCE: [INAUDIBLE] 1986 01:32:35,115 --> 01:32:36,240 DAVID MALAN: Good question. 1987 01:32:36,240 --> 01:32:39,198 Would it be better designed to write a function that computes the size? 1988 01:32:39,198 --> 01:32:42,570 Short answer, can't do that in C. As soon as you pass an array 1989 01:32:42,570 --> 01:32:47,263 into a function in C, you cannot figure out its size if it's a generic array 1990 01:32:47,263 --> 01:32:48,180 like that of integers. 1991 01:32:48,180 --> 01:32:51,040 There are special cases that you can do that. 1992 01:32:51,040 --> 01:32:53,283 But in general, no, it's just not possible in C. 1993 01:32:53,283 --> 01:32:55,200 And if that's some frustration, honestly, this 1994 01:32:55,200 --> 01:32:57,180 is why more modern languages add that feature. 1995 01:32:57,180 --> 01:32:57,680 Why? 1996 01:32:57,680 --> 01:32:59,910 Because it was really annoying, as I'm alluding here 1997 01:32:59,910 --> 01:33:01,560 to not having that information. 
1998 01:33:01,560 --> 01:33:03,643 Now, just to make sure I didn't screw up anywhere, 1999 01:33:03,643 --> 01:33:07,540 let me compile this final version of scores. 2000 01:33:07,540 --> 01:33:08,620 Suspense. 2001 01:33:08,620 --> 01:33:14,030 All good. ./scores, 72, 73, 33, and we're still back in business. 2002 01:33:14,030 --> 01:33:15,530 So this version is more complicated. 2003 01:33:15,530 --> 01:33:18,738 And as always, we'll have this version on the course's website for reference. 2004 01:33:18,738 --> 01:33:20,740 But the point, really, is that arrays, not only 2005 01:33:20,740 --> 01:33:23,290 can be used as containers to store multiple values-- 2006 01:33:23,290 --> 01:33:25,490 three or more in this case-- 2007 01:33:25,490 --> 01:33:30,440 you can also even pass them around as arguments, as such. 2008 01:33:30,440 --> 01:33:34,300 All right, now besides that, let's simplify for just a moment, 2009 01:33:34,300 --> 01:33:36,100 and consider now the world of chars. 2010 01:33:36,100 --> 01:33:39,200 If we've just got single bytes, where does this lead us? 2011 01:33:39,200 --> 01:33:41,200 And how does this get us, ultimately, to strings 2012 01:33:41,200 --> 01:33:44,170 to solve problems like readability and cryptography and the like? 2013 01:33:44,170 --> 01:33:46,390 Well here, for instance, are three lines of code, 2014 01:33:46,390 --> 01:33:48,967 out of context, that simply store three chars. 2015 01:33:48,967 --> 01:33:50,800 And you can already see where this is going. 2016 01:33:50,800 --> 01:33:53,920 Having three variables called c1, c2, c3 is clearly 2017 01:33:53,920 --> 01:33:57,470 going to end up being bad design because of all the silly redundancy here. 2018 01:33:57,470 --> 01:33:59,650 But notice, I'm using single quotes like last week 2019 01:33:59,650 --> 01:34:01,330 because these are single chars. 2020 01:34:01,330 --> 01:34:03,647 What does this look like in the computer's memory? 
2021 01:34:03,647 --> 01:34:05,480 Well, it looks a little something like this. 2022 01:34:05,480 --> 01:34:09,730 If we clear out the old memory, c1, c2, c3 probably 2023 01:34:09,730 --> 01:34:12,562 will end up here, maybe not literally in the top left-hand corner. 2024 01:34:12,562 --> 01:34:14,020 This is just an artist's rendition. 2025 01:34:14,020 --> 01:34:18,440 But c1, c2, c3 will probably end up like that. 2026 01:34:18,440 --> 01:34:20,020 Now, what's really there? 2027 01:34:20,020 --> 01:34:21,730 It's really those same three numbers-- 2028 01:34:21,730 --> 01:34:23,350 72, 73, 33. 2029 01:34:23,350 --> 01:34:27,920 But how many bits does a byte have? 2030 01:34:27,920 --> 01:34:28,880 Just eight. 2031 01:34:28,880 --> 01:34:33,830 So if we were to look at the binary representation of these characters, 2032 01:34:33,830 --> 01:34:35,330 it would only be eight bits each. 2033 01:34:35,330 --> 01:34:39,140 That's enough to store small numbers like 72, 73, 33. 2034 01:34:39,140 --> 01:34:41,580 We're not dealing with Unicode and emoji and the like. 2035 01:34:41,580 --> 01:34:42,837 But the point is the same. 2036 01:34:42,837 --> 01:34:45,170 You don't have to use four bytes to store these numbers. 2037 01:34:45,170 --> 01:34:48,087 You can use a different data type like chars, and underneath the hood, 2038 01:34:48,087 --> 01:34:51,420 it's, indeed, going to use just single bytes for each. 2039 01:34:51,420 --> 01:34:55,850 But this is sort of like a-- this isn't really how we implement strings, right? 2040 01:34:55,850 --> 01:34:59,270 When you wanted to say, hi, last week, or this, we used double quotes. 2041 01:34:59,270 --> 01:35:02,400 And we wrote all of the things together and used one variable, not three, 2042 01:35:02,400 --> 01:35:02,900 right? 2043 01:35:02,900 --> 01:35:06,260 When I typed in David, I didn't have a variable for D-A-V-I-D. 2044 01:35:06,260 --> 01:35:09,750 I had one variable called name that stored the whole thing. 
2045 01:35:09,750 --> 01:35:13,310 So in C, we keep talking about these things called strings. 2046 01:35:13,310 --> 01:35:17,427 We'll see, eventually, that strings are not necessarily what they seem to be. 2047 01:35:17,427 --> 01:35:19,760 But for now, the key thing about strings is that they're 2048 01:35:19,760 --> 01:35:22,070 variable length, so to speak, right? 2049 01:35:22,070 --> 01:35:25,250 They might be three characters, Hi, or five characters, David, 2050 01:35:25,250 --> 01:35:28,250 or anything smaller or larger. 2051 01:35:28,250 --> 01:35:30,980 So how do we go about implementing strings, 2052 01:35:30,980 --> 01:35:33,110 if all we have at the end of the day is my memory? 2053 01:35:33,110 --> 01:35:36,290 Well, here is an example of just creating, declaring, 2054 01:35:36,290 --> 01:35:39,650 and defining a string called s. s because it's just a simple string, 2055 01:35:39,650 --> 01:35:41,900 and quote unquote, HI!, in double quotes. 2056 01:35:41,900 --> 01:35:44,090 What does this look like in the computer's memory? 2057 01:35:44,090 --> 01:35:45,230 Well, let's clear it again. 2058 01:35:45,230 --> 01:35:48,110 And here, now, because it's technically stored in one variable, 2059 01:35:48,110 --> 01:35:50,960 s, here is how I might draw it as an artist. 2060 01:35:50,960 --> 01:35:52,520 It's three bytes in total-- 2061 01:35:52,520 --> 01:35:53,990 H-I exclamation point. 2062 01:35:53,990 --> 01:35:59,630 But there's no c1, c2, c3, it's just, the whole thing is s. 2063 01:35:59,630 --> 01:36:03,800 But it turns out that a string, fun fact, 2064 01:36:03,800 --> 01:36:06,990 is really just what underneath the hood? 2065 01:36:06,990 --> 01:36:09,610 Kind of leading up to this-- 2066 01:36:09,610 --> 01:36:12,090 what is a string, if this is how it's laid out in memory? 2067 01:36:12,090 --> 01:36:13,190 AUDIENCE: An array. 2068 01:36:13,190 --> 01:36:15,830 DAVID MALAN: Literally, it's just an array of characters. 
2069 01:36:15,830 --> 01:36:18,590 And we didn't have to know about arrays last week to use strings. 2070 01:36:18,590 --> 01:36:21,382 This is where, again, the training wheels are starting to come off. 2071 01:36:21,382 --> 01:36:23,730 But a string is just an array of characters. 2072 01:36:23,730 --> 01:36:26,040 H-I exclamation point, for instance. 2073 01:36:26,040 --> 01:36:28,370 So technically, an array-- 2074 01:36:28,370 --> 01:36:33,890 or a string called s is really a variable called s that allows you 2075 01:36:33,890 --> 01:36:38,150 to get at the first character with s[0], if you want-- s[1], s[2]. 2076 01:36:38,150 --> 01:36:40,340 You can literally get individual characters 2077 01:36:40,340 --> 01:36:43,820 just by treating s as though it's an array, which it really 2078 01:36:43,820 --> 01:36:47,000 is underneath the hood, in this case. 2079 01:36:47,000 --> 01:36:48,560 But there's a catch. 2080 01:36:48,560 --> 01:36:51,500 How do you know where strings end? 2081 01:36:51,500 --> 01:36:54,560 In the past, when I drew some integers on the screen, 2082 01:36:54,560 --> 01:36:57,080 I know, I claim they always take up 4 bytes. 2083 01:36:57,080 --> 01:37:00,200 If I had drawn a long, it always takes up 8 bytes. 2084 01:37:00,200 --> 01:37:03,530 If I had drawn a character, it always takes up 1 byte. 2085 01:37:03,530 --> 01:37:06,533 But how many bytes does a string take up? 2086 01:37:06,533 --> 01:37:08,450 Yeah, I mean, that's kind of the right answer. 2087 01:37:08,450 --> 01:37:10,490 In this case, three, it would seem. 2088 01:37:10,490 --> 01:37:13,490 But if it's David, that's a good five characters. 2089 01:37:13,490 --> 01:37:16,173 But where do we put the number three? 2090 01:37:16,173 --> 01:37:17,840 Where do you put the number five, right? 2091 01:37:17,840 --> 01:37:20,190 This is literally all that's inside your computer. 2092 01:37:20,190 --> 01:37:23,430 This is all our building blocks in front of us. 
2093 01:37:23,430 --> 01:37:25,490 So how can we-- where does the three go? 2094 01:37:25,490 --> 01:37:26,540 Where does the five go? 2095 01:37:26,540 --> 01:37:29,420 Well, it turns out you can solve this in a couple of different ways. 2096 01:37:29,420 --> 01:37:34,160 But the way humans decided to implement strings years ago is, indeed, an array, 2097 01:37:34,160 --> 01:37:38,960 but they added one extra byte at the end of every such string array, 2098 01:37:38,960 --> 01:37:41,840 just to make clear, with a so-called sentinel value, 2099 01:37:41,840 --> 01:37:44,480 that the string ends here. 2100 01:37:44,480 --> 01:37:45,050 Why? 2101 01:37:45,050 --> 01:37:47,930 So that if you have two strings in the computer's memory like, HI! 2102 01:37:47,930 --> 01:37:52,760 and bye, you know where the barrier is between the exclamation point of one 2103 01:37:52,760 --> 01:37:54,590 and the letter B in the next, right? 2104 01:37:54,590 --> 01:37:56,000 You need some kind of delimiter. 2105 01:37:56,000 --> 01:38:00,110 And so what really is underneath the hood is this. 2106 01:38:00,110 --> 01:38:04,460 When you store a string in memory, when you type in a string-- as the user, 2107 01:38:04,460 --> 01:38:07,040 if you type in 3 characters, it's going to use 2108 01:38:07,040 --> 01:38:10,280 3 plus 1 equals 4 bytes in total. 2109 01:38:10,280 --> 01:38:14,130 If you type in David, it's going to use 5 plus 1 equals 6 bytes in total. 2110 01:38:14,130 --> 01:38:14,630 Why? 2111 01:38:14,630 --> 01:38:20,210 Because C automatically adds this special 0 at the end of the string. 2112 01:38:20,210 --> 01:38:24,710 I've drawn it with backslash 0 because this is how you represent 0 as a char, 2113 01:38:24,710 --> 01:38:25,710 as a character. 2114 01:38:25,710 --> 01:38:28,230 But this is literally just 0, as we'll soon see. 
2115 01:38:28,230 --> 01:38:31,100 So any time there's a string in memory, it always takes up 2116 01:38:31,100 --> 01:38:36,197 one more byte than you, yourself, as the programmer or human typed in. 2117 01:38:36,197 --> 01:38:38,780 In fact, if we convert this again, just for discussion's sake, 2118 01:38:38,780 --> 01:38:41,572 to those integers, what's literally stored in the computer's memory 2119 01:38:41,572 --> 01:38:45,170 is going to be 72, 73, 33, and now a 0. 2120 01:38:45,170 --> 01:38:48,240 And the computer, because of C and how it was invented, 2121 01:38:48,240 --> 01:38:51,350 it's just smart enough to know that when you print out a string, 2122 01:38:51,350 --> 01:38:54,530 it prints out every character until it sees a 0, 2123 01:38:54,530 --> 01:38:56,150 and then it just stops printing. 2124 01:38:56,150 --> 01:38:58,470 In particular, printf knows how this works. 2125 01:38:58,470 --> 01:39:02,050 And this is why printf knows when to stop printing. 2126 01:39:02,050 --> 01:39:03,800 Decimal numbers are not that enlightening. 2127 01:39:03,800 --> 01:39:05,940 We'll generally write the characters like this. 2128 01:39:05,940 --> 01:39:09,350 And again, backslash 0 is just special symbology. 2129 01:39:09,350 --> 01:39:13,190 It's what the programmer types to make clear that you're not saying, HI!, 0. 2130 01:39:13,190 --> 01:39:15,980 You're saying HI!, and then it's a special 0. 2131 01:39:15,980 --> 01:39:20,887 Specifically, it is eight 0 bits that indicate 2132 01:39:20,887 --> 01:39:22,220 that it's the end of the string. 2133 01:39:22,220 --> 01:39:26,330 Technically, that backslash zero, if you want to be fancy, it's called null, 2134 01:39:26,330 --> 01:39:27,320 N-U-L-L. 2135 01:39:27,320 --> 01:39:30,320 And it turns out, you've seen this before, though we didn't call it out. 2136 01:39:30,320 --> 01:39:33,230 Here's that same ASCII chart from the past couple of weeks. 
2137 01:39:33,230 --> 01:39:39,080 If I highlight this, what is decimal number 0 mapping to? 2138 01:39:39,080 --> 01:39:42,830 NUL, which is just programmer speak for the special null character. 2139 01:39:42,830 --> 01:39:46,550 All 0 bits that means the string ends here. 2140 01:39:46,550 --> 01:39:48,510 This all happens automatically for you. 2141 01:39:48,510 --> 01:39:53,420 You do not need to create these null characters or these zeros. 2142 01:39:53,420 --> 01:40:00,030 Any questions then, on this implementation thus far? 2143 01:40:00,030 --> 01:40:01,820 Any questions here? 2144 01:40:01,820 --> 01:40:02,320 No? 2145 01:40:02,320 --> 01:40:03,195 Well, let me do this. 2146 01:40:03,195 --> 01:40:05,310 Let me go back to VS Code in a second. 2147 01:40:05,310 --> 01:40:07,770 And let's actually corroborate this with some code. 2148 01:40:07,770 --> 01:40:10,830 Let me go ahead and create a small program called hi.c. 2149 01:40:10,830 --> 01:40:12,070 And how about we do this? 2150 01:40:12,070 --> 01:40:14,550 Let me include stdio.h. 2151 01:40:14,550 --> 01:40:18,670 Let me include-- let me type out int main void, as always. 2152 01:40:18,670 --> 01:40:20,910 And now let me do something simple and kind of bad, 2153 01:40:20,910 --> 01:40:24,960 but char c1 equals quote unquote, H, in single quotes. 2154 01:40:24,960 --> 01:40:28,590 Char c2 equals quote unquote, I, in single quotes. 2155 01:40:28,590 --> 01:40:32,830 And lastly, char c3 equals exclamation point, in single quotes. 2156 01:40:32,830 --> 01:40:34,500 And now, let me just print this out. 2157 01:40:34,500 --> 01:40:36,960 I can't use %s because that is not a string. 2158 01:40:36,960 --> 01:40:40,290 That's literally three chars, because that's the design decision I made. 2159 01:40:40,290 --> 01:40:41,430 But I could do this-- 2160 01:40:41,430 --> 01:40:48,600 %c, %c, %c, which we haven't seen before, but %s is string, %i is int, 2161 01:40:48,600 --> 01:40:51,060 %c is, indeed, char. 
2162 01:40:51,060 --> 01:40:54,150 So let me put a backslash n at the end for cleanliness, 2163 01:40:54,150 --> 01:40:56,280 and now do, c1, c2, c3. 2164 01:40:56,280 --> 01:41:00,430 So this is like a char-based version of printing a string. 2165 01:41:00,430 --> 01:41:01,650 So let me make hi. 2166 01:41:01,650 --> 01:41:05,880 And then let me do ./hi, and it looks like I used printf with %s. 2167 01:41:05,880 --> 01:41:09,750 But I did things very manually by printing out each individual character. 2168 01:41:09,750 --> 01:41:11,700 What's cool now, though, is that once you 2169 01:41:11,700 --> 01:41:15,270 know that characters are just numbers and strings are just characters, 2170 01:41:15,270 --> 01:41:16,560 you can kind of poke around. 2171 01:41:16,560 --> 01:41:21,970 Let me change all three placeholders to %i instead. 2172 01:41:21,970 --> 01:41:23,860 And this is totally fine, too. 2173 01:41:23,860 --> 01:41:26,310 Let me rerun this, make hi. 2174 01:41:26,310 --> 01:41:31,570 Actually, let me make one change, just so we can see this. 2175 01:41:31,570 --> 01:41:37,710 Let me add spaces, just for aesthetics' sake, let me do make hi, ./hi, Enter, 2176 01:41:37,710 --> 01:41:40,350 and voila, like now, you can actually see the numbers, 2177 01:41:40,350 --> 01:41:44,085 that I claimed back in week zero, were in fact happening underneath the hood. 2178 01:41:44,085 --> 01:41:45,960 Well, this is not how you would make strings. 2179 01:41:45,960 --> 01:41:49,457 It'd be incredibly tedious to have three variables for three-letter words, five 2180 01:41:49,457 --> 01:41:50,790 variables for five-letter words. 2181 01:41:50,790 --> 01:41:52,998 We've been using, of course, strings since last week, 2182 01:41:52,998 --> 01:41:54,450 so let's do that instead. 2183 01:41:54,450 --> 01:41:59,370 String s equals quote unquote, double quotes "HI!" 
2184 01:41:59,370 --> 01:42:02,520 For this, though, because of these training wheels, 2185 01:42:02,520 --> 01:42:04,560 I need to include the CS50 library. 2186 01:42:04,560 --> 01:42:06,580 But we'll come back to that in the coming weeks. 2187 01:42:06,580 --> 01:42:10,530 But for now, I'm going to go ahead and create a string s equal to quote unquote, 2188 01:42:10,530 --> 01:42:11,580 "HI!" 2189 01:42:11,580 --> 01:42:14,760 And now I'm going to change this to be my familiar %s, 2190 01:42:14,760 --> 01:42:17,610 and now just print out s itself. 2191 01:42:17,610 --> 01:42:20,430 This, of course, is the same thing as last week, ./hi, 2192 01:42:20,430 --> 01:42:24,750 gives me the exact same thing, but now, we're dealing, of course, with strings. 2193 01:42:24,750 --> 01:42:27,610 But how can we see a little beyond that? 2194 01:42:27,610 --> 01:42:28,810 Well, how about this? 2195 01:42:28,810 --> 01:42:31,530 Let's poke around further with today's primitives. 2196 01:42:31,530 --> 01:42:35,580 Even though s is a string, I could technically print out its first 2197 01:42:35,580 --> 01:42:39,000 character with %c by doing s[0]. 2198 01:42:39,000 --> 01:42:43,110 I could technically print out its second character with %c by doing s[1]. 2199 01:42:43,110 --> 01:42:47,820 I could print out its third character with %c and printing out s[2]. 2200 01:42:47,820 --> 01:42:50,430 So again, this just derives logically from my understanding 2201 01:42:50,430 --> 01:42:52,770 now that strings are arrays, as you note. 2202 01:42:52,770 --> 01:42:54,540 Let me do make-- 2203 01:42:54,540 --> 01:42:57,300 let me do make hi, ./hi. 2204 01:42:57,300 --> 01:43:00,760 And no visual change, but I'm just kind of now tinkering around. 2205 01:43:00,760 --> 01:43:03,400 And in fact, if you're really curious, let me do this. 2206 01:43:03,400 --> 01:43:06,870 Let me change these back to i, back to i-- 2207 01:43:06,870 --> 01:43:08,250 oops, back to i. 
2208 01:43:08,250 --> 01:43:11,310 And let me add a fourth one because if I'm really curious now, 2209 01:43:11,310 --> 01:43:14,490 let's see what's in s[3]. 2210 01:43:14,490 --> 01:43:16,020 This is the fourth byte. 2211 01:43:16,020 --> 01:43:18,990 And even though the string itself is H-I-!, 2212 01:43:18,990 --> 01:43:21,840 I think we can corroborate this whole null thing. 2213 01:43:21,840 --> 01:43:26,248 Make hi, ./hi, Enter, and there it is. 2214 01:43:26,248 --> 01:43:28,290 You could have done this last week, if you really 2215 01:43:28,290 --> 01:43:29,580 wanted to geek out on strings. 2216 01:43:29,580 --> 01:43:33,060 But for now, it's just revealing what's going on underneath the hood. 2217 01:43:33,060 --> 01:43:36,480 Questions then, on what these strings are? 2218 01:43:36,480 --> 01:43:37,498 Yeah? 2219 01:43:37,498 --> 01:43:41,293 AUDIENCE: [INAUDIBLE] 2220 01:43:41,293 --> 01:43:42,960 DAVID MALAN: Why do we need the bracket? 2221 01:43:42,960 --> 01:43:45,430 AUDIENCE: [INAUDIBLE] 2222 01:43:45,430 --> 01:43:47,180 DAVID MALAN: Why do you not need brackets? 2223 01:43:47,180 --> 01:43:47,780 Good question. 
2234 01:44:15,440 --> 01:44:19,190 I mean, every program you write seems to use strings, text in some form. 2235 01:44:19,190 --> 01:44:21,930 We're humans we like text, not just numbers and such. 2236 01:44:21,930 --> 01:44:25,910 So this is just treated a little specially in C and many other languages 2237 01:44:25,910 --> 01:44:28,580 as well. 2238 01:44:28,580 --> 01:44:31,170 Other questions on this here? 2239 01:44:31,170 --> 01:44:31,670 No? 2240 01:44:31,670 --> 01:44:33,530 Let's add then, one other string to the mix. 2241 01:44:33,530 --> 01:44:36,290 So instead of just saying, HI!, why don't we consider a version 2242 01:44:36,290 --> 01:44:38,660 of the program that says both, HI! and BYE!. 2243 01:44:38,660 --> 01:44:41,420 And I claim now that that backslash zero, 2244 01:44:41,420 --> 01:44:44,270 that null character is going to be ever more important now 2245 01:44:44,270 --> 01:44:46,820 if we've got two strings in memory, so that C knows 2246 01:44:46,820 --> 01:44:48,570 how to distinguish one from the other. 2247 01:44:48,570 --> 01:44:51,487 So let me go ahead and just get rid of these two lines for the moment. 2248 01:44:51,487 --> 01:44:55,430 Let me recreate string s equals, quote unquote double quotes, "HI!" 2249 01:44:55,430 --> 01:44:56,780 Let me give myself another one. 2250 01:44:56,780 --> 01:44:59,905 And because I'm just playing around, I'll choose very short variable names. 2251 01:44:59,905 --> 01:45:04,410 String t equals quote unquote, "BYE!" 2252 01:45:04,410 --> 01:45:06,470 And then let me just print them both out. 2253 01:45:06,470 --> 01:45:11,300 Let me go ahead and print out %s, backslash n, comma s, 2254 01:45:11,300 --> 01:45:16,910 and then printf %s backslash n, and then t. 2255 01:45:16,910 --> 01:45:19,970 So very simple demonstration of just these two variables. 2256 01:45:19,970 --> 01:45:26,090 Make hi, ./hi, and of course, it prints out two lines, one after the other. 
2257 01:45:26,090 --> 01:45:27,980 What's actually going on underneath the hood? 2258 01:45:27,980 --> 01:45:29,510 Well, let's go back to the computer's memory. 2259 01:45:29,510 --> 01:45:32,160 HI!, I think, is going to be, I claim, pretty much the same. 2260 01:45:32,160 --> 01:45:36,170 So s, I'll claim, is in the top left, followed by the backslash zero. 2261 01:45:36,170 --> 01:45:40,035 And that's important now because BYE! probably is going to end up there. 2262 01:45:40,035 --> 01:45:43,160 And visually, it wraps just by nature of how I've drawn this grid of bytes, 2263 01:45:43,160 --> 01:45:44,330 but it's contiguous. 2264 01:45:44,330 --> 01:45:46,340 B-Y-E-! 2265 01:45:46,340 --> 01:45:51,470 null, A.K.A. backslash zero, this is now helpful to printf 2266 01:45:51,470 --> 01:45:55,550 because now printf knows where one begins and ends 2267 01:45:55,550 --> 01:45:58,580 by way of that special null character. 2268 01:45:58,580 --> 01:46:00,230 But we can poke around now, too. 2269 01:46:00,230 --> 01:46:01,620 What else can I do here? 2270 01:46:01,620 --> 01:46:02,840 How about this? 2271 01:46:02,840 --> 01:46:08,870 How about I go into my code here, back to VS code, and let me go ahead 2272 01:46:08,870 --> 01:46:13,790 and say something like, well, if I've got two of these strings, 2273 01:46:13,790 --> 01:46:15,410 you know, let's put them in an array. 2274 01:46:15,410 --> 01:46:20,520 Let's kind of do this sort of arrays in arrays, sort of inception-style here. 2275 01:46:20,520 --> 01:46:23,060 So string words[2]. 2276 01:46:23,060 --> 01:46:25,100 So give me an array of two strings is what 2277 01:46:25,100 --> 01:46:28,100 I'm saying here in code, even though we've not done it with strings yet. 2278 01:46:28,100 --> 01:46:29,270 We only did it with ints. 2279 01:46:29,270 --> 01:46:30,770 And now let me do this. 2280 01:46:30,770 --> 01:46:35,480 The first word A.K.A. words[0] will equal, as before, HI! 
2281 01:46:35,480 --> 01:46:40,940 And now words[1] will equal quote unquote, "BYE!" 2282 01:46:40,940 --> 01:46:43,760 And now I've done the exact same thing, but again, I'm 2283 01:46:43,760 --> 01:46:48,650 just avoiding having s, t, q, r, and all these different variables in my code. 2284 01:46:48,650 --> 01:46:52,790 I just now am treating them as one single array of strings. 2285 01:46:52,790 --> 01:46:54,750 How do I change my code down here? 2286 01:46:54,750 --> 01:46:57,380 Well, if I want to print the first word, I do words[0]. 2287 01:46:57,380 --> 01:46:59,900 And if I want to print the second word, I do words[1]. 2288 01:46:59,900 --> 01:47:02,088 This is not a useful exercise at the moment 2289 01:47:02,088 --> 01:47:04,130 because I'm just making my code more complicated. 2290 01:47:04,130 --> 01:47:06,830 But again, it allows us to poke around and see what's 2291 01:47:06,830 --> 01:47:08,690 going on because there is that HI! 2292 01:47:08,690 --> 01:47:09,530 and BYE!. 2293 01:47:09,530 --> 01:47:10,700 But watch this. 2294 01:47:10,700 --> 01:47:14,670 If I really want to be cool, I can do this. 2295 01:47:14,670 --> 01:47:24,380 Let's print out %c, %c, %c, backslash n, and then here, %c, %c, %c, %c, 2296 01:47:24,380 --> 01:47:25,700 so four of those. 2297 01:47:25,700 --> 01:47:28,430 And now here's where things get interesting. 2298 01:47:28,430 --> 01:47:30,620 Words is an array of strings. 2299 01:47:30,620 --> 01:47:33,400 Again, if I may, what's a string? 2300 01:47:33,400 --> 01:47:35,060 An array of characters. 2301 01:47:35,060 --> 01:47:36,790 So just use the same logic. 2302 01:47:36,790 --> 01:47:41,110 If words is an array of strings, you get at the first string with words[0]. 2303 01:47:41,110 --> 01:47:44,530 How do you get at the first character in the first string? 2304 01:47:44,530 --> 01:47:52,150 Bracket 0, words[0][1], and lastly, words[0][2]. 
2305 01:47:52,150 --> 01:47:57,460 And now down here, words[1], but the first character is there. 2306 01:47:57,460 --> 01:48:00,400 Words[1], the second character is here. 2307 01:48:00,400 --> 01:48:03,190 Words[1], the third character is here-- 2308 01:48:03,190 --> 01:48:04,720 whoops-- third character's here. 2309 01:48:04,720 --> 01:48:07,898 And words[1], the fourth character is here. 2310 01:48:07,898 --> 01:48:09,190 This is not how people program. 2311 01:48:09,190 --> 01:48:10,840 This is only for demonstration's sake. 2312 01:48:10,840 --> 01:48:13,060 My God, it's so tedious and verbose already. 2313 01:48:13,060 --> 01:48:20,410 But if I make hi now, ./hi, now, I'm manually reinventing %s, 2314 01:48:20,410 --> 01:48:22,990 if I forgot it existed, using %c alone. 2315 01:48:22,990 --> 01:48:25,900 But you can indeed manipulate arrays in this way. 2316 01:48:25,900 --> 01:48:28,300 But because strings are arrays of characters, 2317 01:48:28,300 --> 01:48:32,200 you can manipulate strings in this way too. 2318 01:48:32,200 --> 01:48:34,675 Any question now on this syntax? 2319 01:48:34,675 --> 01:48:37,210 2320 01:48:37,210 --> 01:48:38,800 Any questions here? 2321 01:48:38,800 --> 01:48:39,460 No? 2322 01:48:39,460 --> 01:48:39,970 No? 2323 01:48:39,970 --> 01:48:42,070 All right, well, let's go ahead and propose 2324 01:48:42,070 --> 01:48:45,830 that we solve a couple of other problems we might not have as before. 2325 01:48:45,830 --> 01:48:49,150 But first, a quick visual of what's been going on underneath the hood here. 2326 01:48:49,150 --> 01:48:52,420 If here, again, is where we left off on the screen, HI! and BYE! 2327 01:48:52,420 --> 01:48:56,470 back to back, here is really how I just treated these things. 2328 01:48:56,470 --> 01:49:00,880 s bracket 0, 1, 2, 3 and then t 0, 1, 2, 3, 4. 2329 01:49:00,880 --> 01:49:04,840 But really, once I put them in an array, the picture becomes this. 
2330 01:49:04,840 --> 01:49:07,030 Words[0] is the whole HI!. 2331 01:49:07,030 --> 01:49:08,680 Words[1] is the whole BYE!. 2332 01:49:08,680 --> 01:49:11,470 But if I really get into the weeds and start indexing 2333 01:49:11,470 --> 01:49:14,980 into individual characters in those strings, all I'm using 2334 01:49:14,980 --> 01:49:20,710 is new syntax in order to represent these same values here. 2335 01:49:20,710 --> 01:49:28,710 Questions then, on these representations before we forge ahead? 2336 01:49:28,710 --> 01:49:29,430 No? 2337 01:49:29,430 --> 01:49:30,030 Yeah? 2338 01:49:30,030 --> 01:49:33,390 AUDIENCE: Does the new line character not [INAUDIBLE]?? 2339 01:49:33,390 --> 01:49:36,030 DAVID MALAN: Does the new line character-- say that once more? 2340 01:49:36,030 --> 01:49:38,597 AUDIENCE: Does the new line character take up any space? 2341 01:49:38,597 --> 01:49:40,180 DAVID MALAN: Ah, really good question. 2342 01:49:40,180 --> 01:49:42,730 Does the new line character take up any space? 2343 01:49:42,730 --> 01:49:45,340 It does, so far as printf is concerned. 2344 01:49:45,340 --> 01:49:48,790 But I'm not storing the backslash n in my strings, 2345 01:49:48,790 --> 01:49:53,460 printf is being manually handed that thing instead. 2346 01:49:53,460 --> 01:49:55,520 All right, so let's go ahead then and consider 2347 01:49:55,520 --> 01:49:58,970 how we might solve some problems that have arisen now with these strings, 2348 01:49:58,970 --> 01:50:00,680 as follows here. 2349 01:50:00,680 --> 01:50:02,760 Suppose I-- let's do this. 2350 01:50:02,760 --> 01:50:04,400 Let me go back to VS Code here. 2351 01:50:04,400 --> 01:50:09,980 And let me go ahead and open up a new file called, how about, length.c. 2352 01:50:09,980 --> 01:50:12,680 And let's consider for a moment how I might actually figure out 2353 01:50:12,680 --> 01:50:16,130 what the length of a string is, which is distinct from the length of an array. 
2354 01:50:16,130 --> 01:50:19,680 I claimed earlier, you cannot figure out dynamically what the length of an array 2355 01:50:19,680 --> 01:50:20,180 is. 2356 01:50:20,180 --> 01:50:24,020 But I can figure out the length of a string, specifically, because 2357 01:50:24,020 --> 01:50:26,960 of this implementation detail of that null character. 2358 01:50:26,960 --> 01:50:28,500 So let me go ahead and do this. 2359 01:50:28,500 --> 01:50:31,940 Let me include cs50.h in this second program here. 2360 01:50:31,940 --> 01:50:35,090 Let me include stdio.h, as before. 2361 01:50:35,090 --> 01:50:38,120 And let me do this, int main void-- 2362 01:50:38,120 --> 01:50:40,970 and the first thing I'll do is just get a string from the user. 2363 01:50:40,970 --> 01:50:43,250 I'll ask the user, as always, for their name. 2364 01:50:43,250 --> 01:50:48,170 So I'll call get_string, and say, what's your name, question mark, as always. 2365 01:50:48,170 --> 01:50:51,620 And then down here, if I want to figure out the length of this string 2366 01:50:51,620 --> 01:50:56,210 and print the length out on the screen, well, I 2367 01:50:56,210 --> 01:50:58,465 can kind of do this similar in spirit to the average, 2368 01:50:58,465 --> 01:50:59,840 where I'm accumulating something. 2369 01:50:59,840 --> 01:51:02,600 Let me go ahead and initialize n to 0. 2370 01:51:02,600 --> 01:51:05,120 Let me give myself-- 2371 01:51:05,120 --> 01:51:07,035 it's not a for loop because I don't have a-- 2372 01:51:07,035 --> 01:51:08,660 I don't know in advance how long it is. 2373 01:51:08,660 --> 01:51:09,980 But what if I do this? 2374 01:51:09,980 --> 01:51:20,600 While the value at name[n] does not equal '\0'-- 2375 01:51:20,600 --> 01:51:23,390 crazy syntax at the moment, but it's just the culmination 2376 01:51:23,390 --> 01:51:25,590 of these various building blocks. 2377 01:51:25,590 --> 01:51:28,970 Let me just finish the thought here, n++. 
2378 01:51:28,970 --> 01:51:33,656 And then down here, let's just print out, with printf and %i, 2379 01:51:33,656 --> 01:51:38,930 that value of N. So I claim this is going to show me the length of any 2380 01:51:38,930 --> 01:51:43,220 string I type in, whether it's hi or bye or David or anything else. 2381 01:51:43,220 --> 01:51:45,410 I initialize a variable to zero, and that's good 2382 01:51:45,410 --> 01:51:47,535 because that's where you start counting in general. 2383 01:51:47,535 --> 01:51:50,990 While name[0] does not equal backslash zero. 2384 01:51:50,990 --> 01:51:51,930 What is this saying? 2385 01:51:51,930 --> 01:51:55,580 Well, if name is the string the user typed in-- and name is just an array, 2386 01:51:55,580 --> 01:51:56,460 as you noted-- 2387 01:51:56,460 --> 01:51:59,390 then name[0] is going to be the first character. 2388 01:51:59,390 --> 01:52:02,750 And I'm asking the question, well, does the first character not equal 2389 01:52:02,750 --> 01:52:03,680 backslash zero? 2390 01:52:03,680 --> 01:52:08,750 And if I type in David, D, it's not, so I keep going and I add 1 to N. 2391 01:52:08,750 --> 01:52:10,750 Then I'm going to check name[1]. 2392 01:52:10,750 --> 01:52:13,895 Well, if I typed in David, name[1] is going to be A. 2393 01:52:13,895 --> 01:52:18,020 A does not equal backslash zero, and so it's going to go again and again 2394 01:52:18,020 --> 01:52:18,740 and again. 2395 01:52:18,740 --> 01:52:23,090 But five steps in total later, it's going to get to the byte after 2396 01:52:23,090 --> 01:52:26,480 D-A-V-I-D, realize, wait a minute, that is a backslash zero. 2397 01:52:26,480 --> 01:52:29,750 The loop finishes, and I print out the total length. 2398 01:52:29,750 --> 01:52:33,050 Arrays, in general, do not have this null character. 2399 01:52:33,050 --> 01:52:34,910 However, strings do.
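A minimal sketch of the counting loop just described, written as a standalone function so that it compiles without cs50.h. The function name manual_length and the const char * parameter (standing in for CS50's string) are assumptions made for illustration, not the code typed in lecture.

```c
// Counts the characters in s by walking the array until the
// terminating null character '\0' is reached.
int manual_length(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

Called on "David", the loop steps past D, a, v, i, d and stops at the byte after them, so the count it returns is 5.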
2400 01:52:34,910 --> 01:52:38,150 Again, strings are special versus all of the other data types 2401 01:52:38,150 --> 01:52:39,590 we've talked about thus far. 2402 01:52:39,590 --> 01:52:43,220 But how could I, for instance, do this differently? 2403 01:52:43,220 --> 01:52:47,220 Well, let's actually factor this out as a function, as I've commonly done. 2404 01:52:47,220 --> 01:52:50,540 But rather than implement it myself, you know what? 2405 01:52:50,540 --> 01:52:54,140 It turns out what's nice about strings being so common, 2406 01:52:54,140 --> 01:52:57,260 there are many other people who have solved these problems before. 2407 01:52:57,260 --> 01:53:00,290 And in fact, there's a whole string library in C. 2408 01:53:00,290 --> 01:53:04,190 It is used by way of a header file called string.h. 2409 01:53:04,190 --> 01:53:08,400 And what string.h is, is a library of string-related functions. 2410 01:53:08,400 --> 01:53:10,760 In fact, you can see in CS50's manual pages 2411 01:53:10,760 --> 01:53:16,217 for C, the string.h functions, at least those that we recommend as most useful, 2412 01:53:16,217 --> 01:53:18,050 and in particular, if you poke around there, 2413 01:53:18,050 --> 01:53:20,290 you'll see that there's a function called strlen. 2414 01:53:20,290 --> 01:53:22,055 It means string length. 2415 01:53:22,055 --> 01:53:24,680 It was named very succinctly, just because it's a little easier 2416 01:53:24,680 --> 01:53:25,850 to type than string length. 2417 01:53:25,850 --> 01:53:28,800 But strlen tells you the length of a string. 2418 01:53:28,800 --> 01:53:30,990 So how might I use this in my code here? 2419 01:53:30,990 --> 01:53:34,020 Well, it turns out, I can simplify this quite a bit. 2420 01:53:34,020 --> 01:53:37,700 Let me get rid of my loop, get rid of my counting 2421 01:53:37,700 --> 01:53:40,880 manually, and do something like this-- int n 2422 01:53:40,880 --> 01:53:45,630 equals strlen of the human's name, name.
2423 01:53:45,630 --> 01:53:49,430 And now I'll just use printf, as before, with %i backslash n, 2424 01:53:49,430 --> 01:53:51,290 and output the value of n. 2425 01:53:51,290 --> 01:53:54,380 But there's a bug at the moment. 2426 01:53:54,380 --> 01:53:58,480 What have I forgotten to do? 2427 01:53:58,480 --> 01:54:01,670 Yeah, I have to include the header file at the top of the screen, 2428 01:54:01,670 --> 01:54:03,260 so let me-- at the top of the code. 2429 01:54:03,260 --> 01:54:07,640 So let me also include string.h at the top of my file, 2430 01:54:07,640 --> 01:54:10,970 so that C knows that, in fact, strlen exists. 2431 01:54:10,970 --> 01:54:14,170 Let me go ahead and make length, as before. 2432 01:54:14,170 --> 01:54:18,670 ./length-- or actually, really for the first time, what's your name? 2433 01:54:18,670 --> 01:54:22,360 D-A-V-I-D. And hopefully, I'm going to see, in fact, 5. 2434 01:54:22,360 --> 01:54:26,950 By contrast, if I run it again and type in HI!, now I see three. 2435 01:54:26,950 --> 01:54:29,785 So strlen is just one of the functions in that library. 2436 01:54:29,785 --> 01:54:30,910 And there are so many more. 2437 01:54:30,910 --> 01:54:33,700 In fact, yet another library that might be useful moving forward 2438 01:54:33,700 --> 01:54:37,570 is this one, ctype, which relates to C data 2439 01:54:37,570 --> 01:54:40,580 types and lots of functions therein that can be useful. 2440 01:54:40,580 --> 01:54:43,690 For instance, if you review its documentation in the manual pages 2441 01:54:43,690 --> 01:54:46,930 online, you'll see that there are functions via which 2442 01:54:46,930 --> 01:54:49,460 we can solve problems like this. 2443 01:54:49,460 --> 01:54:52,480 Let me go ahead and propose here-- 2444 01:54:52,480 --> 01:54:53,680 let me see. 
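As a rough sketch of the simplified length.c described above, the function below replaces the hand-written loop with strlen. The name print_length and the const char * parameter are assumptions for illustration; in lecture the same two statements sit inside main and the string comes from the CS50 library.

```c
#include <stdio.h>
#include <string.h>

// Prints the length of name followed by a newline and returns it.
// strlen is declared in string.h and does not count the trailing '\0'.
int print_length(const char *name)
{
    int n = strlen(name);
    printf("%i\n", n);
    return n;
}
```

Leaving out the string.h include is the bug mentioned above: without it, the compiler has not been told that strlen exists.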
2445 01:54:53,680 --> 01:54:59,080 Let's do an example here involving-- 2446 01:54:59,080 --> 01:55:03,250 how about checking if something is uppercase or lowercase, 2447 01:55:03,250 --> 01:55:06,700 and converting it to uppercase only. 2448 01:55:06,700 --> 01:55:10,810 Let me go back to VS Code, and code a program called uppercase.c. 2449 01:55:10,810 --> 01:55:15,220 In this file, I'm going to start by including now, as always, cs50.h. 2450 01:55:15,220 --> 01:55:17,710 I'm going to include stdio.h. 2451 01:55:17,710 --> 01:55:21,670 And I'm going to add one other to the mix, which 2452 01:55:21,670 --> 01:55:26,230 is string.h now too, so I can access the length of things as needed. 2453 01:55:26,230 --> 01:55:28,570 Int main void comes next. 2454 01:55:28,570 --> 01:55:30,460 And then within my main function, I'm going 2455 01:55:30,460 --> 01:55:32,230 to go ahead and declare a string called s. 2456 01:55:32,230 --> 01:55:34,240 I'm going to call getString, as before. 2457 01:55:34,240 --> 01:55:38,170 And I'm going to go ahead and just ask the user for a string called before. 2458 01:55:38,170 --> 01:55:39,670 I want to do a before and after. 2459 01:55:39,670 --> 01:55:41,350 Whatever the user types in is before. 2460 01:55:41,350 --> 01:55:44,770 But I want to force everything to uppercase, thereafter. 2461 01:55:44,770 --> 01:55:48,740 Let me now, in this loop here, do this. 2462 01:55:48,740 --> 01:55:53,800 Let me printf quote unquote, "After," just so we can see this on the screen. 2463 01:55:53,800 --> 01:56:02,440 And let me do for int i gets 0, i is less than strlen of s, i++. 2464 01:56:02,440 --> 01:56:03,610 What am I about to do? 2465 01:56:03,610 --> 01:56:06,190 I'm about to iterate over every character in the string 2466 01:56:06,190 --> 01:56:11,230 from left to right, from 0 on up to, but not through, the length of s.
2467 01:56:11,230 --> 01:56:13,990 And how do I check if something is lowercase, 2468 01:56:13,990 --> 01:56:16,990 so that I can actually force it to uppercase? 2469 01:56:16,990 --> 01:56:19,630 Well, it turns out, I could do this literally. 2470 01:56:19,630 --> 01:56:27,436 If the character in s at location i is greater than or equal to little a, 2471 01:56:27,436 --> 01:56:31,780 ampersand, ampersand, which means and instead of or, which we saw 2472 01:56:31,780 --> 01:56:37,930 in the past, s[i] is less than or equal to little z, that means, 2473 01:56:37,930 --> 01:56:41,800 logically in English, that this is indeed lowercase. 2474 01:56:41,800 --> 01:56:44,830 How do I now convert it to uppercase, this character? 2475 01:56:44,830 --> 01:56:48,160 Well, I could just literally print out the same character. 2476 01:56:48,160 --> 01:56:52,280 But that would not be the answer here because that's not changing the value. 2477 01:56:52,280 --> 01:56:54,470 But what could I do instead? 2478 01:56:54,470 --> 01:56:59,890 Well, let me actually pull up here real fast the ASCII chart as before, 2479 01:56:59,890 --> 01:57:03,220 and let's see if we can't glean some insight. 2480 01:57:03,220 --> 01:57:05,710 If I pull up the same ASCII chart, and suppose 2481 01:57:05,710 --> 01:57:09,790 the human has typed in a lowercase a, that's 97. 2482 01:57:09,790 --> 01:57:13,240 What letter-- I want to convert it to uppercase 2483 01:57:13,240 --> 01:57:18,660 A, so what number do I want to convert the 97 to, per week zero? 2484 01:57:18,660 --> 01:57:21,000 So 65, we keep coming back to that one. 2485 01:57:21,000 --> 01:57:23,010 What if the user types in lowercase b? 2486 01:57:23,010 --> 01:57:27,550 I want to change the 98 value to 66, and so forth. 2487 01:57:27,550 --> 01:57:30,130 And any quick math, how far apart are those?
2488 01:57:30,130 --> 01:57:33,120 So it's always 32, like uppercase to lowercase 2489 01:57:33,120 --> 01:57:37,990 is always, wonderfully, good design, 32 away, one from the other. 2490 01:57:37,990 --> 01:57:39,100 So what does this mean? 2491 01:57:39,100 --> 01:57:41,350 Well, I think we saw earlier that underneath the hood, 2492 01:57:41,350 --> 01:57:42,600 a char is just a number. 2493 01:57:42,600 --> 01:57:44,340 You can certainly do arithmetic on it. 2494 01:57:44,340 --> 01:57:46,507 And here, again, if you understand these lower level 2495 01:57:46,507 --> 01:57:48,180 primitives, what if I do this? 2496 01:57:48,180 --> 01:57:53,940 Whatever s[i] is, if I know on line 13 that it's lowercase, 2497 01:57:53,940 --> 01:57:57,048 do I want to add or subtract 32? 2498 01:57:57,048 --> 01:57:57,840 AUDIENCE: Subtract. 2499 01:57:57,840 --> 01:58:01,910 DAVID MALAN: So I want to subtract because I want to go from like 97 to 65 2500 01:58:01,910 --> 01:58:06,560 or 98 to 66, so indeed, if you do some quick math, that gives you 32. 2501 01:58:06,560 --> 01:58:10,970 So it suffices to just treat chars as numbers, subtract the 32, 2502 01:58:10,970 --> 01:58:16,370 and printing it with %c, I think, will just convert lowercase to uppercase. 2503 01:58:16,370 --> 01:58:19,795 If you now fast forward to the real world, Microsoft Word or Google Docs, 2504 01:58:19,795 --> 01:58:22,670 if you've ever chosen the menu option that forces things to uppercase 2505 01:58:22,670 --> 01:58:24,980 or lowercase on occasion, literally, that's 2506 01:58:24,980 --> 01:58:26,480 what Microsoft and Google have done. 2507 01:58:26,480 --> 01:58:29,605 They iterate over every character in the document, check if it's lowercase, 2508 01:58:29,605 --> 01:58:33,810 and if so, they subtract 32 from it and show you the new value. 2509 01:58:33,810 --> 01:58:36,650 What if, though, it is not a lowercase letter?
2510 01:58:36,650 --> 01:58:40,520 I think I can keep it easy and just print out the current letter unchanged, 2511 01:58:40,520 --> 01:58:44,850 if my goal is to simply force things to all uppercase, and that letter, 2512 01:58:44,850 --> 01:58:46,490 then would be s[i]. 2513 01:58:46,490 --> 01:58:50,750 So let me go ahead now and make uppercase, hopefully, no errors. 2514 01:58:50,750 --> 01:58:55,670 ./uppercase, and I'll now type in David with an uppercase D, 2515 01:58:55,670 --> 01:58:57,120 but lowercase everything else. 2516 01:58:57,120 --> 01:59:00,020 But now the after version is DAVID-- 2517 01:59:00,020 --> 01:59:01,190 an aesthetic bug. 2518 01:59:01,190 --> 01:59:04,400 Notice here, I forgot to include, just for prettiness sake, 2519 01:59:04,400 --> 01:59:05,930 a backslash n at the end. 2520 01:59:05,930 --> 01:59:07,640 No problem, I'll add that. 2521 01:59:07,640 --> 01:59:08,870 Let me fix my mistake. 2522 01:59:08,870 --> 01:59:12,050 Make uppercase, ./uppercase, Enter. 2523 01:59:12,050 --> 01:59:14,240 D-A-V-I-D, Enter, and voila. 2524 01:59:14,240 --> 01:59:16,820 And I deliberately added another space after, 2525 01:59:16,820 --> 01:59:19,130 just so they would line up pretty, even though before 2526 01:59:19,130 --> 01:59:22,070 and after have different numbers of letters. 2527 01:59:22,070 --> 01:59:25,630 Questions then, on this implementation of forcing something 2528 01:59:25,630 --> 01:59:28,380 to uppercase, which in and of itself is not all that enlightening, 2529 01:59:28,380 --> 01:59:33,990 but is representative now of how you can leverage these low level primitives. 2530 01:59:33,990 --> 01:59:35,880 Question? 2531 01:59:35,880 --> 01:59:36,380 No? 2532 01:59:36,380 --> 01:59:38,633 All right, well, this honestly is tedious. 2533 01:59:38,633 --> 01:59:40,550 My God, like does Microsoft, Google, everyone, 2534 01:59:40,550 --> 01:59:43,550 you have to literally write out this code just to do something simple? 
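The manual conversion described above can be sketched roughly as follows. The helper names force_upper and print_after, and the const char * parameter in place of CS50's string, are assumptions for illustration; in lecture the if/else sits directly inside the for loop in main. The sketch also assumes ASCII, where lowercase and uppercase letters are exactly 32 apart.

```c
#include <stdio.h>
#include <string.h>

// Returns the uppercase form of c if c is a lowercase ASCII letter,
// otherwise returns c unchanged.
char force_upper(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        // 'a' is 97 and 'A' is 65, so the two cases sit 32 apart
        return c - 32;
    }
    else
    {
        return c;
    }
}

// Prints s with every lowercase letter forced to uppercase
void print_after(const char *s)
{
    printf("After:  ");
    for (int i = 0; i < strlen(s); i++)
    {
        printf("%c", force_upper(s[i]));
    }
    printf("\n");
}
```

Punctuation and letters that are already uppercase fail the range check and are printed as they are, which is the else branch described above.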
2535 01:59:43,550 --> 01:59:46,310 Well, no, that's, again, why we have things like libraries. 2536 01:59:46,310 --> 01:59:49,220 And increasingly now, for problem sets, projects, and beyond, 2537 01:59:49,220 --> 01:59:52,040 well, you just use libraries more often off-the-shelf 2538 01:59:52,040 --> 01:59:55,940 so as to solve problems that, surely, other people have had before you. 2539 01:59:55,940 --> 01:59:59,570 So how can I now use this library, ctype.h? 2540 01:59:59,570 --> 02:00:01,320 Well, let me go back into my code. 2541 02:00:01,320 --> 02:00:05,090 Let me include this among my header files here. 2542 02:00:05,090 --> 02:00:08,030 Just so I can skim things easily, I tend to alphabetize my headers. 2543 02:00:08,030 --> 02:00:11,238 But that's not strictly necessary, but it allows me, at a glance, to realize, 2544 02:00:11,238 --> 02:00:13,400 did I or did I not include something I need? 2545 02:00:13,400 --> 02:00:15,570 Now, let me go ahead and do this. 2546 02:00:15,570 --> 02:00:20,390 It turns out if you read the documentation for the ctype library, 2547 02:00:20,390 --> 02:00:24,710 there's a function, wonderfully called, if islower, 2548 02:00:24,710 --> 02:00:28,910 that takes in a character as its argument, essentially, so s[i]. 2549 02:00:28,910 --> 02:00:32,182 And if that returns true, a Boolean value, if you will, 2550 02:00:32,182 --> 02:00:33,890 well, I'm going to force it to uppercase. 2551 02:00:33,890 --> 02:00:36,560 But I don't have to do this math anymore. 2552 02:00:36,560 --> 02:00:40,610 Turns out, in the ctype library, there's also a function called toupper 2553 02:00:40,610 --> 02:00:43,130 that takes a character as input, like s[i], 2554 02:00:43,130 --> 02:00:45,060 and it just does the math for you. 2555 02:00:45,060 --> 02:00:47,270 So that you can abstract away the 32 thing, 2556 02:00:47,270 --> 02:00:50,400 and just know that someone else has solved that problem for you.
2557 02:00:50,400 --> 02:00:53,030 Otherwise, I can leave my code unchanged down below 2558 02:00:53,030 --> 02:00:55,200 because I'm not changing anything else. 2559 02:00:55,200 --> 02:01:00,410 So if I do make uppercase now, and then ./uppercase, D-a-v-i-d, 2560 02:01:00,410 --> 02:01:03,710 with just a capital D, and now it still works. 2561 02:01:03,710 --> 02:01:06,890 But if you read the documentation further, it turns out that toupper 2562 02:01:06,890 --> 02:01:07,520 is smart. 2563 02:01:07,520 --> 02:01:10,220 If you pass in a character to toupper that's lowercase, 2564 02:01:10,220 --> 02:01:13,040 it obviously converts it to uppercase by doing that math. 2565 02:01:13,040 --> 02:01:17,240 But if you pass in a character to toupper that's already uppercase, 2566 02:01:17,240 --> 02:01:21,540 the documentation you would see tells you that it leaves it unchanged. 2567 02:01:21,540 --> 02:01:23,910 So I can tighten all of this up. 2568 02:01:23,910 --> 02:01:25,880 I can get rid of the whole else. 2569 02:01:25,880 --> 02:01:29,150 I can get rid of the whole if, and arguably now, 2570 02:01:29,150 --> 02:01:33,620 implement a program that's just as correct, but better designed. 2571 02:01:33,620 --> 02:01:34,250 Why? 2572 02:01:34,250 --> 02:01:38,000 Fewer lines of code, easier to read, lower probability of mistakes, 2573 02:01:38,000 --> 02:01:39,740 assuming the library is correct. 2574 02:01:39,740 --> 02:01:43,160 It just makes it easier and faster for me, now, to write code. 2575 02:01:43,160 --> 02:01:47,960 So if I now do, one last time, make uppercase, Enter, ./uppercase, 2576 02:01:47,960 --> 02:01:50,190 and type in my name, still working. 2577 02:01:50,190 --> 02:01:53,810 But now notice, we've whittled this down to far fewer lines of code, 2578 02:01:53,810 --> 02:01:57,740 albeit, using now this additional library. 2579 02:01:57,740 --> 02:02:00,140 Questions then on how we did this?
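A sketch of the whittled-down loop, assuming the same "After" output as before. The function name print_after and the const char * parameter are illustrative assumptions. The cast to unsigned char is an addition not mentioned in lecture; it keeps the argument within the range that toupper is defined for.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Prints s in uppercase. toupper converts lowercase letters and
// returns every other character unchanged, so no if/else is needed.
void print_after(const char *s)
{
    printf("After:  ");
    for (int i = 0; i < strlen(s); i++)
    {
        // The cast keeps the value in the range toupper expects
        printf("%c", toupper((unsigned char) s[i]));
    }
    printf("\n");
}
```

The design choice here is to lean on the library's documented behavior for non-lowercase input rather than re-checking the range by hand.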
2580 02:02:00,140 --> 02:02:03,930 2581 02:02:03,930 --> 02:02:06,230 Well, even though this code, I daresay, is correct, 2582 02:02:06,230 --> 02:02:09,120 it's not necessarily well-designed just yet. 2583 02:02:09,120 --> 02:02:12,590 In fact, there's one line of code, one function 2584 02:02:12,590 --> 02:02:14,690 call in this current implementation that's 2585 02:02:14,690 --> 02:02:17,900 more inefficient than it needs to be. 2586 02:02:17,900 --> 02:02:20,630 And allow me to draw your attention to this here, 2587 02:02:20,630 --> 02:02:24,320 line 10, wherein we're calling strlen. 2588 02:02:24,320 --> 02:02:27,350 But we're calling it inside of this for loop, specifically, 2589 02:02:27,350 --> 02:02:29,000 inside of the condition. 2590 02:02:29,000 --> 02:02:33,720 And why might that not necessarily be the best idea? 2591 02:02:33,720 --> 02:02:36,810 Well, is the length of the string changing, ever? 2592 02:02:36,810 --> 02:02:38,950 I mean, certainly not within the span of this loop. 2593 02:02:38,950 --> 02:02:42,840 And so here we are within our for loop on line 10, 11, 12, and 13, 2594 02:02:42,840 --> 02:02:45,242 asking on every iteration that same question. 2595 02:02:45,242 --> 02:02:46,200 What's the length of s? 2596 02:02:46,200 --> 02:02:47,190 What's the length of s? 2597 02:02:47,190 --> 02:02:48,330 What's the length of s? 2598 02:02:48,330 --> 02:02:50,702 And in turn, we're calling strlen every time, 2599 02:02:50,702 --> 02:02:52,660 even though we're getting back the same answer. 2600 02:02:52,660 --> 02:02:54,960 So I daresay a better solution here would 2601 02:02:54,960 --> 02:02:58,230 be to maybe figure out the length of s earlier on in my code, 2602 02:02:58,230 --> 02:02:59,490 and maybe declare a variable.
2603 02:02:59,490 --> 02:03:02,580 Or perhaps do something that's syntactically a little more elegant, 2604 02:03:02,580 --> 02:03:05,070 and in fact, a very common design in a loop like this, 2605 02:03:05,070 --> 02:03:07,860 would be to declare not just one variable like i, 2606 02:03:07,860 --> 02:03:12,060 but to actually declare a second variable called n, for instance, where 2607 02:03:12,060 --> 02:03:16,530 n is just some number, set n equal to the length of s. 2608 02:03:16,530 --> 02:03:18,900 But thereafter, inside of this condition, 2609 02:03:18,900 --> 02:03:24,540 instead of calling strlen of s again and again and again, what might I now do? 2610 02:03:24,540 --> 02:03:28,110 I could instead just compare i against n itself, 2611 02:03:28,110 --> 02:03:31,080 because n now will only be calculated once when it's initialized, 2612 02:03:31,080 --> 02:03:32,730 just as i is initialized to zero. 2613 02:03:32,730 --> 02:03:36,000 And thereafter, we're going to be comparing i, which is changing, 2614 02:03:36,000 --> 02:03:37,350 against n, which will not be. 2615 02:03:37,350 --> 02:03:40,330 So it's going to be marginally more efficient by design. 2616 02:03:40,330 --> 02:03:42,900 Now with that said, a good compiler could also 2617 02:03:42,900 --> 02:03:46,080 recognize that there is this optimization possibility, 2618 02:03:46,080 --> 02:03:47,100 and maybe do it for us. 2619 02:03:47,100 --> 02:03:49,080 But for now, best to get into the habit, best 2620 02:03:49,080 --> 02:03:52,260 to develop the muscle memory for making those better design decisions 2621 02:03:52,260 --> 02:03:54,010 yourselves. 2622 02:03:54,010 --> 02:03:56,380 Questions, then, on how we did this? 2623 02:03:56,380 --> 02:03:58,900 2624 02:03:58,900 --> 02:03:59,650 No? 2625 02:03:59,650 --> 02:04:03,050 All right, a few final building blocks for the day.
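The two-variable loop described above might look like the sketch below. The function name print_after, the const char * parameter, and the returned count are assumptions added for illustration; the part that reflects the lecture is the initializer int i = 0, n = strlen(s).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Prints s in uppercase and returns how many characters were printed.
int print_after(const char *s)
{
    int printed = 0;
    printf("After:  ");
    // n is computed once in the initializer, so strlen is not
    // called again each time the condition i < n is checked
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c", toupper((unsigned char) s[i]));
        printed++;
    }
    printf("\n");
    return printed;
}
```

Both i and n are declared in the same initializer because they share the type int; n then lives only as long as the loop does.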
2626 02:04:03,050 --> 02:04:07,870 So we started by talking about those command line arguments that clang uses, 2627 02:04:07,870 --> 02:04:13,090 whereby, anything after the command that you type at a prompt, be it make 2628 02:04:13,090 --> 02:04:18,160 or clang or even CD in Linux, any word thereafter, or something 2629 02:04:18,160 --> 02:04:21,350 cryptic like -o is a command line argument. 2630 02:04:21,350 --> 02:04:22,840 It's an input to the command. 2631 02:04:22,840 --> 02:04:26,132 It's different from a function argument because a function argument, of course, 2632 02:04:26,132 --> 02:04:27,280 is an input to a function. 2633 02:04:27,280 --> 02:04:28,345 But it's the same idea. 2634 02:04:28,345 --> 02:04:30,970 It's just different syntax after the dollar sign at the prompt. 2635 02:04:30,970 --> 02:04:33,880 Well, it turns out that command line arguments 2636 02:04:33,880 --> 02:04:37,660 are something you can now use in your own programs 2637 02:04:37,660 --> 02:04:41,800 by accessing words after the prompt. 2638 02:04:41,800 --> 02:04:45,410 And let me propose that we invent this as follows. 2639 02:04:45,410 --> 02:04:49,540 Let me propose that we switch back to VS Code here, 2640 02:04:49,540 --> 02:04:53,560 and I'll open a new file here called greet.c. 2641 02:04:53,560 --> 02:04:56,410 So in greet.c, it's going to be a program that very simply greets 2642 02:04:56,410 --> 02:04:57,070 the user. 2643 02:04:57,070 --> 02:04:59,440 Had we written this last week, we would have done this. 2644 02:04:59,440 --> 02:05:08,200 Include cs50.h, and then include stdio.h, and then int main void, 2645 02:05:08,200 --> 02:05:13,060 and then we might do something simple like string name equals getString, 2646 02:05:13,060 --> 02:05:15,980 quote unquote, "What's your name?" 2647 02:05:15,980 --> 02:05:20,020 And then we would have printed out, as always, Hello, %s, 2648 02:05:20,020 --> 02:05:21,490 and then plugging in that name. 
2649 02:05:21,490 --> 02:05:25,300 So this is the same program we've implemented many times, just 2650 02:05:25,300 --> 02:05:26,590 to make sure it works-- 2651 02:05:26,590 --> 02:05:29,140 although, nope, that's not quite the same program. 2652 02:05:29,140 --> 02:05:30,940 Semicolon's in the wrong place. 2653 02:05:30,940 --> 02:05:32,960 This now is the same program. 2654 02:05:32,960 --> 02:05:37,610 So make greet, dot ./greet, and I'll type in my own name. hello, David. 2655 02:05:37,610 --> 02:05:38,770 So we're back there. 2656 02:05:38,770 --> 02:05:41,770 Now, what's arguably a little annoying about this program, 2657 02:05:41,770 --> 02:05:44,110 if I type in something else like, Carter, 2658 02:05:44,110 --> 02:05:48,130 Enter, I have to run the program, wait for the prompt, type in my name, 2659 02:05:48,130 --> 02:05:48,910 hit Enter. 2660 02:05:48,910 --> 02:05:52,360 And that's fine, but imagine if every program worked like this. 2661 02:05:52,360 --> 02:05:55,415 Like make, suppose you could only type make, then you wait for a prompt, 2662 02:05:55,415 --> 02:05:58,540 then you type the name of the program you want to make, then you hit Enter. 2663 02:05:58,540 --> 02:06:01,720 Or worse, in Linux when you have to change directories, 2664 02:06:01,720 --> 02:06:05,263 as you might have for problem set one, what if you had to type CD, Enter, 2665 02:06:05,263 --> 02:06:07,930 now type the name of the folder you want to change into, Enter-- 2666 02:06:07,930 --> 02:06:09,710 I mean, it just slows life down. 2667 02:06:09,710 --> 02:06:11,470 And so it just gets annoying quickly. 2668 02:06:11,470 --> 02:06:16,070 So command line arguments just let you express your whole thought all at once. 2669 02:06:16,070 --> 02:06:18,200 So how can I do this? 2670 02:06:18,200 --> 02:06:22,450 Well, if I want to express the notion of command line arguments in my code, 2671 02:06:22,450 --> 02:06:25,640 I could do something like this. 
2672 02:06:25,640 --> 02:06:28,750 I could, for the very first time, go up and get 2673 02:06:28,750 --> 02:06:33,730 rid of this void, which as of today means, this program takes no command 2674 02:06:33,730 --> 02:06:34,780 line arguments. 2675 02:06:34,780 --> 02:06:37,540 And I can change it to exactly this. 2676 02:06:37,540 --> 02:06:43,490 Int argc, string argv, with brackets. 2677 02:06:43,490 --> 02:06:44,950 Now it's cryptic, admittedly. 2678 02:06:44,950 --> 02:06:46,150 And let me zoom in. 2679 02:06:46,150 --> 02:06:49,300 But I think we can perhaps infer now, what's going on. 2680 02:06:49,300 --> 02:06:52,750 If main now does not have void as its input, which 2681 02:06:52,750 --> 02:06:55,600 means it takes no arguments, surely, the spoiler 2682 02:06:55,600 --> 02:06:59,230 here is that now main will take command line arguments somehow. 2683 02:06:59,230 --> 02:07:05,180 Any guesses as to what argv is or will be? 2684 02:07:05,180 --> 02:07:08,330 What might this represent? 2685 02:07:08,330 --> 02:07:11,390 It's an array of strings, right, by way of the syntax. 2686 02:07:11,390 --> 02:07:13,223 Yeah? 2687 02:07:13,223 --> 02:07:15,480 AUDIENCE: All the characters will be typed out. 2688 02:07:15,480 --> 02:07:16,050 DAVID MALAN: Exactly. 2689 02:07:16,050 --> 02:07:18,550 It will be all of the characters, or really all of the words 2690 02:07:18,550 --> 02:07:19,830 that you type at the prompt. 2691 02:07:19,830 --> 02:07:21,765 Argc, as an int, any guess? 2692 02:07:21,765 --> 02:07:24,360 2693 02:07:24,360 --> 02:07:28,700 Argument count is what it generally stands for, though technically, 2694 02:07:28,700 --> 02:07:30,290 you could call these things anything. 2695 02:07:30,290 --> 02:07:31,520 But this is the convention. 
2696 02:07:31,520 --> 02:07:35,780 Because I claimed earlier that arrays don't keep track of their own length, 2697 02:07:35,780 --> 02:07:38,930 if you want to know how many words the human typed at the prompt 2698 02:07:38,930 --> 02:07:41,420 after your program's name, you have to be told, 2699 02:07:41,420 --> 02:07:45,650 not just the array of the words, but the length of that array. 2700 02:07:45,650 --> 02:07:48,530 The strings, you can figure out the length of using strlen, 2701 02:07:48,530 --> 02:07:53,360 but you can't figure out the length of the array of strings, the collection 2702 02:07:53,360 --> 02:07:55,020 of words that the human typed in. 2703 02:07:55,020 --> 02:07:56,760 So how can I now use this? 2704 02:07:56,760 --> 02:07:59,190 Well, let me go ahead and do this. 2705 02:07:59,190 --> 02:08:04,190 Let me go ahead and change this program now just to be printf, quote unquote, 2706 02:08:04,190 --> 02:08:11,630 "hello, %s\n", then argv[1]. 2707 02:08:11,630 --> 02:08:14,780 So this is not the best version of my code yet, but it's my first. 2708 02:08:14,780 --> 02:08:21,020 Make greet, and now let me do ./greet, David all at once. 2709 02:08:21,020 --> 02:08:23,210 Enter, hello, David. 2710 02:08:23,210 --> 02:08:25,820 Now let me run it again, ./greet, Carter. 2711 02:08:25,820 --> 02:08:27,620 Enter, hello, Carter. 2712 02:08:27,620 --> 02:08:29,840 It's a marginal improvement, but I don't have 2713 02:08:29,840 --> 02:08:32,330 to wait for getString to prompt me to hit Enter. 2714 02:08:32,330 --> 02:08:34,370 It's just speeding things up, twice as fast. 2715 02:08:34,370 --> 02:08:36,890 One less command to type in. 2716 02:08:36,890 --> 02:08:41,390 But I deliberately did [1], but what's the beginning of argv? 2717 02:08:41,390 --> 02:08:42,170 It would be [0]. 2718 02:08:42,170 --> 02:08:44,730 2719 02:08:44,730 --> 02:08:45,780 Well, what's that? 2720 02:08:45,780 --> 02:08:48,840 This is sometimes useful, though for now, it's not.
2721 02:08:48,840 --> 02:08:54,110 Suppose I recompile my code and run this program now, greet David. 2722 02:08:54,110 --> 02:08:58,598 Anyone want to guess what's in argv[0]? 2723 02:08:58,598 --> 02:08:59,530 AUDIENCE: [INAUDIBLE] 2724 02:08:59,530 --> 02:09:00,220 DAVID MALAN: Say again? 2725 02:09:00,220 --> 02:09:01,230 AUDIENCE: Greet, hello. 2726 02:09:01,230 --> 02:09:04,530 DAVID MALAN: Greet, Enter, hello, ./greet. 2727 02:09:04,530 --> 02:09:08,280 So if you want to sort of inception style your program to figure out what 2728 02:09:08,280 --> 02:09:11,910 its own name is, or at least how it was executed at the command line, 2729 02:09:11,910 --> 02:09:14,460 at the terminal, you can look at argv[0]. 2730 02:09:14,460 --> 02:09:17,160 In general, probably not that useful, probably better 2731 02:09:17,160 --> 02:09:21,900 to start looking at [1], which was the first word after the program name. 2732 02:09:21,900 --> 02:09:25,320 And if there were more, I could do this-- how about argv[2], 2733 02:09:25,320 --> 02:09:27,690 let me add in a second %s. 2734 02:09:27,690 --> 02:09:29,550 Let me recompile greet. 2735 02:09:29,550 --> 02:09:35,490 Let me do ./greet David Malan, Enter, and that, too, now works, 2736 02:09:35,490 --> 02:09:37,112 taking in two words at the prompt. 2737 02:09:37,112 --> 02:09:38,820 If I really want to be smart at this now, 2738 02:09:38,820 --> 02:09:40,445 I could do something like this, though. 2739 02:09:40,445 --> 02:09:44,700 How about if the count of arguments, A.K.A. argc, 2740 02:09:44,700 --> 02:09:49,890 equals equals 2, then assume that the human typed in only their first name, 2741 02:09:49,890 --> 02:09:58,440 and do printf hello comma %s \n, and then argv[1].
2742 02:09:58,440 --> 02:10:01,470 Else, if the human did not provide exactly two 2743 02:10:01,470 --> 02:10:04,920 arguments, the name of the program and their own name, 2744 02:10:04,920 --> 02:10:07,890 let's just print out a default value, lest they forgot their name 2745 02:10:07,890 --> 02:10:09,990 or they typed in two names or three names. 2746 02:10:09,990 --> 02:10:13,110 Let's just do, hello comma world as a default. 2747 02:10:13,110 --> 02:10:15,270 And we'll just ignore what the human typed in. 2748 02:10:15,270 --> 02:10:20,850 If I recompile this, make greet, I can do ./greet and David again, Enter. 2749 02:10:20,850 --> 02:10:24,840 Oops-- sorry, what am I missing? 2750 02:10:24,840 --> 02:10:26,640 Yeah, so newbie mistake. 2751 02:10:26,640 --> 02:10:30,090 Else, all right, make greet again. 2752 02:10:30,090 --> 02:10:34,050 ./greet, David, Enter, there's my hello, David. 2753 02:10:34,050 --> 02:10:37,870 But if I omit my name, I just get the generic, like a default value. 2754 02:10:37,870 --> 02:10:41,590 And if I get a little curious and I type in both names, then I get ignored too. 2755 02:10:41,590 --> 02:10:42,090 Why? 2756 02:10:42,090 --> 02:10:44,880 Because I just haven't built in support for argc of three. 2757 02:10:44,880 --> 02:10:47,610 I could do anything I want, but now we have access 2758 02:10:47,610 --> 02:10:50,730 to these kinds of building blocks. 2759 02:10:50,730 --> 02:10:52,780 All right, what else might I do here? 2760 02:10:52,780 --> 02:10:57,660 Well, it turns out there might be some final features for us to now execute. 
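The argc check walked through above can be sketched as the helper below. The names name_to_greet and greet are hypothetical, and char * stands in for CS50's string; in greet.c the same if/else sits inside main, declared as int main(int argc, string argv[]), so that the operating system supplies argc and argv.

```c
#include <stdio.h>

// Picks the name to greet from the command-line arguments.
// argv[0] is the program's own name, so exactly one extra word
// means argc is 2; any other count falls back to "world".
const char *name_to_greet(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];
    }
    else
    {
        return "world";
    }
}

// Prints the greeting that greet.c would print for these arguments
void greet(int argc, char *argv[])
{
    printf("hello, %s\n", name_to_greet(argc, argv));
}
```

Checking argc before touching argv[1] matters because, when the program is run with no extra words, there is no name at that index to print.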
2761 02:10:57,660 --> 02:11:00,090 Notice, though, that in C, despite what you 2762 02:11:00,090 --> 02:11:02,820 might see in books or online tutorials, nowadays, 2763 02:11:02,820 --> 02:11:06,180 the two official formats for defining a main function 2764 02:11:06,180 --> 02:11:11,130 are either this, which we've been using now for two plus weeks or now this, 2765 02:11:11,130 --> 02:11:14,250 whereby, you change the void to int argc, 2766 02:11:14,250 --> 02:11:17,880 and then for now, string argv, and then empty brackets. 2767 02:11:17,880 --> 02:11:20,608 And we'll see that this, too, is a simplification, some training 2768 02:11:20,608 --> 02:11:21,400 wheels if you will. 2769 02:11:21,400 --> 02:11:23,550 But for now, those are the two forms, even 2770 02:11:23,550 --> 02:11:26,550 though you will see in online tutorials and even books, some people 2771 02:11:26,550 --> 02:11:27,840 use main in different ways. 2772 02:11:27,840 --> 02:11:30,142 These are the two now to keep in mind. 2773 02:11:30,142 --> 02:11:32,100 And I'll note that these command line arguments 2774 02:11:32,100 --> 02:11:33,360 are kind of all over the place. 2775 02:11:33,360 --> 02:11:35,590 Didn't probably expect to see this word on the screen here. 2776 02:11:35,590 --> 02:11:36,490 And what does it mean? 2777 02:11:36,490 --> 02:11:37,920 Well, it turns out that for decades-- there's 2778 02:11:37,920 --> 02:11:40,080 actually this program that comes with Linux systems 2779 02:11:40,080 --> 02:11:41,880 in particular called cowsay. 2780 02:11:41,880 --> 02:11:42,510 Why? 2781 02:11:42,510 --> 02:11:45,300 Probably because someone had too much free time once and decided 2782 02:11:45,300 --> 02:11:49,920 to write a program that creates ASCII art out of a cow saying something 2783 02:11:49,920 --> 02:11:51,520 textually on the screen. 2784 02:11:51,520 --> 02:11:55,780 But you use cowsay, just for fun, by way of command line arguments. 
2785 02:11:55,780 --> 02:12:00,660 So for instance, let me propose that I go back to VS Code 2786 02:12:00,660 --> 02:12:03,020 here, not because I want to write any code, 2787 02:12:03,020 --> 02:12:04,770 but I just want to use my terminal window. 2788 02:12:04,770 --> 02:12:07,320 And let me maximize my terminal window here. 2789 02:12:07,320 --> 02:12:11,880 And let me go ahead and type in something like, how about cowsay, 2790 02:12:11,880 --> 02:12:13,170 space moo? 2791 02:12:13,170 --> 02:12:14,822 So cowsay is not a program I wrote. 2792 02:12:14,822 --> 02:12:16,030 It's been around for decades. 2793 02:12:16,030 --> 02:12:18,870 But we installed it in VS Code for you in the cloud. 2794 02:12:18,870 --> 02:12:21,330 It takes at least one command line argument. 2795 02:12:21,330 --> 02:12:23,070 What do you want the cow to say? 2796 02:12:23,070 --> 02:12:26,190 I can say, cowsay moo, and hit Enter, and voila, there 2797 02:12:26,190 --> 02:12:29,490 is my ASCII art of a cow saying moo on the screen. 2798 02:12:29,490 --> 02:12:31,090 It can say multiple words. 2799 02:12:31,090 --> 02:12:33,960 So I can say, Hello, world, Enter. 2800 02:12:33,960 --> 02:12:35,800 And now it says, Hello, world. 2801 02:12:35,800 --> 02:12:38,730 So this is just an example of a silly program that uses command line 2802 02:12:38,730 --> 02:12:40,470 arguments, but it takes others too. 2803 02:12:40,470 --> 02:12:43,650 Just like clang, use this convention of hyphens 2804 02:12:43,650 --> 02:12:45,750 to change the output of the program. 2805 02:12:45,750 --> 02:12:49,350 Dash something is just a super common convention with command line arguments 2806 02:12:49,350 --> 02:12:53,520 when you want a very terse notation for some option like output. 
2807 02:12:53,520 --> 02:12:56,460 In cowsay, I read the documentation, and it turns out 2808 02:12:56,460 --> 02:12:59,040 there's a dash f command line argument that 2809 02:12:59,040 --> 02:13:03,460 allows you to change the appearance of the cow, if you will. 2810 02:13:03,460 --> 02:13:10,170 So if I do cowsay dash f, duck, and then some other word like quack, 2811 02:13:10,170 --> 02:13:11,640 it's no longer a cow. 2812 02:13:11,640 --> 02:13:15,850 That command line argument turns it into a tiny, adorable duck instead. 2813 02:13:15,850 --> 02:13:19,020 And then lastly, just for fun, because I spent way too much time 2814 02:13:19,020 --> 02:13:20,790 playing with command line arguments. 2815 02:13:20,790 --> 02:13:25,260 Cowsay dash f, dragon, and then how about, rawr, Enter, 2816 02:13:25,260 --> 02:13:27,910 you can even get this on the screen here. 2817 02:13:27,910 --> 02:13:30,150 So this, too, is just an example of what you 2818 02:13:30,150 --> 02:13:34,230 can do with these command line arguments now that we have this building block. 2819 02:13:34,230 --> 02:13:36,960 And there's one final thing we can now do with code. 2820 02:13:36,960 --> 02:13:39,150 There's one last feature today that we'll 2821 02:13:39,150 --> 02:13:41,610 introduce before we now connect all of these dots 2822 02:13:41,610 --> 02:13:47,520 to readability and encryption by talking, lastly, about something called 2823 02:13:47,520 --> 02:13:48,450 exit status. 2824 02:13:48,450 --> 02:13:52,380 It turns out that whenever your main function exits, 2825 02:13:52,380 --> 02:13:55,590 it returns a secret integer that you can figure out, 2826 02:13:55,590 --> 02:13:58,260 as the programmer or an advanced user, what it was. 2827 02:13:58,260 --> 02:14:02,398 And these exit codes, exit statuses, are typically used to indicate errors. 
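To make the dash-something convention concrete, here is a rough sketch of how a program could look for an option such as -f and read the word that follows it. This is an illustration only, not how cowsay itself is implemented; the function name option_value is an assumption.

```c
#include <stddef.h>
#include <string.h>

// Returns the argument that follows `flag` (for example "-f"), or NULL
// if the flag is absent or is the very last argument.
// `option_value` is a hypothetical helper, not part of any real library.
const char *option_value(int argc, char *argv[], const char *flag)
{
    // Start at 1 because argv[0] is the program's own name.
    for (int i = 1; i < argc - 1; i++)
    {
        if (strcmp(argv[i], flag) == 0)
        {
            return argv[i + 1];
        }
    }
    return NULL;
}
```

With the arguments cowsay -f duck quack, a call like option_value(argc, argv, "-f") would return "duck".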
2828 02:14:02,398 --> 02:14:05,190 So for instance, over the past couple of years, if you've used Zoom 2829 02:14:05,190 --> 02:14:08,560 and you ever got some kind of error, you might have seen a screen like this. 2830 02:14:08,560 --> 02:14:11,040 It's usually not that helpful, maybe tells you to click 2831 02:14:11,040 --> 02:14:13,050 Report Problem or Contact Support. 2832 02:14:13,050 --> 02:14:16,980 But very often in our human world on Macs, PCs, and phones, 2833 02:14:16,980 --> 02:14:20,010 you see cryptic error codes, like literally numbers 2834 02:14:20,010 --> 02:14:23,640 that probably only Zoom knows, or Microsoft or Google or whatever company 2835 02:14:23,640 --> 02:14:25,050 wrote the software you're using. 2836 02:14:25,050 --> 02:14:28,260 But that number corresponds to a specific error 2837 02:14:28,260 --> 02:14:32,070 that some human somewhere knows might very well happen. 2838 02:14:32,070 --> 02:14:34,950 These are used similarly, although under a different name 2839 02:14:34,950 --> 02:14:38,260 that we'll talk about later in the term, on the web as well. 2840 02:14:38,260 --> 02:14:41,350 Have you ever seen this-- maybe not character, but number? 2841 02:14:41,350 --> 02:14:43,485 So, 404 means what? 2842 02:14:43,485 --> 02:14:44,880 AUDIENCE: Error. 2843 02:14:44,880 --> 02:14:47,790 DAVID MALAN: So error, yes, but really, not found. 2844 02:14:47,790 --> 02:14:48,410 So, why? 2845 02:14:48,410 --> 02:14:49,993 I mean, this is the most arcane thing. 2846 02:14:49,993 --> 02:14:53,000 And we'll talk in a few weeks about what this and other numbers mean, 2847 02:14:53,000 --> 02:14:54,917 but numbers are all around us in technology, 2848 02:14:54,917 --> 02:14:57,500 and they very often mean something to the technical people who 2849 02:14:57,500 --> 02:15:00,270 wrote the software, less so to humans like you and me. 
2850 02:15:00,270 --> 02:15:03,230 Why so many of us recognize 404 is kind of weird, 2851 02:15:03,230 --> 02:15:05,900 that like that's been around long enough that we all know it. 2852 02:15:05,900 --> 02:15:10,250 But it really is just a special number that represents an error of some sort. 2853 02:15:10,250 --> 02:15:13,100 So it turns out, the last thing we'll reveal today 2854 02:15:13,100 --> 02:15:15,530 about what we've been taking for granted for two weeks, 2855 02:15:15,530 --> 02:15:18,200 is what the int is in main. 2856 02:15:18,200 --> 02:15:21,650 We've seen, just a moment ago, that the thing in the parentheses, which 2857 02:15:21,650 --> 02:15:24,680 up until now has been void, which means no command line arguments, 2858 02:15:24,680 --> 02:15:29,690 is now int argc, string argv brackets, which just means, yes, command line arguments. 2859 02:15:29,690 --> 02:15:31,290 And we've seen how to access them. 2860 02:15:31,290 --> 02:15:33,620 So the last piece of the puzzle, honestly, 2861 02:15:33,620 --> 02:15:37,460 of all the cryptic syntax the past two weeks, is just what int means. 2862 02:15:37,460 --> 02:15:40,610 Int is always there for main, and it indicates 2863 02:15:40,610 --> 02:15:44,300 that main will always return an integer, even though you and I have never 2864 02:15:44,300 --> 02:15:46,010 done so explicitly. 2865 02:15:46,010 --> 02:15:50,450 Usually, main returns 0, by default. But it 2866 02:15:50,450 --> 02:15:53,928 would be weird if you saw an error message saying 0, so 0 is just hidden. 2867 02:15:53,928 --> 02:15:55,470 You would never see it on the screen. 2868 02:15:55,470 --> 02:15:58,670 But it's happening automatically by way of how C is designed. 2869 02:15:58,670 --> 02:16:01,550 So let me write one final program here. 2870 02:16:01,550 --> 02:16:05,750 I'll call it, for instance, status.c to show you these exit statuses. 
2871 02:16:05,750 --> 02:16:10,790 code status.c, and then up here, let me do something simple like include 2872 02:16:10,790 --> 02:16:18,020 cs50.h, then include stdio.h, and then int main-- 2873 02:16:18,020 --> 02:16:21,350 actually, let's use a command line argument. int argc, string argv[], 2874 02:16:21,350 --> 02:16:23,180 so that's copy, paste. 2875 02:16:23,180 --> 02:16:26,000 But now let's do this. 2876 02:16:26,000 --> 02:16:29,280 If argc does not equal 2-- 2877 02:16:29,280 --> 02:16:30,780 why don't we do something like this? 2878 02:16:30,780 --> 02:16:33,740 Let's not just default to hello, world like last time. 2879 02:16:33,740 --> 02:16:34,770 Let's yell at the user. 2880 02:16:34,770 --> 02:16:38,802 So let's say something like printf missing command line argument, 2881 02:16:38,802 --> 02:16:40,760 so that they know they screwed up and they need 2882 02:16:40,760 --> 02:16:43,160 to run the program again correctly. 2883 02:16:43,160 --> 02:16:51,320 Else, let's go ahead and say, print out, as before, Hello, comma %s, 2884 02:16:51,320 --> 02:16:56,730 and then plug in argv[1], so the human's name from the prompt. 2885 02:16:56,730 --> 02:17:01,910 Now at this point, let me go ahead and run status, ./status, 2886 02:17:01,910 --> 02:17:03,590 and I'll type nothing first. 2887 02:17:03,590 --> 02:17:04,700 I get yelled at. 2888 02:17:04,700 --> 02:17:10,170 This time, I'll type it again. ./status David, and it works properly. 2889 02:17:10,170 --> 02:17:14,090 But now let me show you a somewhat secret, cryptic command. 2890 02:17:14,090 --> 02:17:17,330 You can type this at your prompt, and it's just a coincidence 2891 02:17:17,330 --> 02:17:18,740 that there's another dollar sign. 2892 02:17:18,740 --> 02:17:22,400 Echo $?, totally arcane, but it allows you 2893 02:17:22,400 --> 02:17:25,490 to see what exit status your program has ended with. 2894 02:17:25,490 --> 02:17:27,559 So let me run this again the wrong way. 
2895 02:17:27,559 --> 02:17:31,040 ./status, I get the error message. 2896 02:17:31,040 --> 02:17:32,780 What was secretly returned? 2897 02:17:32,780 --> 02:17:33,440 I can't see it. 2898 02:17:33,440 --> 02:17:37,280 There's obviously no error screen, but by typing echo $?, 2899 02:17:37,280 --> 02:17:41,420 I can see that, oh, my program automatically, by default, returns 2900 02:17:41,420 --> 02:17:42,170 zero. 2901 02:17:42,170 --> 02:17:46,879 However, if I run it again correctly, ./status David, Enter, 2902 02:17:46,879 --> 02:17:48,690 this is the correct version. 2903 02:17:48,690 --> 02:17:50,629 But if I run echo $? 2904 02:17:50,629 --> 02:17:52,879 again, it still exited with 0. 2905 02:17:52,879 --> 02:17:55,879 And long story short, this is just a missed opportunity. 2906 02:17:55,879 --> 02:17:59,570 When something goes wrong, why don't I return a value other than 0? 2907 02:17:59,570 --> 02:18:01,070 0, by default, means success. 2908 02:18:01,070 --> 02:18:02,690 And it's always there automatically. 2909 02:18:02,690 --> 02:18:04,940 But you can control this. 2910 02:18:04,940 --> 02:18:11,160 I can go into my code here and return 1, else, if something works fine, 2911 02:18:11,160 --> 02:18:14,870 I can return 0, by default. And honestly, if I omit the return zero, 2912 02:18:14,870 --> 02:18:17,129 again, zero automatically is returned. 2913 02:18:17,129 --> 02:18:20,719 So let me go ahead and go be explicit, just so I know what's going on. 2914 02:18:20,719 --> 02:18:26,360 Make status again, ./status, and let's do this correctly with David. 2915 02:18:26,360 --> 02:18:28,520 Enter, hello, David. 2916 02:18:28,520 --> 02:18:32,059 Echo $?, zero. 2917 02:18:32,059 --> 02:18:33,270 So all is well. 2918 02:18:33,270 --> 02:18:38,240 But now if I do ./status and nothing, or multiple things, but not just David, 2919 02:18:38,240 --> 02:18:40,530 Enter, I get the error message. 
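The explicit return values just described can be sketched like this. It is a rough reconstruction under assumptions, not the exact file from lecture: the logic sits in a function with the made-up name run_status so the returned value is easy to inspect, and char * stands in for the CS50 string type.

```c
#include <stdio.h>

// The logic of status.c, factored into a function so the value that
// main would return is visible. `run_status` is a hypothetical name.
int run_status(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Missing command-line argument\n");
        return 1; // non-zero signals that something went wrong
    }
    printf("hello, %s\n", argv[1]);
    return 0; // zero signals success
}

/* A main that forwards to it would be:
 *
 *   int main(int argc, char *argv[])
 *   {
 *       return run_status(argc, argv);
 *   }
 */
```

Whatever main returns is the number that echo $? shows at the prompt right after the program finishes.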
2920 02:18:40,530 --> 02:18:45,230 But now if I do echo $?, voila, there now is the one. 2921 02:18:45,230 --> 02:18:47,330 So what does this now mean? 2922 02:18:47,330 --> 02:18:49,490 This is, in the graphical world, we would just 2923 02:18:49,490 --> 02:18:51,020 show something like this on the screen, which is 2924 02:18:51,020 --> 02:18:52,459 a little more informative to the user. 2925 02:18:52,459 --> 02:18:54,469 But even in the Linux world where you don't have a GUI, 2926 02:18:54,469 --> 02:18:56,690 necessarily, even for the programs we've written, 2927 02:18:56,690 --> 02:18:58,549 you can check these exit statuses. 2928 02:18:58,549 --> 02:19:01,070 And in fact, more comfortable, more advanced programmers, 2929 02:19:01,070 --> 02:19:03,889 when they write code that calls programs, 2930 02:19:03,889 --> 02:19:07,340 be it cowsay or anything else, you can, in code, 2931 02:19:07,340 --> 02:19:11,030 check what the exit status is of a program, and then decide, 2932 02:19:11,030 --> 02:19:13,170 did my program work or did it not? 2933 02:19:13,170 --> 02:19:16,219 And now let's connect the final dots before we 2934 02:19:16,219 --> 02:19:19,070 adjourn for some fruit snacks. 2935 02:19:19,070 --> 02:19:22,100 Cryptography, namely one of the applications this week 2936 02:19:22,100 --> 02:19:24,770 via which you'll be able to send, if you will, 2937 02:19:24,770 --> 02:19:27,650 secret messages, and better yet, decrypt secret messages. 2938 02:19:27,650 --> 02:19:29,780 This will be in addition to perhaps analyzing 2939 02:19:29,780 --> 02:19:32,120 the readability of text using heuristics, like we 2940 02:19:32,120 --> 02:19:34,040 identified at the start of class too. 
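As one possible illustration of checking an exit status in code, the sketch below runs another command and reports the number it exited with. It assumes a POSIX system such as Linux, where system comes from stdlib.h and the WIFEXITED and WEXITSTATUS macros come from sys/wait.h. The function name exit_status_of is an assumption, not something from the lecture.

```c
#include <stdlib.h>
#include <sys/wait.h>

// Runs `command` through the shell and returns its exit status, or -1
// if the command could not be run or did not exit normally.
// Assumes a POSIX system; `exit_status_of` is a hypothetical name.
int exit_status_of(const char *command)
{
    int raw = system(command);
    if (raw == -1 || !WIFEXITED(raw))
    {
        return -1;
    }
    // Extract the program's exit status from the raw wait status.
    return WEXITSTATUS(raw);
}
```

A caller could then write something like if (exit_status_of("./status David") != 0) to decide whether the other program worked.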
2941 02:19:34,040 --> 02:19:38,299 So cryptography is just the art, the science of encrypting information, 2942 02:19:38,299 --> 02:19:41,330 scrambling information so that if you have a secret message 2943 02:19:41,330 --> 02:19:45,980 to send in so-called plaintext, you can run it through some algorithm 2944 02:19:45,980 --> 02:19:49,910 and turn it into what's called ciphertext, thereby, encrypting it. 2945 02:19:49,910 --> 02:19:53,150 And only someone who knows what algorithm you've used 2946 02:19:53,150 --> 02:19:55,880 and what input you've used to the algorithm, theoretically, 2947 02:19:55,880 --> 02:19:59,880 can decrypt that process and convert it back to the original message. 2948 02:19:59,880 --> 02:20:03,030 So if we use our mental model from last week, here is a problem. 2949 02:20:03,030 --> 02:20:04,910 Here is an input and output. 2950 02:20:04,910 --> 02:20:08,120 The goal I claim here is to take some plain text, like the message 2951 02:20:08,120 --> 02:20:10,250 you want to send, think back to grade school 2952 02:20:10,250 --> 02:20:13,640 if you ever passed a note to a friend or to your crush saying, I love you, 2953 02:20:13,640 --> 02:20:16,910 it's a little awkward if the teacher or someone else intercepts the paper. 2954 02:20:16,910 --> 02:20:19,490 And in English, it just says, I love you, or whatever it is. 2955 02:20:19,490 --> 02:20:22,350 It'd be nice if you had at least encrypted it in some way. 2956 02:20:22,350 --> 02:20:25,220 But the other person needs to know what algorithm you used 2957 02:20:25,220 --> 02:20:27,230 and what inputs you use to that algorithm 2958 02:20:27,230 --> 02:20:31,100 so that, ultimately, they can decode the so-called ciphertext, which 2959 02:20:31,100 --> 02:20:32,040 is the output. 2960 02:20:32,040 --> 02:20:34,190 So what goes inside of the box today? 2961 02:20:34,190 --> 02:20:37,970 Well, an algorithm, as it relates to cryptography, is called a cipher. 
2962 02:20:37,970 --> 02:20:41,390 And a cipher is a fancy name for an algorithm that encrypts text 2963 02:20:41,390 --> 02:20:43,250 from plaintext to ciphertext. 2964 02:20:43,250 --> 02:20:46,760 The catch is, there needs to be not just the algorithm, 2965 02:20:46,760 --> 02:20:48,750 there needs to be an input to it. 2966 02:20:48,750 --> 02:20:52,590 And so, for instance, you might draw the picture like this for the first time 2967 02:20:52,590 --> 02:20:53,090 today. 2968 02:20:53,090 --> 02:20:54,257 And we've seen this in code. 2969 02:20:54,257 --> 02:20:57,180 You can give multiple inputs or arguments to functions. 2970 02:20:57,180 --> 02:20:59,960 So in this black box, can you imagine passing in the message 2971 02:20:59,960 --> 02:21:02,510 you want to send, and then some secret. 2972 02:21:02,510 --> 02:21:05,300 So for instance, suppose that, the simplest 2973 02:21:05,300 --> 02:21:08,750 thing I could think of as a kid was instead of sending the letter A, 2974 02:21:08,750 --> 02:21:10,310 why don't I write the letter B? 2975 02:21:10,310 --> 02:21:13,070 Instead of the letter B, why don't I write the letter C? 2976 02:21:13,070 --> 02:21:16,280 So I can kind of shift the English alphabet by one space. 2977 02:21:16,280 --> 02:21:18,740 So A becomes B, B becomes C, dot, dot, dot, 2978 02:21:18,740 --> 02:21:21,690 Z becomes A. You can wrap around at the end. 2979 02:21:21,690 --> 02:21:24,120 And let's assume no punctuation in this part of the story. 2980 02:21:24,120 --> 02:21:29,420 So that's a very simple algorithm-- add a value to each letter 2981 02:21:29,420 --> 02:21:32,090 and send the value as the ciphertext. 2982 02:21:32,090 --> 02:21:35,540 And now the teacher, the classmate, they have to know that you use, 2983 02:21:35,540 --> 02:21:39,410 not only this rotational algorithm, also known as a Caesar cipher, 2984 02:21:39,410 --> 02:21:41,300 they also need to know what number you use. 
2985 02:21:41,300 --> 02:21:45,200 Did you add 1 to every letter, 2 to every letter, 25 to every letter? 2986 02:21:45,200 --> 02:21:49,310 Now if they're super smart and probably not the young age in this story, 2987 02:21:49,310 --> 02:21:51,165 they could also just try all possibilities. 2988 02:21:51,165 --> 02:21:53,040 And that would be an attack on the algorithm. 2989 02:21:53,040 --> 02:21:55,310 This is not a sophisticated algorithm, but it's 2990 02:21:55,310 --> 02:21:56,970 enough to send a message in class. 2991 02:21:56,970 --> 02:21:58,940 So if the two inputs now are HI! 2992 02:21:58,940 --> 02:22:04,280 as the plain text message, and 1 as the so-called key, the secret number 2993 02:22:04,280 --> 02:22:06,950 that only you and the other person know, you 2994 02:22:06,950 --> 02:22:11,040 might be able to encrypt a message from one way to the other. 2995 02:22:11,040 --> 02:22:13,400 And so in this case, for instance, HI! 2996 02:22:13,400 --> 02:22:16,198 would become I-J-!. 2997 02:22:16,198 --> 02:22:17,990 In this version of the algorithm, we're not 2998 02:22:17,990 --> 02:22:19,823 going to bother with numbers or punctuation. 2999 02:22:19,823 --> 02:22:23,090 We'll only operate on A through Z, be it uppercase or lowercase. 3000 02:22:23,090 --> 02:22:28,250 So now if you were to receive a slip of paper in class with I-J on it, 3001 02:22:28,250 --> 02:22:31,290 you, the recipient, would know what it is 3002 02:22:31,290 --> 02:22:33,440 so long as you know that the sender used one, 3003 02:22:33,440 --> 02:22:36,500 because you just reverse the algorithm and you subtract one instead. 3004 02:22:36,500 --> 02:22:39,110 The teacher, they probably don't know what this means, 3005 02:22:39,110 --> 02:22:41,443 and they're not going to spend time hacking the message, 3006 02:22:41,443 --> 02:22:42,975 so it just looks scrambled to them. 3007 02:22:42,975 --> 02:22:44,600 And that's what we get from encryption. 
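The rotational cipher described here can be sketched in a few lines. This is a minimal sketch under the lecture's stated simplification that only A through Z, uppercase or lowercase, are shifted; the function names rotate and caesar are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

// Shifts one letter by `key` positions, wrapping Z back around to A.
// Characters that are not letters are returned unchanged.
// `rotate` and `caesar` are hypothetical names for this sketch.
char rotate(char c, int key)
{
    // Reduce the key to the range 0..25 so negative keys also work.
    int k = ((key % 26) + 26) % 26;
    if (isupper((unsigned char) c))
    {
        return (char) ('A' + (c - 'A' + k) % 26);
    }
    if (islower((unsigned char) c))
    {
        return (char) ('a' + (c - 'a' + k) % 26);
    }
    return c;
}

// Encrypts `plaintext` into `ciphertext`, which must have room for
// every character of the input plus the terminating '\0'.
void caesar(const char *plaintext, int key, char *ciphertext)
{
    size_t i = 0;
    for (; plaintext[i] != '\0'; i++)
    {
        ciphertext[i] = rotate(plaintext[i], key);
    }
    ciphertext[i] = '\0';
}
```

With the key 1, the plaintext HI! comes out as IJ!, matching the example above: the letters move one place and the exclamation point is left alone.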
3008 02:22:44,600 --> 02:22:47,430 Someone who intercepts it, be it in class or in the real world, 3009 02:22:47,430 --> 02:22:51,080 on the internet or anywhere else, can't actually figure out, ideally, 3010 02:22:51,080 --> 02:22:52,700 what it is you have sent. 3011 02:22:52,700 --> 02:22:55,130 The opposite, of course, is indeed called decryption, 3012 02:22:55,130 --> 02:22:56,300 but the process is the same. 3013 02:22:56,300 --> 02:22:58,370 We now pass in negative 1. 3014 02:22:58,370 --> 02:23:00,300 And so how about this? 3015 02:23:00,300 --> 02:23:02,840 Why don't we end with a demonstration here? 3016 02:23:02,840 --> 02:23:08,360 UIJT XBT DT50-- there's a bit of a tell there. 3017 02:23:08,360 --> 02:23:11,060 If we pass that in and do negative 1, well, 3018 02:23:11,060 --> 02:23:14,180 how do we get out the plaintext originally? 3019 02:23:14,180 --> 02:23:18,200 Well, if this is the ciphertext, and we subtract 1 from each letter, 3020 02:23:18,200 --> 02:23:28,010 I think U becomes T, I becomes H, J becomes I, T becomes S, X becomes W, 3021 02:23:28,010 --> 02:23:37,580 B becomes A, T becomes S, D becomes C, T becomes S, and this was, indeed, CS50. 3022 02:23:37,580 --> 02:23:40,250 Have a duck on your way out, and some snacks in the lobby. 3023 02:23:40,250 --> 02:23:42,350 [APPLAUSE] 3024 02:23:42,350 --> 02:23:43,850 [FILM ROLLING] 3025 02:23:43,850 --> 02:23:47,500 [MUSIC PLAYING] 3026 02:23:47,500 --> 02:24:19,000
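The closing demonstration, subtracting 1 from each letter, can be sketched as a small decryption routine. As before, this is only a sketch under the assumption that letters are shifted and digits, spaces, and punctuation pass through untouched; the name caesar_decrypt is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

// Reverses a Caesar shift by subtracting `key` from every letter,
// wrapping A back around to Z. Non-letters are copied unchanged.
// `plaintext` must have room for the input plus the terminating '\0'.
// `caesar_decrypt` is a hypothetical name for this sketch.
void caesar_decrypt(const char *ciphertext, int key, char *plaintext)
{
    // Reduce the key to the range 0..25.
    int k = ((key % 26) + 26) % 26;
    size_t i = 0;
    for (; ciphertext[i] != '\0'; i++)
    {
        char c = ciphertext[i];
        if (isupper((unsigned char) c))
        {
            // Adding 26 before the modulo keeps the value non-negative.
            plaintext[i] = (char) ('A' + (c - 'A' - k + 26) % 26);
        }
        else if (islower((unsigned char) c))
        {
            plaintext[i] = (char) ('a' + (c - 'a' - k + 26) % 26);
        }
        else
        {
            plaintext[i] = c;
        }
    }
    plaintext[i] = '\0';
}
```

Given UIJT XBT DT50 and the key 1, this yields THIS WAS CS50. The digits 5 and 0 are not shifted, which is the tell mentioned in the demonstration.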
