All language subtitles for lecture2(1337)-720p-en

0
00:00:00,000 --> 00:00:00,000
[MUSIC PLAYING]

1
00:01:18,000 --> 00:01:20,825
DAVID MALAN: This is CS50 and this is week 2.

2
00:01:20,825 --> 00:01:23,450
Now that you have some programming experience under your belts,

3
00:01:23,450 --> 00:01:25,910
in this more arcane language called C.

4
00:01:25,910 --> 00:01:28,790
Among our goals today is to help you understand exactly what you have

5
00:01:28,790 --> 00:01:30,650
been doing these past several days.

6
00:01:30,650 --> 00:01:33,955
Wrestling with your first programs in C, so that you have more of a bottom

7
00:01:33,955 --> 00:01:36,080
up understanding of what some of these commands do.

8
00:01:36,080 --> 00:01:38,580
And, ultimately, what more we can do with this language.

9
00:01:38,580 --> 00:01:41,750
So this, recall, was the very first program you wrote,

10
00:01:41,750 --> 00:01:44,870
I wrote in this language called C, much more textual,

11
00:01:44,870 --> 00:01:46,970
certainly, than the Scratch equivalent.

12
00:01:46,970 --> 00:01:51,200
But at the end of the day, computers, your Mac, your PC,

13
00:01:51,200 --> 00:01:54,555
VS Code doesn't understand this actual code.

14
00:01:54,555 --> 00:01:57,680
What's the format into which we need to get any program that we write, just

15
00:01:57,680 --> 00:01:58,180
to recap?

16
00:01:58,180 --> 00:01:59,202
AUDIENCE: [INAUDIBLE]

17
00:01:59,202 --> 00:02:01,790
DAVID MALAN: So binary, otherwise known as machine code.

18
00:02:01,790 --> 00:02:02,290
Right?

19
00:02:02,290 --> 00:02:05,870
The 0s and 1s that your computer actually does understand.

20
00:02:05,870 --> 00:02:08,030
So somehow we need to get to this format.

21
00:02:08,030 --> 00:02:10,730
And up until now, we've been using this command called make,

22
00:02:10,730 --> 00:02:13,670
which is aptly named, because it lets you make programs.

23
00:02:13,670 --> 00:02:16,430
And the invocation of that has been pretty simple.

24
00:02:16,430 --> 00:02:20,450
Make hello looks in your current directory or folder for a file called

25
00:02:20,450 --> 00:02:25,100
hello.c, implicitly, and then it compiles that into a file called hello,

26
00:02:25,100 --> 00:02:27,650
which itself is executable, which just means runnable,

27
00:02:27,650 --> 00:02:29,900
so that you can then do ./hello.

28
00:02:29,900 --> 00:02:34,190
But it turns out that make is actually not a compiler itself.

29
00:02:34,190 --> 00:02:35,840
It does help you make programs.

30
00:02:35,840 --> 00:02:40,520
But make is this utility that comes on a lot of systems that makes it easier

31
00:02:40,520 --> 00:02:44,060
to actually compile code by using an actual compiler,

32
00:02:44,060 --> 00:02:48,290
the program that converts source code to machine code, on your own Mac, or PC,

33
00:02:48,290 --> 00:02:50,660
or whatever cloud environment you might be using.

34
00:02:50,660 --> 00:02:53,330
In fact, what make is doing for us is actually

35
00:02:53,330 --> 00:02:57,230
running a command automatically known as clang, for C language.

36
00:02:57,230 --> 00:03:01,590
And so here, for instance, in VS Code, is that very first program again,

37
00:03:01,590 --> 00:03:03,470
this time in the context of a text editor,

38
00:03:03,470 --> 00:03:06,680
and I could compile this with make hello.

39
00:03:06,680 --> 00:03:09,567
Let me go ahead and use the compiler itself manually.

40
00:03:09,567 --> 00:03:12,650
And we'll see in a moment why we've been automating the process with make.

41
00:03:12,650 --> 00:03:15,060
I'm going to run clang instead.

42
00:03:15,060 --> 00:03:17,340
And then I'm going to run hello.c.

43
00:03:17,340 --> 00:03:19,490
So it's a little different how the compiler's used.

44
00:03:19,490 --> 00:03:22,160
It needs to know, explicitly, what the file is called.

45
00:03:22,160 --> 00:03:25,280
I'll go ahead and run clang hello.c, Enter.

46
00:03:25,280 --> 00:03:28,415
Nothing seems to happen, which, generally speaking, is a good thing.

47
00:03:28,415 --> 00:03:29,790
Because no errors have popped up.

48
00:03:29,790 --> 00:03:36,140
And if I do ls for list, you'll see there is not a file called hello.

49
00:03:36,140 --> 00:03:39,230
But there is a curiously named file called a.out.

50
00:03:39,230 --> 00:03:42,620
This is a historical convention, stands for assembler output.

51
00:03:42,620 --> 00:03:45,380
And this is just the default file name for a program

52
00:03:45,380 --> 00:03:49,400
that you might compile yourself, manually, using clang itself.

53
00:03:49,400 --> 00:03:51,830
Let me go ahead now and point out that that's

54
00:03:51,830 --> 00:03:53,340
kind of a stupid name for a program.

55
00:03:53,340 --> 00:03:56,435
Even though it works, ./a.out would work.

56
00:03:56,435 --> 00:03:59,060
But if you actually want to customize the name of your program,

57
00:03:59,060 --> 00:04:02,720
we could just resort to make, or we could do explicitly

58
00:04:02,720 --> 00:04:03,920
what make is doing for us.

59
00:04:03,920 --> 00:04:06,770
It turns out, some programs, among them make,

60
00:04:06,770 --> 00:04:08,990
support what are called command line arguments,

61
00:04:08,990 --> 00:04:10,310
and more on those later today.

62
00:04:10,310 --> 00:04:13,670
But these are literally words or numbers that you type at your prompt

63
00:04:13,670 --> 00:04:17,330
after the name of a program that just influence its behavior in some way.

64
00:04:17,330 --> 00:04:20,040
It modifies its behavior.

65
00:04:20,040 --> 00:04:22,940
And it turns out, if you read the documentation for clang,

66
00:04:22,940 --> 00:04:28,040
you can actually pass a -o, for output, command line argument, that

67
00:04:28,040 --> 00:04:30,260
lets you specify, explicitly, what you want

68
00:04:30,260 --> 00:04:31,795
your outputted program to be called.

69
00:04:31,795 --> 00:04:34,670
And then you go ahead and type the name of the file that you actually

70
00:04:34,670 --> 00:04:37,110
want to compile, from source code to machine code.

71
00:04:37,110 --> 00:04:38,720
Let me hit Enter now.

72
00:04:38,720 --> 00:04:41,990
Again, nothing seems to happen, and I type ls and voila.

73
00:04:41,990 --> 00:04:45,010
Now we still have the old a.out, because I didn't delete it yet.

74
00:04:45,010 --> 00:04:46,010
And I do have hello now.

75
00:04:46,010 --> 00:04:50,420
So ./hello, voila, runs hello, world again.

76
00:04:50,420 --> 00:04:52,160
And let me go ahead and remove this file.

77
00:04:52,160 --> 00:04:56,593
I could, of course, resort to using the Explorer, on the left-hand side.

78
00:04:56,593 --> 00:04:59,510
Which I am in the habit of closing, just to give us more room to see.

79
00:04:59,510 --> 00:05:02,240
But I could go ahead and right-click or control-click on a.out

80
00:05:02,240 --> 00:05:03,365
if I want to get rid of it.

81
00:05:03,365 --> 00:05:06,300
Or again, let me focus on the command line interface.

82
00:05:06,300 --> 00:05:07,250
And I can use--

83
00:05:07,250 --> 00:05:08,030
anyone recall?

84
00:05:08,030 --> 00:05:11,000
We didn't really use it much, but what command removes a file?

85
00:05:11,000 --> 00:05:12,665
AUDIENCE: rm.

86
00:05:12,665 --> 00:05:16,430
DAVID MALAN: So rm for remove. rm a.out, Enter.

87
00:05:16,430 --> 00:05:20,060
Remove regular file a.out? y for yes, Enter.

88
00:05:20,060 --> 00:05:22,640
And now, if I do ls again, voila, it's gone.

89
00:05:22,640 --> 00:05:24,650
All right, so, let's now enhance this program

90
00:05:24,650 --> 00:05:30,290
to do the second version we ever did, which was to also include cs50.h,

91
00:05:30,290 --> 00:05:33,149
so that we have access to functions like get string and the like.

92
00:05:33,149 --> 00:05:40,340
Let me do string, name, gets, get string, what's your name,

93
00:05:40,340 --> 00:05:41,550
question mark.

94
00:05:41,550 --> 00:05:46,010
And now, let me go ahead and say hello to that name with our %s placeholder,

95
00:05:46,010 --> 00:05:46,920
comma, name.

96
00:05:46,920 --> 00:05:49,160
So this was version 2 of our program last time,

97
00:05:49,160 --> 00:05:53,300
that very easily compiled with make hello, but notice the difference now.

98
00:05:53,300 --> 00:05:56,360
If I want to compile this thing myself with clang, using

99
00:05:56,360 --> 00:05:58,520
that same lesson learned, all right, let's do it.

100
00:05:58,520 --> 00:06:05,300
clang -o hello, just so I get a better name for the program, hello.c, Enter.

101
00:06:05,300 --> 00:06:09,750
And a new error pops up that some of you might have encountered on your own.

102
00:06:09,750 --> 00:06:13,580
So it's a bit arcane here, and there's this mention of a cryptic-looking path

103
00:06:13,580 --> 00:06:15,330
with temp for temporary there.

104
00:06:15,330 --> 00:06:18,560
But somehow, my issue's in main, as we can see here.

105
00:06:18,560 --> 00:06:20,257
It somehow relates to hello.c.

106
00:06:20,257 --> 00:06:23,090
Even though we might not have seen this language last time in class,

107
00:06:23,090 --> 00:06:25,970
there's an undefined reference to get string.

108
00:06:25,970 --> 00:06:27,800
As though get string doesn't exist.

109
00:06:27,800 --> 00:06:31,340
Now, your first instinct might be, well, maybe I forgot cs50.h, but of course,

110
00:06:31,340 --> 00:06:32,180
I didn't.

111
00:06:32,180 --> 00:06:34,310
That's the very first line of my program.

112
00:06:34,310 --> 00:06:37,910
But it turns out, make is doing something else for us, all this time.

113
00:06:37,910 --> 00:06:41,930
Just putting cs50.h, or any header file, at the top of your code,

114
00:06:41,930 --> 00:06:46,730
for that matter, just teaches the compiler that a function will exist.

115
00:06:46,730 --> 00:06:49,310
It, sort of, asks the compiler to-- it asks the compiler

116
00:06:49,310 --> 00:06:52,610
to trust that I will, eventually, get around to implementing functions,

117
00:06:52,610 --> 00:06:58,130
like get string in cs50.h and printf in stdio.h.

118
00:06:58,130 --> 00:07:03,830
But this error here, some kind of linker command, relates to the fact

119
00:07:03,830 --> 00:07:05,960
that there's a separate process for actually

120
00:07:05,960 --> 00:07:10,280
finding the 0s and 1s that CS50 compiled long ago for you.

121
00:07:10,280 --> 00:07:13,850
That authors of this operating system compiled for you, long ago,

122
00:07:13,850 --> 00:07:14,900
in the form of printf.

123
00:07:14,900 --> 00:07:17,840
We need to, somehow, tell the compiler that we

124
00:07:17,840 --> 00:07:20,450
need to link in code that someone else wrote,

125
00:07:20,450 --> 00:07:23,750
the actual machine code that someone else wrote and then compiled.

126
00:07:23,750 --> 00:07:27,497
So to do that, you'd have to type -lcs50, for instance,

127
00:07:27,497 --> 00:07:28,580
at the end of the command.

128
00:07:28,580 --> 00:07:31,548
So additionally, telling clang that, not only do you want to output

129
00:07:31,548 --> 00:07:34,340
a file called hello, and you want to compile a file called hello.c,

130
00:07:34,340 --> 00:07:39,200
you also want to quote-unquote link in a bunch of 0s and 1s

131
00:07:39,200 --> 00:07:43,010
that collectively implement get string and printf.

132
00:07:43,010 --> 00:07:47,220
So now, if I hit Enter, this time it compiled OK.

133
00:07:47,220 --> 00:07:53,142
And now if I run ./hello, it works as it did last week, just like that.

134
00:07:53,142 --> 00:07:56,100
But honestly, this is just going to get really tedious, really quickly.

135
00:07:56,100 --> 00:07:57,930
Notice, already, just to compile my code,

136
00:07:57,930 --> 00:08:01,417
I have to run clang -o hello hello.c -lcs50,

137
00:08:01,417 --> 00:08:03,500
and you're going to have to type more things, too.

138
00:08:03,500 --> 00:08:06,890
If you wanted to use the math library, like, to use that round function,

139
00:08:06,890 --> 00:08:09,440
you would also have to do -lm, typically,

140
00:08:09,440 --> 00:08:12,890
to specify give me the math bits that someone else compiled.

141
00:08:12,890 --> 00:08:14,970
And the commands just get longer and longer.

142
00:08:14,970 --> 00:08:19,520
So moving forward, we won't have to resort to running clang itself,

143
00:08:19,520 --> 00:08:21,330
but clang is, indeed, the compiler.

144
00:08:21,330 --> 00:08:24,380
That is the program that converts from source code to machine code.

145
00:08:24,380 --> 00:08:28,438
But we'll continue to use make because it just automates that process.

146
00:08:28,438 --> 00:08:30,230
And the commands are only going to get more

147
00:08:30,230 --> 00:08:34,640
cryptic the more sophisticated and more feature-full your programs get.

148
00:08:34,640 --> 00:08:39,620
And make, again, is just a tool that makes all that happen.

149
00:08:39,620 --> 00:08:44,300
Let me pause there to see if there are any questions before we

150
00:08:44,300 --> 00:08:45,890
take a look further under the hood.

151
00:08:45,890 --> 00:08:47,185
Yeah, in front.

152
00:08:47,185 --> 00:08:50,185
AUDIENCE: Can you explain again what the -lcs50-- just why you put that?

153
00:08:50,185 --> 00:08:52,518
DAVID MALAN: Sure, let me come back to that in a moment.

154
00:08:52,518 --> 00:08:53,750
What does the -lcs50 mean?

155
00:08:53,750 --> 00:08:55,917
We'll come back to that, visually, in just a moment.

156
00:08:55,917 --> 00:08:58,850
But it means to link in the 0s and 1s that collectively

157
00:08:58,850 --> 00:09:00,435
implement get string and printf.

158
00:09:00,435 --> 00:09:02,060
But we'll see that, visually, in a sec.

159
00:09:02,060 --> 00:09:03,341
Yeah, behind you.

160
00:09:03,341 --> 00:09:07,073
AUDIENCE: [INAUDIBLE]

161
00:09:07,073 --> 00:09:08,490
DAVID MALAN: Really good question.

162
00:09:08,490 --> 00:09:10,850
How come I didn't have to link in standard I/O?

163
00:09:10,850 --> 00:09:12,950
Because I used printf in version 1.

164
00:09:12,950 --> 00:09:16,280
Standard I/O is just, literally, so standard that it's built in,

165
00:09:16,280 --> 00:09:17,480
it just works for free.

166
00:09:17,480 --> 00:09:18,800
CS50, of course, is not.

167
00:09:18,800 --> 00:09:21,080
It did not come with the language C or the compiler.

168
00:09:21,080 --> 00:09:22,250
We ourselves wrote it.

169
00:09:22,250 --> 00:09:26,600
And other libraries, even though they might come with the language C,

170
00:09:26,600 --> 00:09:30,600
they might not be enabled by default, generally for efficiency purposes.

171
00:09:30,600 --> 00:09:33,470
So you're not loading more 0s and 1s into the computer's memory

172
00:09:33,470 --> 00:09:34,280
than you need to.

173
00:09:34,280 --> 00:09:37,250
So standard I/O is special, if you will.

174
00:09:37,250 --> 00:09:38,510
Other questions?

175
00:09:38,510 --> 00:09:39,500
Yeah?

176
00:09:39,500 --> 00:09:41,420
AUDIENCE: [INAUDIBLE]

177
00:09:41,420 --> 00:09:43,160
DAVID MALAN: Oh, what does the -o mean?

178
00:09:43,160 --> 00:09:46,190
So -o is shorthand for the English word output,

179
00:09:46,190 --> 00:09:51,260
and so -o is telling clang to please output a file called hello,

180
00:09:51,260 --> 00:09:53,850
because the next thing I wrote at the command line,

181
00:09:53,850 --> 00:09:59,929
recall, was clang -o hello, then the name of the file, then -lcs50.

182
00:09:59,929 --> 00:10:03,407
And this is where these commands do get and stay fairly arcane.

183
00:10:03,407 --> 00:10:05,240
It's just through muscle memory and practice

184
00:10:05,240 --> 00:10:07,610
that you'll start to remember, oh, what are the other commands that you--

185
00:10:07,610 --> 00:10:10,277
what are the command line arguments you can provide to programs?

186
00:10:10,277 --> 00:10:11,570
But we've seen this before.

187
00:10:11,570 --> 00:10:14,780
Technically, when you run make hello, the program is called make,

188
00:10:14,780 --> 00:10:16,980
hello is the command line argument.

189
00:10:16,980 --> 00:10:19,040
It's an input to the make program, albeit

190
00:10:19,040 --> 00:10:22,250
typed at the prompt, that tells make what you want to make.

191
00:10:22,250 --> 00:10:26,180
Even when I used rm a moment ago, and did rm of a.out,

192
00:10:26,180 --> 00:10:28,280
the command line argument there was called a.out

193
00:10:28,280 --> 00:10:30,740
and it's telling rm what to delete.

194
00:10:30,740 --> 00:10:35,270
It is entirely dependent on the programs to decide what their conventions are,

195
00:10:35,270 --> 00:10:38,090
whether you use dash this or dash that, but we'll

196
00:10:38,090 --> 00:10:40,805
see over time which ones actually matter in practice.

197
00:10:40,805 --> 00:10:46,220
So to come back to the first question about what actually is happening there,

198
00:10:46,220 --> 00:10:48,562
let's consider the code more closely.

199
00:10:48,562 --> 00:10:50,270
So here is that first version of the code

200
00:10:50,270 --> 00:10:54,590
again, with stdio.h and only printf, so no CS50 stuff yet.

201
00:10:54,590 --> 00:10:56,840
Until we added it back in and had the second version,

202
00:10:56,840 --> 00:10:59,630
where we actually get the human's name.

203
00:10:59,630 --> 00:11:02,783
When you run this command, there are a few things

204
00:11:02,783 --> 00:11:04,700
that are happening underneath the hood, and we

205
00:11:04,700 --> 00:11:06,650
won't dwell on these kinds of details; indeed,

206
00:11:06,650 --> 00:11:08,870
we'll abstract it away by using make.

207
00:11:08,870 --> 00:11:10,940
But it's worth understanding from the get-go

208
00:11:10,940 --> 00:11:13,880
how much automation is going on, so that when you run these commands,

209
00:11:13,880 --> 00:11:14,850
it's not magic.

210
00:11:14,850 --> 00:11:17,940
You have this bottom-up understanding of what's going on.

211
00:11:17,940 --> 00:11:21,530
So when we say you've been compiling your code with make,

212
00:11:21,530 --> 00:11:23,600
that's a bit of an oversimplification.

213
00:11:23,600 --> 00:11:26,780
Technically, every time you compile your code,

214
00:11:26,780 --> 00:11:29,570
you're having the computer do four distinct things for you.

215
00:11:29,570 --> 00:11:33,020
And these are not four distinct things that you need to memorize and remember

216
00:11:33,020 --> 00:11:35,180
every time you run your program, what's happening,

217
00:11:35,180 --> 00:11:37,820
but it helps to break it down into building blocks,

218
00:11:37,820 --> 00:11:42,110
as to how we're getting from source code, like C, into 0s and 1s.

219
00:11:42,110 --> 00:11:46,640
It turns out that when you compile, quote-unquote, "your code," technically

220
00:11:46,640 --> 00:11:50,510
speaking, you're doing four things automatically, and all at once.

221
00:11:50,510 --> 00:11:53,960
Preprocessing it, compiling it, assembling it, and linking it.

222
00:11:53,960 --> 00:11:57,350
Just humans decided, let's just call the whole process compiling.

223
00:11:57,350 --> 00:12:00,230
But for a moment, let's consider what these steps are.

224
00:12:00,230 --> 00:12:02,690
So preprocessing refers to this.

225
00:12:02,690 --> 00:12:06,710
If we look at our source code, version 2 that uses the CS50 library

226
00:12:06,710 --> 00:12:10,442
and therefore get string, notice that we have these include lines at top.

227
00:12:10,442 --> 00:12:12,650
And they're kind of special versus all the other code

228
00:12:12,650 --> 00:12:15,710
we've written, because they start with hash symbols, specifically.

229
00:12:15,710 --> 00:12:17,660
And that's sort of a special syntax that means

230
00:12:17,660 --> 00:12:20,600
that these are, technically, called preprocessor directives.

231
00:12:20,600 --> 00:12:25,290
Fancy way of saying they're handled specially versus the rest of your code.

232
00:12:25,290 --> 00:12:29,870
In fact, if we focus on cs50.h, recall from last week

233
00:12:29,870 --> 00:12:35,870
that I provided a hint as to what's actually in cs50.h, among other things.

234
00:12:35,870 --> 00:12:40,580
What was the one salient thing that I said was in cs50.h and, therefore,

235
00:12:40,580 --> 00:12:43,475
why we were including it in the first place?

236
00:12:43,475 --> 00:12:44,350
AUDIENCE: Get string?

237
00:12:44,350 --> 00:12:46,850
DAVID MALAN: So get string, specifically,

238
00:12:46,850 --> 00:12:49,160
the prototype for get string.

239
00:12:49,160 --> 00:12:51,410
We haven't made many of our own functions yet,

240
00:12:51,410 --> 00:12:53,840
but recall that any time we've made our own functions,

241
00:12:53,840 --> 00:12:56,330
and we've written them below main in a file,

242
00:12:56,330 --> 00:12:58,790
we've also had to, somewhat stupidly, copy-paste

243
00:12:58,790 --> 00:13:01,370
the prototype of the function at the top of the file,

244
00:13:01,370 --> 00:13:05,210
just to teach the compiler that this function doesn't exist yet,

245
00:13:05,210 --> 00:13:07,430
it does down there, but it will exist.

246
00:13:07,430 --> 00:13:08,300
Just trust me.

247
00:13:08,300 --> 00:13:10,980
So again, that's what these prototypes are doing for us.

248
00:13:10,980 --> 00:13:13,340
So therefore, in my code, if I want to use

249
00:13:13,340 --> 00:13:16,760
a function like get string, or printf, for that matter,

250
00:13:16,760 --> 00:13:19,150
they're clearly not implemented in the same file,

251
00:13:19,150 --> 00:13:20,400
they're implemented elsewhere.

252
00:13:20,400 --> 00:13:22,692
So I need to tell the compiler to trust me that they're

253
00:13:22,692 --> 00:13:24,000
implemented somewhere else.

254
00:13:24,000 --> 00:13:26,810
And so technically, inside of cs50.h, which

255
00:13:26,810 --> 00:13:30,410
is installed somewhere on the cloud's hard drive, so to speak,

256
00:13:30,410 --> 00:13:34,820
that you all are accessing via VS Code, there's a line that looks like this.

257
00:13:34,820 --> 00:13:38,870
A prototype for the get string function that says the name of the function is

258
00:13:38,870 --> 00:13:42,830
get string, it takes one input, or argument, called prompt,

259
00:13:42,830 --> 00:13:45,710
and the type of that prompt is a string.

260
00:13:45,710 --> 00:13:51,150
Get string, not surprisingly, has a return value and it returns a string.

261
00:13:51,150 --> 00:13:54,800
So literally, that line and a bunch of others are in cs50.h.

262
00:13:54,800 --> 00:13:58,280
So rather than you all having to copy-paste the prototype,

263
00:13:58,280 --> 00:14:01,160
you can just trust that CS50 figured out what it is.

264
00:14:01,160 --> 00:14:04,970
You can include cs50.h and the compiler is going

265
00:14:04,970 --> 00:14:07,420
to go find that prototype for you.

266
00:14:07,420 --> 00:14:09,480
Same thing in standard I/O. Someone else-- what

267
00:14:09,480 --> 00:14:13,620
must clearly be in stdio.h, among other stuff, that

268
00:14:13,620 --> 00:14:17,590
motivates our including stdio.h, too?

269
00:14:17,590 --> 00:14:18,090
Yeah?

270
00:14:18,090 --> 00:14:18,798
AUDIENCE: Printf.

271
00:14:18,798 --> 00:14:21,030
DAVID MALAN: Printf, the prototype for printf,

272
00:14:21,030 --> 00:14:24,010
and I'll just change it here in yellow, to be the same.

273
00:14:24,010 --> 00:14:25,410
And it turns out, the format--

274
00:14:25,410 --> 00:14:28,590
the prototype for printf is, actually, pretty fancy,

275
00:14:28,590 --> 00:14:31,740
because, as you might have noticed, printf can take one argument, just

276
00:14:31,740 --> 00:14:35,910
something to print, 2, if you want to plug a value into it, 3 or more.

277
00:14:35,910 --> 00:14:38,620
So the dot dot dot just represents exactly that.

278
00:14:38,620 --> 00:14:42,330
It's not quite as simple a prototype as get string, but more on that

279
00:14:42,330 --> 00:14:43,115
another time.

280
00:14:43,115 --> 00:14:46,050
So what does it mean to preprocess your code?

281
00:14:46,050 --> 00:14:49,860
The very first thing the compiler, clang, in this case,

282
00:14:49,860 --> 00:14:54,270
is doing for you when it reads your code top-to-bottom, left-to-right, is it

283
00:14:54,270 --> 00:14:57,960
notices, oh, here is hash include, oh, here's another hash include.

284
00:14:57,960 --> 00:15:03,090
And it, essentially, finds those files on the hard drive, cs50.h, stdio.h,

285
00:15:03,090 --> 00:15:06,990
and does the equivalent of copying and pasting them automatically

286
00:15:06,990 --> 00:15:09,360
into your code at the very top.

287
00:15:09,360 --> 00:15:12,450
Thereby teaching the compiler that get string and printf

288
00:15:12,450 --> 00:15:14,430
will eventually exist somewhere.

289
00:15:14,430 --> 00:15:18,480
So that's the preprocessing step, whereby, again, it's

290
00:15:18,480 --> 00:15:22,080
just doing a find-and-replace of anything that starts with hash include.

291
00:15:22,080 --> 00:15:24,510
It's plugging in the files there so that you, essentially,

292
00:15:24,510 --> 00:15:27,780
get all the prototypes you need automatically.

293
00:15:27,780 --> 00:15:28,830
OK.

294
00:15:28,830 --> 00:15:31,230
What does it mean, then, to compile the results?

295
00:15:31,230 --> 00:15:33,450
Because at this point in the story, your code

296
00:15:33,450 --> 00:15:35,678
now looks like this in the computer's memory.

297
00:15:35,678 --> 00:15:37,470
It doesn't change your file, it's doing all

298
00:15:37,470 --> 00:15:39,990
of this in the computer's memory, or RAM, for you.

299
00:15:39,990 --> 00:15:42,070
But it, essentially, looks like this.

300
00:15:42,070 --> 00:15:45,600
Well, the next step is what's, technically, really compiling.

301
00:15:45,600 --> 00:15:48,420
Even though, again, we use compile as an umbrella term.

302
00:15:48,420 --> 00:15:51,510
Compiling code in C means to take code that

303
00:15:51,510 --> 00:15:53,740
now looks like this in the computer's memory

304
00:15:53,740 --> 00:15:56,890
and turn it into something that looks like this.

305
00:15:56,890 --> 00:15:58,350
Which is way more cryptic.

306
00:15:58,350 --> 00:16:00,990
But it was just a few decades ago that, if you

307
00:16:00,990 --> 00:16:03,930
were taking a class like CS50 in its earlier form,

308
00:16:03,930 --> 00:16:07,740
we wouldn't be using C; it didn't exist yet. We would actually be using this,

309
00:16:07,740 --> 00:16:09,690
something called assembly language.

310
00:16:09,690 --> 00:16:13,230
And there are different types of, or flavors of, assembly language.

311
00:16:13,230 --> 00:16:17,010
But this is about as low level as you can get to what a computer really

312
00:16:17,010 --> 00:16:19,410
understands, be it a Mac, or PC, or a phone,

313
00:16:19,410 --> 00:16:22,650
before you start getting into actual 0s and 1s.

314
00:16:22,650 --> 00:16:24,013
And most of this is cryptic.

315
00:16:24,013 --> 00:16:27,180
I couldn't tell you what this is doing unless I thought it through carefully

316
00:16:27,180 --> 00:16:30,300
and rewound mentally, years ago, from having studied it,

317
00:16:30,300 --> 00:16:32,880
but let's highlight a few key words in yellow.

318
00:16:32,880 --> 00:16:37,380
Notice that this assembly language that the computer is outputting

319
00:16:37,380 --> 00:16:40,530
for you automatically still has mention of main

320
00:16:40,530 --> 00:16:43,290
and it has mention of get string, and it has mention of printf.

321
00:16:43,290 --> 00:16:46,358
So there's some relationship to the C code we saw a moment ago.

322
00:16:46,358 --> 00:16:48,150
And then if I highlight these other things,

323
00:16:48,150 --> 00:16:50,430
these are what are called computer instructions.

324
00:16:50,430 --> 00:16:52,740
At the end of the day, your Mac, your PC,

325
00:16:52,740 --> 00:16:56,340
your phone actually only understands very basic instructions,

326
00:16:56,340 --> 00:17:01,020
like addition, subtraction, division, multiplication, move into memory,

327
00:17:01,020 --> 00:17:06,190
load from memory, print something to the screen, very basic operations.

328
00:17:06,190 --> 00:17:07,755
And that's what you're seeing here.

329
00:17:07,755 --> 00:17:12,750
These assembly instructions are what the computer actually

330
00:17:12,750 --> 00:17:16,870
feeds into the brains of the computer, the CPU, the central processing unit.

331
00:17:16,870 --> 00:17:19,770
And it's that Intel CPU, or whatever you have,

332
00:17:19,770 --> 00:17:23,220
that understands this instruction, and this one, and this one, and this one.

333
00:17:23,220 --> 00:17:25,860
And collectively, long story short, all they do

334
00:17:25,860 --> 00:17:28,620
is print hello, world on the screen, but in a way

335
00:17:28,620 --> 00:17:31,910
that the machine understands how to do.

336
00:17:31,910 --> 00:17:34,500
So let me pause here.

337
00:17:34,500 --> 00:17:37,010
Are there any questions on what we mean by preprocessing?

338
00:17:37,010 --> 00:17:40,850
Which finds and replaces the hash include symbols, among others,

339
00:17:40,850 --> 00:17:44,450
and compiling, which technically takes your source code,

340
00:17:44,450 --> 00:17:48,170
once preprocessed, and converts it to that stuff called assembly language.

341
00:17:48,170 --> 00:17:50,342
AUDIENCE: [INAUDIBLE] each CPU has--

342
00:17:50,342 --> 00:17:51,290
DAVID MALAN: Correct.

343
00:17:51,290 --> 00:17:54,710
Each type of CPU has its own instruction set.

344
00:17:54,710 --> 00:17:55,280
Indeed.

345
00:17:55,280 --> 00:17:58,970
And as a teaser, this is why, at least back in the day, when

346
00:17:58,970 --> 00:18:02,900
we used to install software from CD-ROMs, or some other type of media,

347
00:18:02,900 --> 00:18:08,222
this is why you can't take a program that was sold for a Windows computer

348
00:18:08,222 --> 00:18:09,680
and run it on a Mac, or vice versa.

349
00:18:09,680 --> 00:18:14,420
Because the commands, the instructions that those two products understand,

350
00:18:14,420 --> 00:18:15,500
are actually different.

351
00:18:15,500 --> 00:18:20,150
Now Microsoft, or any company, could generally write code in one language,

352
00:18:20,150 --> 00:18:24,109
like C or another, and they can compile it twice, saving a PC version

353
00:18:24,109 --> 00:18:25,790
and saving a Mac version.

354
00:18:25,790 --> 00:18:30,109
It's twice as much work and sometimes you get into some incompatibilities,

355
00:18:30,109 --> 00:18:33,140
but that's why these steps are somewhat distinct.

356
00:18:33,140 --> 00:18:36,710
You can now use the same code and support even different platforms,

357
00:18:36,710 --> 00:18:37,940
or systems, if you want.

358
00:18:37,940 --> 00:18:38,440
All right.

359
00:18:38,440 --> 00:18:39,650
Assembly, assembling.

360
00:18:39,650 --> 00:18:42,800
Thankfully, this part is fairly straightforward, at least in concept.

361
00:18:42,800 --> 00:18:46,250
To assemble code, which is step three of four, that is just

362
00:18:46,250 --> 00:18:50,360
happening for you every time you run make or, in turn, clang,

363
00:18:50,360 --> 00:18:53,570
this assembly language, which the computer generated automatically

364
00:18:53,570 --> 00:18:57,080
for you from your source code, is turned into 0s and 1s.

365
00:18:57,080 --> 00:19:00,783
So that's the step that, last week, I simplified and said,

366
00:19:00,783 --> 00:19:03,950
when you compile your code, you convert it to source code-- from source code

367
00:19:03,950 --> 00:19:04,970
to machine code.

368
00:19:04,970 --> 00:19:07,685
Technically, that happens when you assemble your code.

369
00:19:07,685 --> 00:19:10,940
But no one in normal conversations says that; they just

370
00:19:10,940 --> 00:19:13,280
say compile for all of these terms.

371
00:19:13,280 --> 00:19:14,310
All right.

372
00:19:14,310 --> 00:19:17,450
So that's assembling.

373
00:19:17,450 --> 00:19:19,070
There's one final step.
374 00:19:19,070 --> 00:19:22,400 Even in this simple program of getting the user's name 375 00:19:22,400 --> 00:19:27,120 and then plugging it into printf, I'm using three different people's code, 376 00:19:27,120 --> 00:19:27,620 if you will. 377 00:19:27,620 --> 00:19:30,200 My own, which is in hello.c. 378 00:19:30,200 --> 00:19:35,600 Some of CS50's, which is in hello.c, sorry-- which 379 00:19:35,600 --> 00:19:39,080 is in cs50.c, which is not a file I've mentioned yet, 380 00:19:39,080 --> 00:19:43,220 but it stands to reason that if there's a cs50.h that has prototypes, 381 00:19:43,220 --> 00:19:45,380 it turns out the actual implementations of get string 382 00:19:45,380 --> 00:19:47,600 and other things are in cs50.c. 383 00:19:47,600 --> 00:19:51,290 And there's a third file somewhere on the hard drive 384 00:19:51,290 --> 00:19:54,260 that's involved in compiling even this simple program. 385 00:19:54,260 --> 00:19:59,971 hello.c, cs50.c, and by that logic, what might the other be? 386 00:19:59,971 --> 00:20:00,471 Yeah? 387 00:20:00,471 --> 00:20:02,275 AUDIENCE: stdio? 388 00:20:02,275 --> 00:20:03,600 DAVID MALAN: stdio.c. 389 00:20:03,600 --> 00:20:06,690 And that's a bit of a white lie, because that's such a big, fancy library 390 00:20:06,690 --> 00:20:09,750 that there are actually multiple files that compose it, but it's the same idea, 391 00:20:09,750 --> 00:20:11,380 and we'll take the simplification. 392 00:20:11,380 --> 00:20:16,200 So when I have this code, and I compile my code, 393 00:20:16,200 --> 00:20:21,300 I get those 0s and 1s that end up taking hello.c and turning it, effectively, 394 00:20:21,300 --> 00:20:26,830 into 0s and 1s that are combined with cs50.c, followed by stdio.c as well. 395 00:20:26,830 --> 00:20:27,840 So let me rewind here. 396 00:20:27,840 --> 00:20:33,300 Here might be the 0s and 1s for my code, the two lines of code that I wrote.
397 00:20:33,300 --> 00:20:37,920 Here might be the 0s and 1s for what CS50 wrote some years ago in cs50.c. 398 00:20:37,920 --> 00:20:42,210 Here might be the 0s and 1s that someone wrote for standard I/O decades ago. 399 00:20:42,210 --> 00:20:45,720 The last and final step is that linking command 400 00:20:45,720 --> 00:20:48,330 that links all of these 0s and 1s together and 401 00:20:48,330 --> 00:20:53,820 essentially stitches them together into one single file called hello, 402 00:20:53,820 --> 00:20:56,385 or called a.out, whatever you name it. 403 00:20:56,385 --> 00:21:01,650 That last step is what combines all of these different programmers' 0s and 1s. 404 00:21:01,650 --> 00:21:04,050 And my God, now we're really in the weeds. 405 00:21:04,050 --> 00:21:07,020 Who wants to even think about running code at this level? 406 00:21:07,020 --> 00:21:08,160 You shouldn't need to. 407 00:21:08,160 --> 00:21:09,180 But it's not magic. 408 00:21:09,180 --> 00:21:11,748 When you're running make, there are some very concrete steps 409 00:21:11,748 --> 00:21:14,290 that are happening that humans have developed over the years, 410 00:21:14,290 --> 00:21:17,700 over the decades, that break down this big problem of source code going 411 00:21:17,700 --> 00:21:22,410 to 0s and 1s, or machine code, into these very specific steps. 412 00:21:22,410 --> 00:21:26,100 But henceforth, you can call all of this compiling. 413 00:21:26,100 --> 00:21:27,120 Questions? 414 00:21:27,120 --> 00:21:27,780 Or confusion? 415 00:21:27,780 --> 00:21:28,596 Yeah? 416 00:21:28,596 --> 00:21:30,804 AUDIENCE: Can you explain again what a.out signifies? 417 00:21:30,804 --> 00:21:31,770 DAVID MALAN: Sure. 418 00:21:31,770 --> 00:21:33,270 What does a.out signify? 419 00:21:33,270 --> 00:21:37,890 a.out is just the conventional, default file name for any program 420 00:21:37,890 --> 00:21:41,280 that you compile directly with a compiler, like clang.
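The split between cs50.h (prototypes) and cs50.c (implementations), and the linker's job of stitching separately compiled pieces together, can be sketched with a hypothetical square function. Everything is collapsed into one file here so it compiles on its own; the comments describe, as an assumption about a typical setup, how the same pieces would be compiled separately and then linked.

```c
// If this were split across hypothetical files:
//   square.h   the prototype                (like cs50.h)
//   square.c   the definition               (like cs50.c)   clang -c square.c   -> square.o
//   program.c  main plus the caller below                    clang -c program.c  -> program.o
// the final, linking step would stitch the object files into one program:
//   clang -o program program.o square.o

// Prototype: enough for the compiler to check and compile calls to square.
int square(int n);

// A caller compiled against the prototype alone. In the split version,
// the linker is what connects this call to square's actual 0s and 1s.
int sum_of_two_squares(int a, int b)
{
    return square(a) + square(b);
}

// Definition: the implementation the linker would pull in from square.o.
int square(int n)
{
    return n * n;
}
```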
421 00:21:41,280 --> 00:21:43,680 It's a meaningless name, though. 422 00:21:43,680 --> 00:21:47,250 It stands for assembler output, and assembler might now sound familiar 423 00:21:47,250 --> 00:21:48,690 from this assembling process. 424 00:21:48,690 --> 00:21:51,150 It's a lame name for a computer program, and we 425 00:21:51,150 --> 00:21:56,450 can override it by naming the output something like hello instead. 426 00:21:56,450 --> 00:21:57,317 Yeah? 427 00:21:57,317 --> 00:22:03,426 AUDIENCE: [INAUDIBLE] 428 00:22:03,426 --> 00:22:07,860 DAVID MALAN: To recap, there are other prototypes in those files, 429 00:22:07,860 --> 00:22:11,910 cs50.h and stdio.h; technically, they're all included at the top of your file, 430 00:22:11,910 --> 00:22:14,460 even though, strictly speaking, you don't need most of them; 431 00:22:14,460 --> 00:22:18,190 they are there just in case you might want them. 432 00:22:18,190 --> 00:22:19,660 And finally, any other questions? 433 00:22:19,660 --> 00:22:20,160 Yeah? 434 00:22:20,160 --> 00:22:23,878 AUDIENCE: [INAUDIBLE] 435 00:22:23,878 --> 00:22:26,920 DAVID MALAN: Does it matter what order we're telling the computer to run? 436 00:22:26,920 --> 00:22:29,140 Sometimes with libraries, yes, it matters 437 00:22:29,140 --> 00:22:31,520 what order they are linked in together. 438 00:22:31,520 --> 00:22:34,330 But for our purposes, it's really not going to matter. 439 00:22:34,330 --> 00:22:38,750 It's going to-- make is going to take care of automating that process for us. 440 00:22:38,750 --> 00:22:39,250 All right. 441 00:22:39,250 --> 00:22:41,795 So with that said, henceforth, compiling, technically, 442 00:22:41,795 --> 00:22:42,670 is these four things. 443 00:22:42,670 --> 00:22:46,690 But we'll focus on it as a higher-level concept, an abstraction, 444 00:22:46,690 --> 00:22:49,880 known as compiling itself.
445 00:22:49,880 --> 00:22:52,510 So there's another process, debugging, that we'll now begin to focus on all the 446 00:22:52,510 --> 00:22:55,690 more this week, because, invariably, this past week you ran against-- 447 00:22:55,690 --> 00:22:57,160 ran up against some challenges. 448 00:22:57,160 --> 00:23:00,550 You probably created your very first bugs, or mistakes, in a program, 449 00:23:00,550 --> 00:23:03,940 and so let's focus for a moment on actual techniques for debugging. 450 00:23:03,940 --> 00:23:07,060 As you spend more time this semester, and in the years 451 00:23:07,060 --> 00:23:10,270 to come if you continue to program, you're never, frankly, probably, 452 00:23:10,270 --> 00:23:13,577 going to write bug-free code, ultimately. 453 00:23:13,577 --> 00:23:16,660 Your programs are going to get more featureful, more sophisticated, 454 00:23:16,660 --> 00:23:20,230 and we're all going to start to make more sophisticated mistakes. 455 00:23:20,230 --> 00:23:22,570 And to this day, I write buggy code all the time. 456 00:23:22,570 --> 00:23:24,520 And I'm always horrified when I do it up here. 457 00:23:24,520 --> 00:23:26,620 But hopefully, that won't happen too often. 458 00:23:26,620 --> 00:23:30,100 But when it does, it's a process, now, of debugging, trying 459 00:23:30,100 --> 00:23:32,230 to find the mistakes in your program. 460 00:23:32,230 --> 00:23:35,600 You don't have to stare at your code, or shake your fist at your code. 461 00:23:35,600 --> 00:23:38,590 There are actual tools that real-world programmers 462 00:23:38,590 --> 00:23:41,860 use to help debug their code and find these faults. 463 00:23:41,860 --> 00:23:44,455 So what are some of the techniques and tools that folks use? 464 00:23:44,455 --> 00:23:49,440 Well, as an aside, if you've ever-- 465 00:23:49,440 --> 00:23:52,840 a bug in a program is a mistake, and that term has been around for some time. 466 00:23:52,840 --> 00:23:58,010 You may have heard this tale from some 50-plus years ago, in 1947.
467 00:23:58,010 --> 00:24:02,770 This is an entry in a logbook written by a famous computer scientist known 468 00:24:02,770 --> 00:24:05,230 as-- named Grace Hopper, who happened to be the one 469 00:24:05,230 --> 00:24:09,345 to record the very first discovery of a quote-unquote actual bug in a computer. 470 00:24:09,345 --> 00:24:11,860 This was a moth that had flown into, 471 00:24:11,860 --> 00:24:17,080 at the time, a very sophisticated system known as the Harvard Mark II computer, 472 00:24:17,080 --> 00:24:20,050 one of those very large, refrigerator-sized systems, 473 00:24:20,050 --> 00:24:24,160 in which an actual bug caused an issue. 474 00:24:24,160 --> 00:24:27,190 The etymology of bug, though, predates this particular instance, 475 00:24:27,190 --> 00:24:30,580 but here you have, as any computer scientist might know, the example 476 00:24:30,580 --> 00:24:32,845 of a first physical bug in a computer. 477 00:24:32,845 --> 00:24:35,322 How, though, do you go about removing such a thing? 478 00:24:35,322 --> 00:24:37,780 Well, let's consider a very simple scenario from last time, 479 00:24:37,780 --> 00:24:40,780 for instance, when we were trying to print out various aspects of Mario, 480 00:24:40,780 --> 00:24:42,970 like this column of 3 bricks. 481 00:24:42,970 --> 00:24:46,660 Let's consider how I might go about implementing a program like this. 482 00:24:46,660 --> 00:24:51,130 Let me switch back over to VS Code here, and I'm going to run-- 483 00:24:51,130 --> 00:24:52,750 write a program. 484 00:24:52,750 --> 00:24:54,640 And I'm not going to trust myself, so I'm 485 00:24:54,640 --> 00:24:56,507 going to call it buggy.c from the get-go, 486 00:24:56,507 --> 00:24:58,340 knowing that I'm going to mess something up. 487 00:24:58,340 --> 00:25:01,150 But I'm going to go ahead and include stdio.h. 488 00:25:01,150 --> 00:25:03,940 And I'm going to define main, as usual. 489 00:25:03,940 --> 00:25:05,950 So hopefully, no mistakes just yet.
490 00:25:05,950 --> 00:25:08,710 And now, I want to print those 3 bricks on the screen using 491 00:25:08,710 --> 00:25:10,270 just hashes for bricks. 492 00:25:10,270 --> 00:25:16,420 So how about for int i gets 0, i less than or equal to 3, i plus plus. 493 00:25:16,420 --> 00:25:18,280 Now, inside of my curly braces, I'm going 494 00:25:18,280 --> 00:25:23,960 to go ahead and print out a hash followed by a backslash n, semicolon. 495 00:25:23,960 --> 00:25:27,975 All right, saving the file, doing make buggy, Enter, it compiles. 496 00:25:27,975 --> 00:25:33,340 So there are no syntax errors; my code is syntactically correct. 497 00:25:33,340 --> 00:25:36,640 But some of you have probably seen the logical error already, 498 00:25:36,640 --> 00:25:39,370 because when I run this program, I don't get 499 00:25:39,370 --> 00:25:45,430 this picture, which was 3 bricks high; I seem to have 4 bricks instead. 500 00:25:45,430 --> 00:25:47,930 Now, why it's happening might be jumping out at you, 501 00:25:47,930 --> 00:25:49,930 but I've kept the program simple just so that we 502 00:25:49,930 --> 00:25:54,010 don't have to find an actual bug; we can use a tool to find one that we already 503 00:25:54,010 --> 00:25:55,970 know about, in this case. 504 00:25:55,970 --> 00:25:59,050 What might be the first strategy for finding a bug like this, 505 00:25:59,050 --> 00:26:03,292 rather than staring at your code, asking a question, trying to think 506 00:26:03,292 --> 00:26:04,125 through the problem? 507 00:26:04,125 --> 00:26:07,690 Well, let's actually try to diagnose the problem more proactively. 508 00:26:07,690 --> 00:26:10,420 And the simplest way to do this now, and years from now, 509 00:26:10,420 --> 00:26:13,870 is, honestly, going to be to use a function like printf.
510 00:26:13,870 --> 00:26:15,790 printf is a wonderfully useful function, not just 511 00:26:15,790 --> 00:26:18,550 for formatting-- printing formatted strings and all that, but for 512 00:26:18,550 --> 00:26:21,430 just looking inside the values of variables 513 00:26:21,430 --> 00:26:24,352 that you might be curious about to see what's going on. 514 00:26:24,352 --> 00:26:25,060 So you know what? 515 00:26:25,060 --> 00:26:26,320 Let me do this. 516 00:26:26,320 --> 00:26:29,110 I see that there are 4 coming out, but I intended 3. 517 00:26:29,110 --> 00:26:31,740 So clearly, something's wrong with my i variable. 518 00:26:31,740 --> 00:26:34,090 So let me be a little more pedantic. 519 00:26:34,090 --> 00:26:37,300 Let me go inside of this loop and, temporarily, 520 00:26:37,300 --> 00:26:40,480 say something explicit, like, i is-- 521 00:26:40,480 --> 00:26:45,200 %i\n, and then just plug in the value of i. 522 00:26:45,200 --> 00:26:45,700 Right? 523 00:26:45,700 --> 00:26:48,970 This is not the program I want to write; it's the program I'm temporarily 524 00:26:48,970 --> 00:26:54,400 writing, because now I'm going to say make buggy, ./buggy. 525 00:26:54,400 --> 00:26:56,500 And if I look, now, at the output, I have 526 00:26:56,500 --> 00:27:01,090 some helpful diagnostic information. i is 0, and I get a hash, i is 1, 527 00:27:01,090 --> 00:27:03,610 and I get a hash, 2 and I get a hash, 3 and I get a hash. 528 00:27:03,610 --> 00:27:04,527 OK, wait a minute. 529 00:27:04,527 --> 00:27:06,610 I'm clearly going too many steps because, maybe, I 530 00:27:06,610 --> 00:27:09,250 forgot that computers are, essentially, counting from 0, 531 00:27:09,250 --> 00:27:11,450 and now, oh, it's less than or equal to. 532 00:27:11,450 --> 00:27:13,030 Now you see it, right?
533 00:27:13,030 --> 00:27:15,940 Again, a trivial example, but just by using printf, 534 00:27:15,940 --> 00:27:18,910 you can see inside of the computer's memory 535 00:27:18,910 --> 00:27:21,130 by just printing stuff out like this. 536 00:27:21,130 --> 00:27:25,770 And now, once you've figured it out, oh, so this should probably be less than 3, 537 00:27:25,770 --> 00:27:28,140 or I should start counting from 1; there are 538 00:27:28,140 --> 00:27:29,640 any number of ways I could fix this. 539 00:27:29,640 --> 00:27:32,655 But the most conventional is probably just to say less than 3. 540 00:27:32,655 --> 00:27:39,180 Now, I can delete my temporary print statement, rerun make buggy, ./buggy. 541 00:27:39,180 --> 00:27:41,790 And, voila, problem solved. 542 00:27:41,790 --> 00:27:43,830 All right, and to this day, I do this. 543 00:27:43,830 --> 00:27:46,860 Whether it's making a command-line application, or a web application, 544 00:27:46,860 --> 00:27:49,050 or a mobile application, it's very common to use 545 00:27:49,050 --> 00:27:51,270 printf, or some equivalent in any language, 546 00:27:51,270 --> 00:27:55,350 just to poke around and see what's inside the computer's memory. 547 00:27:55,350 --> 00:27:58,570 Thankfully, there are more sophisticated tools than this. 548 00:27:58,570 --> 00:28:00,930 Let me go ahead and reintroduce the bug here. 549 00:28:00,930 --> 00:28:04,620 And let me reopen my sidebar at left here. 550 00:28:04,620 --> 00:28:08,550 Let me now recompile the code to make sure it's current. 551 00:28:08,550 --> 00:28:11,310 And I'm going to run a command called debug50, 552 00:28:11,310 --> 00:28:15,090 which is a command that's representative of a type of program 553 00:28:15,090 --> 00:28:16,740 known as a debugger. 554 00:28:16,740 --> 00:28:19,680 And this debugger is actually built into VS Code.
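The fix, sketched as a function. The name print_column and its height parameter are hypothetical, generalizing the lecture's hard-coded 3, and the function returns how many bricks it printed so the count is easy to check. The temporary printf from the debugging session is left in as a comment to show where it went.

```c
#include <stdio.h>

// Prints a column of height bricks, one "#" per line, and returns how many it printed.
// Counting from 0 with i < height runs exactly height times (i = 0, 1, ..., height - 1).
int print_column(int height)
{
    int printed = 0;
    for (int i = 0; i < height; i++)
    {
        // Temporary diagnostic used while debugging; delete it once the bug is found:
        // printf("i is %i\n", i);
        printf("#\n");
        printed++;
    }
    return printed;
}
```

With the original i <= 3, the loop body also ran for i equal to 3, which is why a fourth brick appeared.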
555 00:28:19,680 --> 00:28:23,700 And all debug50 is doing for us is automating the process of starting 556 00:28:23,700 --> 00:28:25,650 VS Code's built-in debugger. 557 00:28:25,650 --> 00:28:28,260 So this isn't even a CS50-specific tool; we've 558 00:28:28,260 --> 00:28:31,170 just given you a debug50 command to make it easier 559 00:28:31,170 --> 00:28:32,855 to start it up from the get-go. 560 00:28:32,855 --> 00:28:37,560 And the way you run this debugger is you say debug50, space, and then 561 00:28:37,560 --> 00:28:40,120 the name of the program that you want to debug. 562 00:28:40,120 --> 00:28:42,210 So, in this case, ./buggy. 563 00:28:42,210 --> 00:28:44,010 So you don't mention your .c file. 564 00:28:44,010 --> 00:28:46,650 You mention your already-compiled code. 565 00:28:46,650 --> 00:28:52,230 And what this debugger is going to let me do is, most powerfully, 566 00:28:52,230 --> 00:28:54,930 walk through my code step-by-step. 567 00:28:54,930 --> 00:28:58,930 Because every program we've written thus far runs from start to finish, 568 00:28:58,930 --> 00:29:02,325 even if I'm not done thinking through each step at a time. 569 00:29:02,325 --> 00:29:05,850 With a debugger, I can actually click on a line number 570 00:29:05,850 --> 00:29:09,180 and say pause execution here, and the debugger 571 00:29:09,180 --> 00:29:14,130 will let me walk through my code one step at a time, one second at a time, 572 00:29:14,130 --> 00:29:16,740 one minute at a time, at my own human pace, 573 00:29:16,740 --> 00:29:19,470 which is super compelling when the programs get more complicated 574 00:29:19,470 --> 00:29:22,600 and might otherwise fly by on the screen. 575 00:29:22,600 --> 00:29:25,860 So I'm going to click to the left of line 5. 576 00:29:25,860 --> 00:29:27,970 And notice that these little red dots appear. 577 00:29:27,970 --> 00:29:31,290 And if I click on one, it stays and gets even redder.
578 00:29:31,290 --> 00:29:34,230 And I'm going to run debug50 on ./buggy. 579 00:29:34,230 --> 00:29:39,090 And in just a moment, you'll see that a new panel opens on the left-hand side. 580 00:29:39,090 --> 00:29:41,910 It's doing some configuration of the screen. 581 00:29:41,910 --> 00:29:46,690 Let me zoom out a little bit here so we can see more on the screen at once. 582 00:29:46,690 --> 00:29:50,440 And sometimes, you'll see in VS Code that the debug console opens up, 583 00:29:50,440 --> 00:29:54,480 which looks very cryptic; just go back to the terminal window if that happens, 584 00:29:54,480 --> 00:29:57,875 because the terminal window is where you can still interact with your code. 585 00:29:57,875 --> 00:30:00,120 And let's now take a look at what's going on. 586 00:30:00,120 --> 00:30:04,650 If I zoom in on my buggy.c code here, you'll 587 00:30:04,650 --> 00:30:10,890 notice that we have the same program as before, but highlighted in yellow 588 00:30:10,890 --> 00:30:11,820 is line 5. 589 00:30:11,820 --> 00:30:15,660 Not a coincidence: that's the line I set a so-called breakpoint at. 590 00:30:15,660 --> 00:30:20,400 The little red dot means break here, pause execution here. 591 00:30:20,400 --> 00:30:23,716 And the yellow line has not yet been executed. 592 00:30:23,716 --> 00:30:27,600 But now, at the top of my screen, notice these little arrows. 593 00:30:27,600 --> 00:30:28,750 There's one for Play. 594 00:30:28,750 --> 00:30:30,750 There's one for this, which, if I hover over it, 595 00:30:30,750 --> 00:30:34,140 says Step Over; there's another that's going to say Step Into, 596 00:30:34,140 --> 00:30:35,820 and there's a third that says Step Out. 597 00:30:35,820 --> 00:30:38,520 I'm just going to use the first of these, Step Over.
598 00:30:38,520 --> 00:30:41,580 And I'm going to do this, and you'll see that the yellow highlight 599 00:30:41,580 --> 00:30:45,660 moved from line 5 to line 7 because now it's ready, 600 00:30:45,660 --> 00:30:47,955 but hasn't yet printed out that hash. 601 00:30:47,955 --> 00:30:51,817 But the most powerful thing here, notice, is that top left here. 602 00:30:51,817 --> 00:30:54,150 It's a little cryptic, because there's a bunch of things 603 00:30:54,150 --> 00:30:56,910 going on that will make more sense over time, but at the top 604 00:30:56,910 --> 00:30:58,470 there's a section called variables. 605 00:30:58,470 --> 00:31:00,750 Below that, something called locals, which means 606 00:31:00,750 --> 00:31:02,820 local to my current function, main. 607 00:31:02,820 --> 00:31:07,410 And notice, there's my variable called i, and its current value is 0. 608 00:31:07,410 --> 00:31:12,810 So now, once I click Step Over again, watch what happens. 609 00:31:12,810 --> 00:31:15,660 We go from line 7 back to line 5. 610 00:31:15,660 --> 00:31:19,455 But look in the terminal window, one of the hashes has printed. 611 00:31:19,455 --> 00:31:22,050 But now, it's printed at my own pace. 612 00:31:22,050 --> 00:31:24,030 I can think through this step-by-step. 613 00:31:24,030 --> 00:31:26,340 Notice that i has not changed, yet. 614 00:31:26,340 --> 00:31:29,700 It's still 0 because the yellow highlighted line hasn't yet executed. 615 00:31:29,700 --> 00:31:34,140 But the moment I click Step Over, it's going to execute line 5. 616 00:31:34,140 --> 00:31:41,010 Now, notice at top left, i has become 1, and nothing has printed, yet, 617 00:31:41,010 --> 00:31:43,290 because now, highlighted is line 7. 618 00:31:43,290 --> 00:31:48,000 So if I click Step Over again, we'll see the hash. 
619 00:31:48,000 --> 00:31:51,930 If I repeat this process at my own human, comfortable pace, 620 00:31:51,930 --> 00:31:57,040 I can see my variables changing, I can see output changing on the screen, 621 00:31:57,040 --> 00:31:59,902 and I can just think about whether that should have just happened. 622 00:31:59,902 --> 00:32:01,860 I can pause and give thought to what's actually 623 00:32:01,860 --> 00:32:06,240 going on without trying to race the computer and figure it all out at once. 624 00:32:06,240 --> 00:32:08,490 I'm going to go ahead and stop here because we already 625 00:32:08,490 --> 00:32:11,430 know what this particular problem is, and that brings me back 626 00:32:11,430 --> 00:32:12,720 to my default terminal window. 627 00:32:12,720 --> 00:32:16,180 But this debugger-- let me disable the breakpoint now 628 00:32:16,180 --> 00:32:18,570 so it doesn't keep breaking-- this debugger 629 00:32:18,570 --> 00:32:20,760 will be your friend moving forward in order 630 00:32:20,760 --> 00:32:25,290 to step through your code step-by-step, at your own pace, to figure out 631 00:32:25,290 --> 00:32:26,820 where something has gone wrong. 632 00:32:26,820 --> 00:32:30,397 printf is great, but it gets annoying if you have to constantly add print this, 633 00:32:30,397 --> 00:32:33,480 print this, print this, print this, recompile, rerun it, oh wait a minute, 634 00:32:33,480 --> 00:32:34,980 print this, print this. 635 00:32:34,980 --> 00:32:39,780 The debugger lets you do the equivalent, but automatically. 636 00:32:39,780 --> 00:32:45,960 Questions on this debugger, which you'll see all the more hands-on over time? 637 00:32:45,960 --> 00:32:47,430 Questions on the debugger? 638 00:32:47,430 --> 00:32:48,554 Yeah? 639 00:32:48,554 --> 00:32:50,560 AUDIENCE: You were using the Step Over feature. 640 00:32:50,560 --> 00:32:53,303 What do the other features in the debugger-- 641 00:32:53,303 --> 00:32:54,720 DAVID MALAN: Really good question.
642 00:32:54,720 --> 00:32:57,720 We'll see this before long, but those other buttons that I glossed over, 643 00:32:57,720 --> 00:33:02,460 Step Into and Step Out, actually let you step into and back out of specific functions 644 00:33:02,460 --> 00:33:04,200 if I had any more than main. 645 00:33:04,200 --> 00:33:06,960 So if main called a function called something, 646 00:33:06,960 --> 00:33:10,380 and something called a function called something else, instead of just 647 00:33:10,380 --> 00:33:14,730 stepping over the entire execution of that function, I could step into it 648 00:33:14,730 --> 00:33:17,105 and walk through its lines of code one by one. 649 00:33:17,105 --> 00:33:19,020 So any time you have a problem set you're 650 00:33:19,020 --> 00:33:22,140 working on that has multiple functions, you can set a breakpoint in main, 651 00:33:22,140 --> 00:33:26,250 if you want, or you can set it inside of one of your additional functions 652 00:33:26,250 --> 00:33:29,130 to focus your attention only on that. 653 00:33:29,130 --> 00:33:32,640 And we'll see examples of that over time. 654 00:33:32,640 --> 00:33:33,780 All right, so what else? 655 00:33:33,780 --> 00:33:38,100 And what's sort of the elephant in the room, so to speak, 656 00:33:38,100 --> 00:33:39,750 is actually a duck in this case. 657 00:33:39,750 --> 00:33:42,160 Why is there this duck and all of these ducks here? 658 00:33:42,160 --> 00:33:46,440 Well, it turns out, a third, genuinely recommended debugging technique 659 00:33:46,440 --> 00:33:50,055 is talking through problems, talking through code with someone else. 660 00:33:50,055 --> 00:33:52,620 Now, in the absence of having a family member, or a friend, 661 00:33:52,620 --> 00:33:56,520 or a roommate who actually wants to hear you talk about code, of all things, 662 00:33:56,520 --> 00:34:01,320 generally, programmers turn to a rubber duck, or other inanimate objects
664 00:34:03,360 --> 00:34:06,760 The idea behind rubber duck debugging, so to speak, 665 00:34:06,760 --> 00:34:12,750 is that simply by looking at your code and talking it through, OK, on line 3, 666 00:34:12,750 --> 00:34:17,040 I'm starting a 4 loop and I'm initializing i to 0. 667 00:34:17,040 --> 00:34:18,990 OK, then, I'm printing out a hash. 668 00:34:18,990 --> 00:34:24,112 Just by talking through your code, step-by-step, invariably, 669 00:34:24,112 --> 00:34:26,820 finds you having the proverbial light bulb go off over your head, 670 00:34:26,820 --> 00:34:29,040 because you realize, wait a minute I just said something stupid, 671 00:34:29,040 --> 00:34:30,510 or I just said something wrong. 672 00:34:30,510 --> 00:34:34,500 And this is really just a proxy for any other human, teaching fellow, teacher 673 00:34:34,500 --> 00:34:36,060 or friend, colleague. 674 00:34:36,060 --> 00:34:38,440 But in the absence of any of those people in the room, 675 00:34:38,440 --> 00:34:40,357 you're welcome to take, on your way out today. 676 00:34:40,357 --> 00:34:44,280 One of these little, rubber ducks and consider using it, for real, any time 677 00:34:44,280 --> 00:34:47,820 you want to talk through one of your problems in CS50, 678 00:34:47,820 --> 00:34:49,140 or maybe life more generally. 679 00:34:49,140 --> 00:34:51,480 But having it there on your desk is just a way 680 00:34:51,480 --> 00:34:55,140 to help you hear illogic in what you think 681 00:34:55,140 --> 00:34:57,790 might, otherwise, be logical code. 682 00:34:57,790 --> 00:35:02,400 So printf, debugging, rubber-duck debugging are just three of the ways, 683 00:35:02,400 --> 00:35:05,207 you'll see over time, to get to the source of code 684 00:35:05,207 --> 00:35:06,790 that you will write that has mistakes. 685 00:35:06,790 --> 00:35:08,880 Which is going to happen, but it will empower you 686 00:35:08,880 --> 00:35:12,000 all the more to solve those mistakes. 
687 00:35:12,000 --> 00:35:17,440 All right, any questions on debugging in general, or these three techniques? 688 00:35:17,440 --> 00:35:17,940 Yeah? 689 00:35:17,940 --> 00:35:19,740 AUDIENCE: [INAUDIBLE] 690 00:35:19,740 --> 00:35:22,650 DAVID MALAN: What's the difference between Step Over and Step Into? 691 00:35:22,650 --> 00:35:25,980 At the moment, the only one that's applicable to the code I just wrote 692 00:35:25,980 --> 00:35:29,340 is Step Over, because it means step over each line of code. 693 00:35:29,340 --> 00:35:34,050 If, though, I had other functions that I had written in this program, 694 00:35:34,050 --> 00:35:39,300 maybe lower down in the file, I could step into those function calls 695 00:35:39,300 --> 00:35:41,469 and walk through them one at a time. 696 00:35:41,469 --> 00:35:43,650 So we'll come back to this with an actual example, 697 00:35:43,650 --> 00:35:46,230 but Step Into will allow me to do exactly that. 698 00:35:46,230 --> 00:35:49,210 In fact, this is a perfect segue to doing a little something like this. 699 00:35:49,210 --> 00:35:51,632 Let me go ahead and open up another file here. 700 00:35:51,632 --> 00:35:53,340 And, actually, we'll use the same file, buggy.c. 701 00:35:53,340 --> 00:35:56,320 And we're going to write one other thing that's buggy, as well. 702 00:35:56,320 --> 00:36:00,000 Let me go up here and include, as before, cs50.h. 703 00:36:00,000 --> 00:36:03,780 Let me include stdio.h. 704 00:36:03,780 --> 00:36:05,520 Let me do int main(void). 705 00:36:05,520 --> 00:36:08,050 So all of this, I think, is correct so far. 706 00:36:08,050 --> 00:36:11,280 And let's do this: let's give myself an int called i, 707 00:36:11,280 --> 00:36:14,530 and let's ask the user for a negative integer with get negative int. 708 00:36:14,530 --> 00:36:17,300 This is not a function that exists, technically, yet. 709 00:36:17,300 --> 00:36:20,050 But I'm going to assume, for the sake of discussion, that it does.
710 00:36:20,050 --> 00:36:23,700 Then, I'm just going to print out, with %i and a new line, 711 00:36:23,700 --> 00:36:25,360 whatever the human typed in. 712 00:36:25,360 --> 00:36:28,320 So at this point in the story, my program, I think, is correct, 713 00:36:28,320 --> 00:36:30,930 except for the fact that get negative int is not 714 00:36:30,930 --> 00:36:33,690 a function in the CS50 library or anywhere else. 715 00:36:33,690 --> 00:36:35,460 I'm going to need to invent it myself. 716 00:36:35,460 --> 00:36:41,310 So suppose, in this case, that I declare a function called get negative int. 717 00:36:41,310 --> 00:36:45,630 Its return type, so to speak, should be int, because, as its name suggests, 718 00:36:45,630 --> 00:36:48,360 I want to hand back an integer, and it's going 719 00:36:48,360 --> 00:36:50,310 to take no input to keep it simple. 720 00:36:50,310 --> 00:36:51,810 So I'm just going to say void there. 721 00:36:51,810 --> 00:36:54,810 No inputs, no special prompts, nothing like that. 722 00:36:54,810 --> 00:36:57,600 Let me now give myself some curly braces. 723 00:36:57,600 --> 00:37:00,510 And let me do something familiar, perhaps, from problem set 1. 724 00:37:00,510 --> 00:37:05,550 Let me give myself a variable, like n, and let me do the following 725 00:37:05,550 --> 00:37:07,320 within this block of code. 726 00:37:07,320 --> 00:37:13,590 Assign n the value of get int, asking the user for a negative integer using 727 00:37:13,590 --> 00:37:14,850 get int's own prompt. 728 00:37:14,850 --> 00:37:18,750 And I want to do this while n is less than 0, because I 729 00:37:18,750 --> 00:37:20,390 want to get a negative from the user.
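For comparison, here is a sketch of the prompting loop with the condition pointed the right way: keep asking while n is not negative (n >= 0), the opposite of the deliberately mistaken while n < 0 described here. To keep it runnable without the CS50 library, this hypothetical variant, get_negative_int_from, reads its "typed" values from an array instead of calling get_int; that substitution is an assumption of the sketch, not the lecture's code.

```c
// Hypothetical, testable stand-in for get_negative_int: instead of calling
// something like get_int("Negative Integer: "), it takes the values the user
// would have typed, in order. Requires count >= 1.
int get_negative_int_from(const int inputs[], int count)
{
    int n;
    int i = 0;
    do
    {
        n = inputs[i];  // stands in for one prompt
        i++;
    }
    while (n >= 0 && i < count);  // re-prompt only while n is NOT negative
    return n;  // the first negative value, or the last value if none was negative
}
```

With while (n < 0) instead, a first answer of -50 would be rejected and re-prompted while 0 would be accepted, which matches the backwards behavior in the demo.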
730 00:37:20,390 --> 00:37:24,140 And recall, from having used this block in the past, 731 00:37:24,140 --> 00:37:27,770 I can now return n as the very last step to hand back 732 00:37:27,770 --> 00:37:31,790 whatever the user has typed in, so long as they cooperated and gave me 733 00:37:31,790 --> 00:37:33,750 an actual negative integer. 734 00:37:33,750 --> 00:37:36,710 Now, I've deliberately made a mistake here, 735 00:37:36,710 --> 00:37:39,080 and it's a subtle, silly, mathematical one, 736 00:37:39,080 --> 00:37:43,910 but let me compile this program after copying the prototype up to the top, 737 00:37:43,910 --> 00:37:45,380 so I don't make that mistake again. 738 00:37:45,380 --> 00:37:48,470 Let me do make buggy, Enter. 739 00:37:48,470 --> 00:37:50,720 And now, let me do ./buggy. 740 00:37:50,720 --> 00:37:54,020 I'll give it a negative integer, like negative 50. 741 00:37:54,020 --> 00:37:55,370 Uh-huh. 742 00:37:55,370 --> 00:37:59,330 That did not take. 743 00:37:59,330 --> 00:38:00,860 How about negative 5? 744 00:38:00,860 --> 00:38:02,060 No. 745 00:38:02,060 --> 00:38:04,500 How about 0? 746 00:38:04,500 --> 00:38:05,000 All right. 747 00:38:05,000 --> 00:38:09,080 So it's clearly working backwards, or incorrectly here, logically. 748 00:38:09,080 --> 00:38:10,800 So how could I go about debugging this? 749 00:38:10,800 --> 00:38:12,425 Well, I could do what I've done before. 750 00:38:12,425 --> 00:38:18,920 I could use my printf technique and say something explicit like n is %i, 751 00:38:18,920 --> 00:38:25,310 new line, comma n, just to print it out; let me recompile buggy, 752 00:38:25,310 --> 00:38:28,640 let me rerun buggy, let me type in negative 50. 753 00:38:28,640 --> 00:38:30,630 OK, n is negative 50. 754 00:38:30,630 --> 00:38:33,173 So that didn't really help me at this point, 755 00:38:33,173 --> 00:38:34,590 because that's the same as before. 756 00:38:34,590 --> 00:38:38,030 So let me do this: debug50, ./buggy.
757 00:38:38,030 --> 00:38:39,870 Oh, but I've made a mistake. 758 00:38:39,870 --> 00:38:41,700 So I didn't set my breakpoint, yet. 759 00:38:41,700 --> 00:38:44,930 So let me do this, and I'll set a breakpoint this time. 760 00:38:44,930 --> 00:38:47,330 I could set it here, on line 8. 761 00:38:47,330 --> 00:38:49,340 Let's do it in main, as before. 762 00:38:49,340 --> 00:38:51,530 Let me rerun debug50, now, 763 00:38:51,530 --> 00:38:52,970 on ./buggy. 764 00:38:52,970 --> 00:38:55,190 That fancy user interface is going to pop up. 765 00:38:55,190 --> 00:38:58,310 It's going to highlight the line that I set the breakpoint on. 766 00:38:58,310 --> 00:39:01,250 Notice that, on the left-hand side of the screen, 767 00:39:01,250 --> 00:39:04,650 i is defaulting, at the moment, to 0, because I haven't typed anything in, 768 00:39:04,650 --> 00:39:05,150 yet. 769 00:39:05,150 --> 00:39:10,815 But let me, now, Step Over this line that's highlighted in yellow, 770 00:39:10,815 --> 00:39:12,440 and you'll see that I'm being prompted. 771 00:39:12,440 --> 00:39:16,220 So let's type in my negative 50, Enter. 772 00:39:16,220 --> 00:39:21,470 Notice now that I'm stuck in that function. 773 00:39:21,470 --> 00:39:22,250 All right. 774 00:39:22,250 --> 00:39:26,520 So clearly, the issue seems to be in my get negative int function. 775 00:39:26,520 --> 00:39:30,120 So, OK, let me stop this execution. 776 00:39:30,120 --> 00:39:33,175 My problem doesn't seem to be in main, per se; maybe it's down here. 777 00:39:33,175 --> 00:39:33,800 So that's fine. 778 00:39:33,800 --> 00:39:35,990 Let me set my same breakpoint at line 8. 779 00:39:35,990 --> 00:39:38,510 Let me rerun debug50 one more time. 780 00:39:38,510 --> 00:39:43,110 But this time, instead of just stepping over that line, let's step into it. 781 00:39:43,110 --> 00:39:45,410 So notice line 8 is, again, highlighted in yellow. 782 00:39:45,410 --> 00:39:47,690 In the past, I've been clicking Step Over.
783 00:39:47,690 --> 00:39:50,180 Let's click Step Into, now. 784 00:39:50,180 --> 00:39:53,480 When I click Step Into, boom, now, the debugger 785 00:39:53,480 --> 00:39:56,390 jumps into that specific function. 786 00:39:56,390 --> 00:39:59,330 Now, I can step through these lines of code, again and again. 787 00:39:59,330 --> 00:40:01,700 I can see what the value of n is as I'm typing it in. 788 00:40:01,700 --> 00:40:03,500 I can think through my logic, and voila. 789 00:40:03,500 --> 00:40:07,640 Hopefully, once I've solved the issue, I can exit the debugger, fix my code, 790 00:40:07,640 --> 00:40:09,180 and move on. 791 00:40:09,180 --> 00:40:12,050 So Step Over just goes over the line, but executes it; 792 00:40:12,050 --> 00:40:17,210 Step Into lets you go into other functions you've written. 793 00:40:17,210 --> 00:40:19,400 So let's go ahead and do this. 794 00:40:19,400 --> 00:40:23,550 We've got a bunch of possible approaches that we 795 00:40:23,550 --> 00:40:25,550 can take to solving some problems. Let's go ahead 796 00:40:25,550 --> 00:40:26,730 and pace ourselves today, though. 797 00:40:26,730 --> 00:40:27,900 Let's take a five-minute break, here. 798 00:40:27,900 --> 00:40:30,688 And when we come back, we'll take a look at the computer's memory 799 00:40:30,688 --> 00:40:31,730 we've been talking about. 800 00:40:31,730 --> 00:40:32,950 See you in five. 801 00:40:32,950 --> 00:40:36,380 All right. 802 00:40:36,380 --> 00:40:41,000 So let's dive back in. 803 00:40:41,000 --> 00:40:46,860 Up until now, by way of both week 1 and problem set 1, for the most part, 804 00:40:46,860 --> 00:40:50,660 we've just translated from Scratch into C all of these basic building blocks, 805 00:40:50,660 --> 00:40:53,700 like loops and conditionals, Boolean expressions, variables. 806 00:40:53,700 --> 00:40:54,950 So, sort of, more of the same.
807 00:40:54,950 --> 00:40:58,430 But there are features in C that we've stumbled across already, 808 00:40:58,430 --> 00:41:02,300 like data types, the types of variables that don't exist in Scratch, 809 00:41:02,300 --> 00:41:04,450 but that, in fact, do exist in other languages. 810 00:41:04,450 --> 00:41:06,200 In fact, a few that we'll see before long. 811 00:41:06,200 --> 00:41:10,670 So to summarize the types we saw last week, recall this little list here. 812 00:41:10,670 --> 00:41:15,050 We had ints, and floats, and longs, and doubles, and chars; 813 00:41:15,050 --> 00:41:18,510 there are also bools and also strings, which we've seen a few times. 814 00:41:18,510 --> 00:41:21,830 But today, let's actually start to formalize what these things are, 815 00:41:21,830 --> 00:41:25,760 and actually what your Mac and PC are doing when you manipulate bits 816 00:41:25,760 --> 00:41:29,170 as an int versus a char, versus a string, versus something else. 817 00:41:29,170 --> 00:41:31,920 And see if we can't put more tools into your toolkit, so to speak, 818 00:41:31,920 --> 00:41:35,630 so we can start quickly writing more featureful, more sophisticated 819 00:41:35,630 --> 00:41:36,800 programs in C. 820 00:41:36,800 --> 00:41:40,640 So it turns out that, on most systems nowadays, 821 00:41:40,640 --> 00:41:43,010 though this can vary by actual computer, this 822 00:41:43,010 --> 00:41:46,040 is how large each of the data types, typically, 823 00:41:46,040 --> 00:41:51,590 is in C. When you store a Boolean value, a 0 or 1, a false or a true, 824 00:41:51,590 --> 00:41:52,850 it actually uses 1 byte. 825 00:41:52,850 --> 00:41:55,100 That's a little excessive, because, strictly speaking, 826 00:41:55,100 --> 00:41:58,580 you only need 1 bit, which is 1/8 of this size. 827 00:41:58,580 --> 00:42:01,190 But for simplicity, computers use a whole byte 828 00:42:01,190 --> 00:42:03,740 to represent a bool, true or false.
829 00:42:03,740 --> 00:42:08,040 A char, we saw last week, is only 1 byte, or 8 bits. 830 00:42:08,040 --> 00:42:12,950 And this is why ASCII, which uses 1 byte, or technically, only 7 bits early 831 00:42:12,950 --> 00:42:17,600 on, was confined to at most 256 possible characters (128 with just 7 bits). 832 00:42:17,600 --> 00:42:21,940 Notice that an int is 4 bytes, or 32 bits. 833 00:42:21,940 --> 00:42:24,580 A float is also 4 bytes, or 32 bits. 834 00:42:24,580 --> 00:42:27,850 But the thing that we call a long is, literally, twice as long, 835 00:42:27,850 --> 00:42:29,710 8 bytes, or 64 bits. 836 00:42:29,710 --> 00:42:30,430 So is a double. 837 00:42:30,430 --> 00:42:33,900 A double is 64 bits of precision for floating point values. 838 00:42:33,900 --> 00:42:37,215 And a string, for today, we're going to leave as a question mark. 839 00:42:37,215 --> 00:42:39,340 We'll come back to that, later today and next week, 840 00:42:39,340 --> 00:42:42,520 as to how much space a string takes up, but, suffice it to say, 841 00:42:42,520 --> 00:42:45,488 it's going to take up a variable amount of space, 842 00:42:45,488 --> 00:42:47,530 depending on whether the string is short or long. 843 00:42:47,530 --> 00:42:50,470 But we'll see exactly what that means, before long. 844 00:42:50,470 --> 00:42:55,030 So here's a photograph of a typical piece of memory 845 00:42:55,030 --> 00:42:57,760 inside of your Mac, or PC, or phone. 846 00:42:57,760 --> 00:43:00,160 Odds are, it might be a little smaller in some devices. 847 00:43:00,160 --> 00:43:02,950 This is known as RAM, or random access memory. 848 00:43:02,950 --> 00:43:05,410 Each of these little black chips on this circuit 849 00:43:05,410 --> 00:43:07,720 board, the green thing, these little black chips 850 00:43:07,720 --> 00:43:10,630 are where 0s and 1s are actually stored. 851 00:43:10,630 --> 00:43:12,670 Each of those stores some number of bytes.
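The sizes above are typical rather than guaranteed. A minimal sketch for checking them on your own machine uses C's `sizeof` operator; the function name `print_type_sizes` is hypothetical. The C standard fixes `sizeof(char)` at 1, but the others can differ: on 64-bit Windows, for instance, a `long` is typically 4 bytes rather than 8.

```c
#include <stdbool.h>
#include <stdio.h>

// Prints how many bytes each type occupies on the machine that
// compiled this code. sizeof yields a size_t, hence the %zu format.
void print_type_sizes(void)
{
    printf("bool:   %zu\n", sizeof(bool));
    printf("char:   %zu\n", sizeof(char));
    printf("int:    %zu\n", sizeof(int));
    printf("float:  %zu\n", sizeof(float));
    printf("long:   %zu\n", sizeof(long));
    printf("double: %zu\n", sizeof(double));
}
```

Calling `print_type_sizes()` from `main` prints one line per type; the numbers you see are specific to your compiler and platform.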
852 00:43:12,670 --> 00:43:15,130 Maybe megabytes, maybe even gigabytes, nowadays. 853 00:43:15,130 --> 00:43:21,430 So let's focus on one of those chips, to give us a zoomed-in version, thereof. 854 00:43:21,430 --> 00:43:25,390 Let's consider the fact that, even though we don't have to care, exactly, 855 00:43:25,390 --> 00:43:29,470 how this kind of thing is made, if this is, like, 1 gigabyte of memory, 856 00:43:29,470 --> 00:43:31,930 for the sake of discussion, it stands to reason that, 857 00:43:31,930 --> 00:43:35,830 if this thing is storing 1 billion bytes, 1 gigabyte, 858 00:43:35,830 --> 00:43:38,110 then we can number them, arbitrarily. 859 00:43:38,110 --> 00:43:41,590 Maybe this will be byte 0, 1, 2, 3, 4, 5, 6, 7, 8. 860 00:43:41,590 --> 00:43:45,000 Then, maybe, way down here in the bottom right corner is byte number 1 billion. 861 00:43:45,000 --> 00:43:48,760 We can just number these things, as might be our convention. 862 00:43:48,760 --> 00:43:50,710 Let's draw that graphically. 863 00:43:50,710 --> 00:43:53,090 Not with a billion squares, but fewer than those. 864 00:43:53,090 --> 00:43:55,410 And let's zoom in further, and consider that. 865 00:43:55,410 --> 00:43:57,160 At this point in the story, let's abstract 866 00:43:57,160 --> 00:43:59,380 away all the hardware, and all the little wires, 867 00:43:59,380 --> 00:44:03,730 and just think of memory as taking up-- or, rather, just think of data 868 00:44:03,730 --> 00:44:06,170 as taking up some number of bytes. 869 00:44:06,170 --> 00:44:09,820 So, for instance, if you were to store a char in a computer's memory, which 870 00:44:09,820 --> 00:44:14,230 is 1 byte, it might be stored at this top left-hand location 871 00:44:14,230 --> 00:44:16,195 of this black chip of memory.
872 00:44:16,195 --> 00:44:20,290 If you were to store something like an integer that uses 4 bytes, well, 873 00:44:20,290 --> 00:44:23,560 it might use four of those bytes, but they're going to be contiguous, 874 00:44:23,560 --> 00:44:25,220 back-to-back-to-back, in this case. 875 00:44:25,220 --> 00:44:29,270 If you were to store a long or a double, you might, actually, need 8 bytes. 876 00:44:29,270 --> 00:44:31,390 So I'm filling in these squares to represent 877 00:44:31,390 --> 00:44:36,160 how much memory a given variable of some data type would take up. 878 00:44:36,160 --> 00:44:39,230 1, or 4, or 8, in this case, here. 879 00:44:39,230 --> 00:44:42,160 Well, from here, let's abstract away from all of the hardware 880 00:44:42,160 --> 00:44:44,320 and really focus on memory as being a grid. 881 00:44:44,320 --> 00:44:47,650 Or, really, like a canvas that we can paint any types of data 882 00:44:47,650 --> 00:44:48,850 onto that we want. 883 00:44:48,850 --> 00:44:52,600 At the end of the day, all of this data is just going to be 0s and 1s. 884 00:44:52,600 --> 00:44:56,500 But it's up to you and me to build abstractions on top of that. 885 00:44:56,500 --> 00:45:00,130 Things like actual numbers, colors, images, movies, and beyond. 886 00:45:00,130 --> 00:45:02,440 But we'll start lower-level, here, first. 887 00:45:02,440 --> 00:45:05,950 Suppose I had a program that needs three integers, 888 00:45:05,950 --> 00:45:08,800 a simple program whose purpose in life is to average your three 889 00:45:08,800 --> 00:45:12,400 scores on an exam, or some such thing. 890 00:45:12,400 --> 00:45:17,020 Suppose that your three scores were these: 72, 73, not too bad, and 33, 891 00:45:17,020 --> 00:45:18,145 which is particularly low. 892 00:45:18,145 --> 00:45:23,030 Let's write a program that does this kind of averaging for us. 893 00:45:23,030 --> 00:45:24,860 Let me go back to VS Code, here. 894 00:45:24,860 --> 00:45:28,270 Let me open up a file called scores.c.
895 00:45:28,270 --> 00:45:30,830 Let me implement this as follows. 896 00:45:30,830 --> 00:45:35,860 Let me include stdio.h at the top, int main(void) as before. 897 00:45:35,860 --> 00:45:41,320 Then, inside of main, let me declare score 1, which is 72. 898 00:45:41,320 --> 00:45:43,990 Give me another score, 73. 899 00:45:43,990 --> 00:45:47,140 Then, a third score, called score 3, which is going to be 33. 900 00:45:47,140 --> 00:45:50,740 Now, I'm going to use printf to print out the average of those things, 901 00:45:50,740 --> 00:45:52,520 and I can do this in a few different ways. 902 00:45:52,520 --> 00:45:57,850 But I'm going to print out %f, and I'm going to do score 1, plus score 2, 903 00:45:57,850 --> 00:46:03,760 plus score 3, divided by 3, close parenthesis, semicolon. 904 00:46:03,760 --> 00:46:07,300 Some relatively simple arithmetic to compute the average of three scores, 905 00:46:07,300 --> 00:46:10,570 if I'm curious what my average grade is in the class with these three 906 00:46:10,570 --> 00:46:11,620 assessments. 907 00:46:11,620 --> 00:46:15,616 Let me, now, do make scores. 908 00:46:15,616 --> 00:46:19,240 All right, so I've somehow made an error already. 909 00:46:19,240 --> 00:46:25,150 But this one is, actually, germane to a problem we, hopefully, 910 00:46:25,150 --> 00:46:26,860 won't encounter too frequently. 911 00:46:26,860 --> 00:46:27,860 What's going on here? 912 00:46:27,860 --> 00:46:31,360 So underlined is score 1, plus score 2, plus score 3, divided by 3. 913 00:46:31,360 --> 00:46:36,250 Format specifies type double, but the argument has type int. Well, 914 00:46:36,250 --> 00:46:38,530 what's going on here? 915 00:46:38,530 --> 00:46:40,430 Because the arithmetic seems to check out. 916 00:46:40,430 --> 00:46:40,930 Yeah?
917 00:46:40,930 --> 00:46:44,560 AUDIENCE: So the computer is doing the math, but they basically [INAUDIBLE] 918 00:46:44,560 --> 00:46:49,260 just gives out a value at the end because, well [INAUDIBLE] 919 00:46:49,260 --> 00:46:50,210 DAVID MALAN: Correct. 920 00:46:50,210 --> 00:46:51,640 And we'll come back to this in more detail, 921 00:46:51,640 --> 00:46:54,522 but, indeed, what's happening here is I'm adding three ints together, 922 00:46:54,522 --> 00:46:56,480 obviously, because I defined them right up here. 923 00:46:56,480 --> 00:46:59,470 And I'm dividing by another int, 3, but the catch 924 00:46:59,470 --> 00:47:03,890 is, recall that C, when it performs math, treats all of these things as integers. 925 00:47:03,890 --> 00:47:05,810 But integers are not floating point values. 926 00:47:05,810 --> 00:47:08,890 So if you actually want to get a precise average for your score 927 00:47:08,890 --> 00:47:12,760 without throwing away the remainder, everything after the decimal point, 928 00:47:12,760 --> 00:47:15,430 it turns out, we're going to have to-- 929 00:47:15,430 --> 00:47:17,410 we're going to-- aww-- 930 00:47:17,410 --> 00:47:18,430 we're going to have to-- 931 00:47:18,430 --> 00:47:22,720 [LAUGHTER] we're going to have to convert this whole expression, somehow, 932 00:47:22,720 --> 00:47:23,350 to a float. 933 00:47:23,350 --> 00:47:26,230 And there are a few ways to do this, but the easiest way, 934 00:47:26,230 --> 00:47:28,540 for now: I'm going to go ahead and do this up here. 935 00:47:28,540 --> 00:47:31,360 I'm going to change the divide by 3 to divide by 3.0. 936 00:47:31,360 --> 00:47:35,440 Because it turns out, long story short, in C, so long as one of the values 937 00:47:35,440 --> 00:47:37,300 participating in an arithmetic expression 938 00:47:37,300 --> 00:47:39,730 like this is something like a float, the rest 939 00:47:39,730 --> 00:47:44,210 will be promoted to a floating point value as well.
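A minimal sketch of the arithmetic described above, using the lecture's three scores (72 + 73 + 33 = 178). The function names are hypothetical. With all-int operands, `178 / 3` truncates to 59 before any conversion happens; making one operand `3.0`, which is a `double` literal, promotes the whole division to floating point. Passing the all-int expression straight to `printf` with `%f` is what triggered the compiler's complaint that the format expects a double.

```c
// Integer division happens first, so the remainder is discarded;
// only then is the truncated int converted to double for the return.
double average_truncated(int a, int b, int c)
{
    return (a + b + c) / 3;
}

// One double operand (3.0) promotes the whole division to floating point.
double average_promoted(int a, int b, int c)
{
    return (a + b + c) / 3.0;
}
```

For these scores, `average_truncated(72, 73, 33)` returns 59.0, while `average_promoted(72, 73, 33)` returns roughly 59.333333.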
940 00:47:44,210 --> 00:47:49,495 So let me, now, recompile this code with make scores, Enter. 941 00:47:49,495 --> 00:47:53,500 This time it worked OK, because I'm treating a float as a float. 942 00:47:53,500 --> 00:47:55,600 Let me do ./scores, Enter. 943 00:47:55,600 --> 00:48:00,150 All right, my average is 59.33333 and so forth. 944 00:48:00,150 --> 00:48:00,650 All right. 945 00:48:00,650 --> 00:48:03,340 So the math, presumably, checks out. 946 00:48:03,340 --> 00:48:06,220 Floating point imprecision per last week aside. 947 00:48:06,220 --> 00:48:09,280 But let's consider the design of this program. 948 00:48:09,280 --> 00:48:16,680 What is, kind of, bad about it, or if we maintain this program longer term, 949 00:48:16,680 --> 00:48:19,480 are we going to regret the design of this program? 950 00:48:19,480 --> 00:48:20,990 What might not be ideal here? 951 00:48:20,990 --> 00:48:21,490 Yeah? 952 00:48:21,490 --> 00:48:30,364 AUDIENCE: [INAUDIBLE] 953 00:48:30,364 --> 00:48:34,220 DAVID MALAN: Yeah, so in this case, I have hard-coded my three scores. 954 00:48:34,220 --> 00:48:37,140 So, if I'm hearing you correctly, this program 955 00:48:37,140 --> 00:48:39,600 is only ever going to tell me this specific average. 956 00:48:39,600 --> 00:48:41,730 I'm not even using something like get int 957 00:48:41,730 --> 00:48:44,790 or get float to get three different scores, so that's not good. 958 00:48:44,790 --> 00:48:46,942 And suppose that we wait until later in the semester; 959 00:48:46,942 --> 00:48:48,400 I think other problems could arise. 960 00:48:48,400 --> 00:48:48,900 Yeah? 961 00:48:48,900 --> 00:48:51,020 AUDIENCE: Just thinking also somewhat of an issue 962 00:48:51,020 --> 00:48:52,900 that you can't reuse that number.
963 00:48:52,900 --> 00:48:55,450 DAVID MALAN: I can't reuse the number because I 964 00:48:55,450 --> 00:48:59,088 haven't stored the average in some variable, which, in this program, is not 965 00:48:59,088 --> 00:49:01,630 a big deal, but certainly, if I wanted to reuse it elsewhere, 966 00:49:01,630 --> 00:49:02,650 that's a problem. 967 00:49:02,650 --> 00:49:05,025 Let's fast-forward again, a little later in the semester: 968 00:49:05,025 --> 00:49:07,390 I don't just have three test scores or exam scores, 969 00:49:07,390 --> 00:49:09,430 maybe I have 4, or 5, or 6. 970 00:49:09,430 --> 00:49:10,690 Where might this take us? 971 00:49:10,690 --> 00:49:12,301 AUDIENCE: Yeah, if you ever want to have to take 972 00:49:12,301 --> 00:49:14,900 the average of any number of scores other than 3, [INAUDIBLE] 973 00:49:14,900 --> 00:49:18,110 DAVID MALAN: Yeah, I've, sort of, capped this program at 3. 974 00:49:18,110 --> 00:49:20,942 And honestly, this is, kind of, bordering on copy-paste, 975 00:49:20,942 --> 00:49:23,900 even though the variables, yes, have different names: score 1, score 2, 976 00:49:23,900 --> 00:49:24,800 score 3. 977 00:49:24,800 --> 00:49:27,230 Imagine doing this for a whole grade book for a class. 978 00:49:27,230 --> 00:49:32,990 Having to declare score 4, 5, 6, 10, 11, 12, 20, 30, that's a lot of variables. 979 00:49:32,990 --> 00:49:35,420 You can imagine just how ugly the code starts 980 00:49:35,420 --> 00:49:38,635 to get if you're just defining variable after variable, after variable. 981 00:49:38,635 --> 00:49:42,740 So it turns out, there are better ways, in languages like C, 982 00:49:42,740 --> 00:49:47,240 if you want to have multiple values stored in memory that 983 00:49:47,240 --> 00:49:49,040 happen to be of the same data type. 984 00:49:49,040 --> 00:49:50,420 Let's take a look back at this memory, here, 985 00:49:50,420 --> 00:49:52,545 to see what these things might look like in memory.
986 00:49:52,545 --> 00:49:54,170 Here's that grid of memory. 987 00:49:54,170 --> 00:49:56,450 Each of these, recall, represents a byte. 988 00:49:56,450 --> 00:49:59,690 To be clear, if I store score 1 in memory first, 989 00:49:59,690 --> 00:50:01,130 how many bytes will it take up? 990 00:50:01,130 --> 00:50:02,520 AUDIENCE: [INAUDIBLE] 991 00:50:02,520 --> 00:50:03,650 DAVID MALAN: So 4, a.k.a. 992 00:50:03,650 --> 00:50:04,430 32 bits. 993 00:50:04,430 --> 00:50:08,578 So I might draw score 1 as filling up this part of the memory. 994 00:50:08,578 --> 00:50:11,870 It's up to the computer as to whether it goes here, or down there, or wherever. 995 00:50:11,870 --> 00:50:15,290 I'm just keeping the pictures clean for today, from the top-left on down. 996 00:50:15,290 --> 00:50:18,080 If I, then, declare another variable, called score 2, 997 00:50:18,080 --> 00:50:20,730 it might end up over there, also taking up 4 bytes. 998 00:50:20,730 --> 00:50:23,330 And then score 3 might end up here. 999 00:50:23,330 --> 00:50:26,880 So that's just representing what's going on inside of the computer's memory. 1000 00:50:26,880 --> 00:50:30,680 But technically speaking, to be clear, per week 0, what's 1001 00:50:30,680 --> 00:50:34,580 really being stored in the computer's memory are patterns of 0s and 1s. 1002 00:50:34,580 --> 00:50:39,350 32 per score, in this case, because 4 bytes is 32 bits. 1003 00:50:39,350 --> 00:50:43,280 But again, it gets boring quickly to think in and look 1004 00:50:43,280 --> 00:50:44,760 at binary all the time. 1005 00:50:44,760 --> 00:50:47,120 So we'll, generally, abstract this away as just using 1006 00:50:47,120 --> 00:50:49,550 decimal numbers, in this case, instead.
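To make the point about 0s and 1s concrete, a minimal sketch that spells out the 32 bits stored for one score. `to_binary32` is a hypothetical helper; it takes a `uint32_t` from `<stdint.h>` so the width is exactly 32 bits regardless of how large the platform's `int` is.

```c
#include <stdint.h>

// Fills out[] with the 32 bits of value, most significant bit first,
// followed by a terminating '\0' (so out needs room for 33 chars).
void to_binary32(uint32_t value, char out[33])
{
    for (int bit = 31; bit >= 0; bit--)
    {
        out[31 - bit] = ((value >> bit) & 1u) ? '1' : '0';
    }
    out[32] = '\0';
}
```

For the score 72 (that is, 64 + 8), this produces 25 zeros followed by 1001000.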
1007 00:50:49,550 --> 00:50:54,170 But there might be a better way to store, not just three of these things, 1008 00:50:54,170 --> 00:50:57,500 but maybe four, maybe five, maybe 10, maybe more, 1009 00:50:57,500 --> 00:51:03,110 by declaring one variable to store all of them, instead of 3, or 4, or 5, 1010 00:51:03,110 --> 00:51:05,750 or more individual variables. 1011 00:51:05,750 --> 00:51:10,250 The way to do this is by way of something known as an array. 1012 00:51:10,250 --> 00:51:18,320 An array is another type of data that allows you to store multiple values 1013 00:51:18,320 --> 00:51:20,980 of the same type back-to-back-to-back. 1014 00:51:20,980 --> 00:51:22,230 That is to say, contiguously. 1015 00:51:22,230 --> 00:51:29,840 So an array can let you create memory for one int, or two, or three, 1016 00:51:29,840 --> 00:51:32,600 or even more than that, but describe them 1017 00:51:32,600 --> 00:51:36,390 all using the same variable name, the same one name. 1018 00:51:36,390 --> 00:51:40,740 So for instance, if, for one program, I only need three integers, 1019 00:51:40,740 --> 00:51:45,800 but I don't want to messily declare them as score 1, score 2, score 3, 1020 00:51:45,800 --> 00:51:46,960 I can do this, instead. 1021 00:51:46,960 --> 00:51:49,130 This is today's first new piece of syntax, 1022 00:51:49,130 --> 00:51:51,290 the square brackets that we're now seeing. 1023 00:51:51,290 --> 00:51:57,140 This line of code, here, is similar to int score 1 semicolon, 1024 00:51:57,140 --> 00:52:00,360 or int score 1 equals 72 semicolon. 1025 00:52:00,360 --> 00:52:05,780 This line of code is declaring for me, so to speak, an array of size 3. 1026 00:52:05,780 --> 00:52:09,260 And that array is going to store three integers. 1027 00:52:09,260 --> 00:52:09,770 Why? 1028 00:52:09,770 --> 00:52:14,990 Because the type of that array is an int, here. 1029 00:52:14,990 --> 00:52:18,110 The square brackets tell the computer how many ints you want.
1030 00:52:18,110 --> 00:52:18,980 In this case, 3. 1031 00:52:18,980 --> 00:52:21,140 And the name is, of course, scores, 1032 00:52:21,140 --> 00:52:23,540 which, in English, I've deliberately pluralized 1033 00:52:23,540 --> 00:52:28,100 so that I can describe this array as storing multiple scores, indeed. 1034 00:52:28,100 --> 00:52:32,970 So if I want to now assign values to this variable, called scores, 1035 00:52:32,970 --> 00:52:34,760 I can do code like this. 1036 00:52:34,760 --> 00:52:40,160 I can say, scores bracket 0 equals 72, scores bracket 1 equals 73, 1037 00:52:40,160 --> 00:52:42,190 and scores bracket 2 equals 33. 1038 00:52:42,190 --> 00:52:43,940 The only thing weird there is, admittedly, 1039 00:52:43,940 --> 00:52:45,830 the square brackets, which are still new. 1040 00:52:45,830 --> 00:52:49,820 But we're also, notice, zero-indexing things. 1041 00:52:49,820 --> 00:52:52,345 To zero-index means to start counting at 0. 1042 00:52:52,345 --> 00:52:54,470 When we've talked about that before, our for loops 1043 00:52:54,470 --> 00:52:56,000 have, generally, been zero-indexed. 1044 00:52:56,000 --> 00:52:59,870 Arrays in C are zero-indexed. 1045 00:52:59,870 --> 00:53:01,430 And you do not have a choice over that. 1046 00:53:01,430 --> 00:53:04,550 You can't start counting at 1 in arrays just because you prefer to; 1047 00:53:04,550 --> 00:53:06,830 you'd be sacrificing one of the elements. 1048 00:53:06,830 --> 00:53:09,620 You have to start in arrays counting from 0. 1049 00:53:09,620 --> 00:53:13,130 So out of context, this doesn't solve a problem, 1050 00:53:13,130 --> 00:53:15,200 but it, definitely, is going to once we have more 1051 00:53:15,200 --> 00:53:16,910 than, even, three scores here. 1052 00:53:16,910 --> 00:53:19,750 In fact, let me change this program a little bit. 1053 00:53:19,750 --> 00:53:21,450 Let me go back to VS Code. 1054 00:53:21,450 --> 00:53:24,020 And delete these three lines, here.
1055 00:53:24,020 --> 00:53:27,080 And replace it with a scores variable that's 1056 00:53:27,080 --> 00:53:30,140 ready to store three total integers. 1057 00:53:30,140 --> 00:53:34,130 And then, initialize them as follows: scores bracket 0 is 72, 1058 00:53:34,130 --> 00:53:38,300 as before, scores bracket 1 is going to be 73, scores bracket 2 1059 00:53:38,300 --> 00:53:39,740 is going to be 33. 1060 00:53:39,740 --> 00:53:44,068 Notice, I do not need to say int before any of these lines, 1061 00:53:44,068 --> 00:53:45,860 because that's been taken care of, already, 1062 00:53:45,860 --> 00:53:50,570 for me on line 5, where I already specified that everything in this array 1063 00:53:50,570 --> 00:53:53,330 is going to be an int. 1064 00:53:53,330 --> 00:53:57,020 Now, down here, this code needs to change because I no longer have 1065 00:53:57,020 --> 00:53:59,300 three variables: score 1, 2, and 3. 1066 00:53:59,300 --> 00:54:03,950 I have 1 variable, but one that I can index into. 1067 00:54:03,950 --> 00:54:08,750 I'm going to, here, then, do scores bracket 0, plus scores bracket 1, 1068 00:54:08,750 --> 00:54:13,370 plus scores bracket 2, which is equivalent to what I did earlier, 1069 00:54:13,370 --> 00:54:14,900 giving me back those three integers. 1070 00:54:14,900 --> 00:54:17,860 But notice, I'm using the same variable name, every time. 1071 00:54:17,860 --> 00:54:21,070 And again, I'm using this new square bracket notation to, quote-unquote, 1072 00:54:21,070 --> 00:54:26,590 index into the array to get at the first int, the second int, and the third, 1073 00:54:26,590 --> 00:54:28,840 and then, to do it again down here. 1074 00:54:28,840 --> 00:54:31,907 Now, this program is still not really solving all the problems we described; 1075 00:54:31,907 --> 00:54:34,240 I still can only store three scores, but we'll come back 1076 00:54:34,240 --> 00:54:35,930 to something like that before long.
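A minimal sketch of the array-based version just described, wrapped in a hypothetical function `average_of_three` (the lecture keeps everything in `main` and prints with `printf`): one declaration reserves room for three contiguous ints, and the zero-based indices 0, 1, and 2 reach each of them.

```c
// One variable, scores, holds all three ints back-to-back in memory.
double average_of_three(void)
{
    int scores[3];      // room for exactly three ints
    scores[0] = 72;     // arrays in C are zero-indexed...
    scores[1] = 73;
    scores[2] = 33;     // ...so the last valid index is 2, not 3

    // Dividing by 3.0 keeps the fractional part of the average.
    return (scores[0] + scores[1] + scores[2]) / 3.0;
}
```

Only the variable name changes compared with the three-variable version; the result is the same roughly 59.33.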
1077 00:55:35,930 --> 00:55:38,950 But for now, we're just introducing a new syntax and a new feature, 1078 00:55:38,950 --> 00:55:44,980 whereby I can now store multiple values in the same variable. 1079 00:55:44,980 --> 00:55:47,110 Well, let's enhance this a bit more. 1080 00:55:47,110 --> 00:55:50,660 Instead of hard-coding these scores, as was identified as a problem, 1081 00:55:50,660 --> 00:55:54,790 let's use get int to ask the user for a score. 1082 00:55:54,790 --> 00:55:58,330 Let's, then, use get int to ask the user for another score. 1083 00:55:58,330 --> 00:56:01,540 Let's use get int to ask the user for a third score, 1084 00:56:01,540 --> 00:56:04,400 storing them in those respective locations. 1085 00:56:04,400 --> 00:56:09,820 And, now, if I go ahead and save this program, recompile scores, huh. 1086 00:56:09,820 --> 00:56:10,900 I've messed up, here. 1087 00:56:10,900 --> 00:56:13,990 Now these errors should be getting a little familiar. 1088 00:56:13,990 --> 00:56:16,750 What mistake did I make? 1089 00:56:16,750 --> 00:56:17,875 Let me give folks a moment. 1090 00:56:17,875 --> 00:56:18,970 AUDIENCE: cs50.h 1091 00:56:18,970 --> 00:56:21,100 DAVID MALAN: cs50.h. 1092 00:56:21,100 --> 00:56:24,220 That was not intentional, so still making mistakes all these years later. 1093 00:56:24,220 --> 00:56:26,320 I need to include cs50.h. 1094 00:56:26,320 --> 00:56:29,570 Now, I'm going to go back to the bottom in the terminal window, make scores. 1095 00:56:29,570 --> 00:56:30,070 OK. 1096 00:56:30,070 --> 00:56:31,670 We're back in business, ./scores. 1097 00:56:31,670 --> 00:56:33,920 Now, the program is getting a little more interesting. 1098 00:56:33,920 --> 00:56:38,020 So maybe this year was better, and I got a 100, and a 99, and a 98, and there, 1099 00:56:38,020 --> 00:56:40,900 my average is 99.0000. 1100 00:56:40,900 --> 00:56:42,370 So now, it's a little more dynamic. 1101 00:56:42,370 --> 00:56:43,270 It's a little more interesting.
1102 00:55:43,270 --> 00:55:45,978 But it's still capping the number of scores at three, admittedly. 1103 00:55:45,978 --> 00:55:50,740 But now, I've introduced another, sort of, symptom of bad programming. 1104 00:55:50,740 --> 00:55:54,108 There's this expression in programming, too, called code smell, where like-- 1105 00:55:54,108 --> 00:55:55,900 [SNIFFS AIR] something smells a little off. 1106 00:55:55,900 --> 00:56:00,550 And there's something off here in that I could do better with this code. 1107 00:56:00,550 --> 00:56:05,080 Does anyone see an opportunity to improve the design of this code, here, 1108 00:56:05,080 --> 00:56:08,230 if my goal, still, is to get three scores from the user but [SNIFF SNIFF] 1109 00:56:08,230 --> 00:56:10,430 without it smelling [SNIFF] kind of bad? 1110 00:56:10,430 --> 00:56:10,930 Yeah? 1111 00:56:10,930 --> 00:56:12,940 AUDIENCE: [INAUDIBLE] use a for loop? 1112 00:56:12,940 --> 00:56:15,958 That way you don't have to copy and paste all of those scores. 1113 00:56:15,958 --> 00:56:17,160 DAVID MALAN: Yeah, exactly. 1114 00:56:17,160 --> 00:56:19,022 Those lines of code are almost identical. 1115 00:56:19,022 --> 00:56:21,480 And honestly, the only thing that's changing is the number, 1116 00:56:21,480 --> 00:56:23,100 and it's just incrementing by 1. 1117 00:56:23,100 --> 00:56:25,330 We have all of the building blocks to do this better. 1118 00:56:25,330 --> 00:56:27,130 So let me go ahead and improve this. 1119 00:56:27,130 --> 00:56:29,560 Let me delete that code. 1120 00:56:29,560 --> 00:56:31,720 Let me, now, have a for loop. 1121 00:56:31,720 --> 00:56:36,150 So for int i gets 0, i less than 3, i plus plus.
1122 00:56:36,150 --> 00:56:39,060 Then, inside of this for loop, I can distill all three 1123 00:56:39,060 --> 00:56:40,860 of those lines into something more generic, 1124 00:56:40,860 --> 00:56:46,530 like scores bracket i equals get int, and now, ask the user, just 1125 00:56:46,530 --> 00:56:48,905 once, via get int, for a score. 1126 00:56:48,905 --> 00:56:52,000 So this is where arrays start to get pretty powerful. 1127 00:56:52,000 --> 00:56:54,000 You don't have to hard-code, that is, literally, 1128 00:56:54,000 --> 00:56:56,462 type in all of these magic numbers like 0, 1, and 2. 1129 00:56:56,462 --> 00:56:58,170 You can start to do it, programmatically, 1130 00:56:58,170 --> 00:56:59,770 as you proposed, with a loop. 1131 00:56:59,770 --> 00:57:01,350 So now, I've tightened things up. 1132 00:57:01,350 --> 00:57:04,230 I'm now, dynamically, getting three different scores, 1133 00:57:04,230 --> 00:57:06,766 and putting them in three different locations. 1134 00:57:06,766 --> 00:57:10,470 And so this program, ultimately, is going to work, pretty much, the same. 1135 00:57:10,470 --> 00:57:17,520 make scores, ./scores, and 100, 99, 98, and we're back to the same answer. 1136 00:57:17,520 --> 00:57:19,440 But it's a little better designed, too. 1137 00:57:19,440 --> 00:57:21,360 If I really want to nitpick, there's something 1138 00:57:21,360 --> 00:57:23,100 that still smells, a little bit, here: 1139 00:57:23,100 --> 00:57:27,540 the fact that I have, indeed, this magic number 3 that really 1140 00:57:27,540 --> 00:57:29,890 has to be the same as this number here. 1141 00:57:29,890 --> 00:57:32,170 Otherwise, who knows what's going to go wrong. 1142 00:57:32,170 --> 00:57:34,380 So what might be a solution, per last week, 1143 00:57:34,380 --> 00:57:36,960 to cleaning that code up further, too? 1144 00:57:36,960 --> 00:57:39,750 AUDIENCE: [INAUDIBLE] the user's discretion 1145 00:57:39,750 --> 00:57:41,742 how many input scores [INAUDIBLE].
1146 00:57:41,742 --> 00:57:44,790 DAVID MALAN: OK, so we could leave it up to the user's discretion. 1147 00:57:44,790 --> 00:57:47,500 And so we could, actually, do something like this. 1148 00:57:47,500 --> 00:57:49,200 Let me take this a few steps ahead. 1149 00:57:49,200 --> 00:57:56,230 Let me say something like, int n gets get int, how many scores question mark, 1150 00:57:56,230 --> 00:58:00,600 then I could actually change this to an n, and then this to an n, 1151 00:58:00,600 --> 00:58:02,970 and, indeed, make the whole program dynamic. 1152 00:58:02,970 --> 00:58:05,670 Ask the human how many tests there have been this semester. 1153 00:58:05,670 --> 00:58:07,500 Then, you can type in each of those scores 1154 00:58:07,500 --> 00:58:09,708 because the loop is going to iterate that many times. 1155 00:58:09,708 --> 00:58:13,020 And then you'll get the average of one test, two tests, three-- 1156 00:58:13,020 --> 00:58:17,520 well, lost another-- or however many scores that were actually 1157 00:58:17,520 --> 00:58:20,760 specified by the user. Yeah, question? 1158 00:58:20,760 --> 00:58:25,765 AUDIENCE: How many bits or bytes get used in an array? 1159 00:58:25,765 --> 00:58:28,060 DAVID MALAN: How many bytes are used in an array? 1160 00:58:28,060 --> 00:58:32,524 AUDIENCE: [INAUDIBLE] point of doing this is to save [INAUDIBLE] 1161 00:58:32,524 --> 00:58:35,500 DAVID MALAN: So the purpose of an array is not to save space. 1162 00:58:35,500 --> 00:58:39,010 It's to eliminate having multiple variable names 1163 00:58:39,010 --> 00:58:40,900 because that gets very messy quickly. 1164 00:58:40,900 --> 00:58:44,980 If you have score 1, score 2, score 3, dot, dot, dot, score 99, 1165 00:58:44,980 --> 00:58:48,100 that's, like, 99 different variables, potentially, 1166 00:58:48,100 --> 00:58:54,160 that you could collapse into one variable that has 99 locations, 1167 00:58:54,160 --> 00:58:56,230 at different indices, or indexes.
1168 00:58:56,230 --> 00:58:58,570 As someone would say, the index for an array 1169 00:58:58,570 --> 00:59:00,756 is whatever is in the square brackets. 1170 00:59:00,756 --> 00:59:11,560 AUDIENCE: [INAUDIBLE] 1171 00:59:11,560 --> 00:59:13,280 DAVID MALAN: So it's a good question. 1172 00:59:13,280 --> 00:59:15,370 So if you-- I'm using ints for everything-- 1173 00:59:15,370 --> 00:59:17,560 and honestly, we don't really need ints for scores 1174 00:59:17,560 --> 00:59:21,770 because I'm not likely to get 2 billion on a test anytime soon. 1175 00:59:21,770 --> 00:59:23,620 And so you could use different data types. 1176 00:59:23,620 --> 00:59:26,287 And that list we had on the screen, earlier, is not all of them. 1177 00:59:26,287 --> 00:59:29,770 There's a data type called short, which is shorter than an int; 1178 00:59:29,770 --> 00:59:34,850 you could, technically, use char, in some form, or other data types as well. 1179 00:59:34,850 --> 00:59:36,940 Generally speaking, in the year 2021, these 1180 00:59:36,940 --> 00:59:40,990 tend to be over optima-- overly optimized decisions. 1181 00:59:40,990 --> 00:59:42,940 Everyone just uses ints, even though no one 1182 00:59:42,940 --> 00:59:46,300 is going to get a test score that's 2 billion, or more, because int is just, 1183 00:59:46,300 --> 00:59:47,260 kind of, the go-to. 1184 00:59:47,260 --> 00:59:50,252 Years ago, memory was expensive. 1185 00:59:50,252 --> 00:59:52,210 And every one of your instincts would have been 1186 00:59:52,210 --> 00:59:54,700 spot on because memory was so tight. 1187 00:59:54,700 --> 00:59:56,930 But, nowadays, we don't worry as much about it. 1188 00:59:56,930 --> 00:59:57,430 Yeah? 1189 00:59:57,430 --> 01:00:02,556 AUDIENCE: I have a question about the error [INAUDIBLE].
1190 01:00:02,556 --> 01:00:06,605 Could it-- when you're doing a cash problem on the problem set-- 1191 01:00:06,605 --> 01:00:10,010 DAVID MALAN: So what is the difference between dividing two ints 1192 01:00:10,010 --> 01:00:12,380 and not getting an error, as you might have encountered 1193 01:00:12,380 --> 01:00:15,920 in a program like cash, versus dividing two ints 1194 01:00:15,920 --> 01:00:18,150 and getting an error like I did a moment ago? 1195 01:00:18,150 --> 01:00:22,280 The problem with the scenario I created a moment ago was that printf was involved. 1196 01:00:22,280 --> 01:00:27,980 And I was telling printf to use a %f, but I was giving printf the result 1197 01:00:27,980 --> 01:00:30,580 of dividing an integer by another integer. 1198 01:00:30,580 --> 01:00:32,930 So it was printf that was yelling at me. 1199 01:00:32,930 --> 01:00:35,930 I'm guessing in the scenario you're describing, for something like cash, 1200 01:00:35,930 --> 01:00:39,180 printf was not involved in that particular line of code. 1201 01:00:39,180 --> 01:00:40,865 So that's the difference, there. 1202 01:00:40,865 --> 01:00:41,660 All right. 1203 01:00:41,660 --> 01:00:45,110 So we, now, have this ability to create an array. 1204 01:00:45,110 --> 01:00:47,510 And an array can store multiple values. 1205 01:00:47,510 --> 01:00:51,450 What, then, might we do that's more interesting than just storing numbers 1206 01:00:51,450 --> 01:00:51,950 in memory? 1207 01:00:51,950 --> 01:00:54,230 Well, let's take this one step further. 1208 01:00:54,230 --> 01:01:01,130 As opposed to just storing 72, 73, 33 or 100, 99, 98, at these given locations, 1209 01:01:01,130 --> 01:01:05,930 because again, an array gives you one variable name, but multiple locations, 1210 01:01:05,930 --> 01:01:08,360 or indices therein, bracket 0, bracket 1, 1211 01:01:08,360 --> 01:01:11,330 bracket 2 on up, if it were even bigger than that.
1212 01:01:11,330 --> 01:01:16,100 Let's, now, start to consider something more modest, like simple chars. 1213 01:01:16,100 --> 01:01:18,830 Chars are 1 byte each, so they're even smaller; 1214 01:01:18,830 --> 01:01:20,090 they take up much less space. 1215 01:01:20,090 --> 01:01:22,048 And, indeed, if I wanted to say a message like, 1216 01:01:22,048 --> 01:01:24,200 hi, I could use three variables. 1217 01:01:24,200 --> 01:01:28,520 If I wanted a program to print, hi, H-I exclamation point, 1218 01:01:28,520 --> 01:01:33,230 I could, of course, store those in three variables, like c1, c2, c3. 1219 01:01:33,230 --> 01:01:36,710 And, for the sake of discussion, let's whip this up real quickly. 1220 01:01:36,710 --> 01:01:39,680 Let me create a new program, now, in VS Code. 1221 01:01:39,680 --> 01:01:42,920 This time, I'm going to call it hi.c. 1222 01:01:42,920 --> 01:01:45,650 And I'm not going to bother with the CS50 library. 1223 01:01:45,650 --> 01:01:47,660 I just need the standard I/O one, for now. 1224 01:01:47,660 --> 01:01:49,220 int main(void). 1225 01:01:49,220 --> 01:01:52,400 And then, inside of main, I'm going to, simply, create three variables. 1226 01:01:52,400 --> 01:01:55,760 And this is already, hopefully, striking you as a bad idea. 1227 01:01:55,760 --> 01:01:58,310 But we'll go down this road, temporarily, 1228 01:01:58,310 --> 01:02:02,300 with c1, and c2, and, finally, c3, 1229 01:02:02,300 --> 01:02:05,660 storing each character in the phrase I want to print, 1230 01:02:05,660 --> 01:02:09,450 and I'm going to print this in a different way than usual. 1231 01:02:09,450 --> 01:02:10,880 Now I'm dealing with chars. 1232 01:02:10,880 --> 01:02:14,480 And we've, generally, dealt with strings, which was easier last week. 1233 01:02:14,480 --> 01:02:21,600 But %c, %c, %c will let me print out three chars, namely c1, c2, and c3. 1234 01:02:21,600 --> 01:02:24,420 So, kind of, a stupid way of printing out a string.
1235 01:02:24,420 --> 01:02:26,940 So we already had a solution to this problem last week. 1236 01:02:26,940 --> 01:02:30,540 But let's poke around at what's going on underneath the hood, here. 1237 01:02:30,540 --> 01:02:33,350 So let's make hi, ./hi. 1238 01:02:33,350 --> 01:02:34,475 And, voila, no surprise. 1239 01:02:34,475 --> 01:02:36,350 But we, again, could have done this last week 1240 01:02:36,350 --> 01:02:39,530 with a string and just one variable, or even zero, at that. 1241 01:02:39,530 --> 01:02:43,220 But let's start converting these characters 1242 01:02:43,220 --> 01:02:47,750 to their apparent numeric equivalents like we talked about in week 0 too. 1243 01:02:47,750 --> 01:02:52,310 Let me modify these %c's, just for fun, to be %i's. 1244 01:02:52,310 --> 01:02:56,180 And let me add some spaces so there are gaps between each of them. 1245 01:02:56,180 --> 01:03:00,350 Let me, now, recompile hi, and let me rerun it. 1246 01:03:00,350 --> 01:03:02,900 Just to guess, what should I see on the screen now? 1247 01:03:05,690 --> 01:03:06,200 Any guesses? 1248 01:03:06,200 --> 01:03:06,700 Yeah? 1249 01:03:06,700 --> 01:03:08,036 AUDIENCE: The ASCII values? 1250 01:03:08,036 --> 01:03:09,760 DAVID MALAN: The ASCII values. 1251 01:03:09,760 --> 01:03:12,220 And it's intentional that I keep using the same word, 1252 01:03:12,220 --> 01:03:18,250 hi, because it should be, hopefully, the old friends, 72, 73, and 33. 1253 01:03:18,250 --> 01:03:22,120 Which is to say that C knows about ASCII, or equivalently, Unicode, 1254 01:03:22,120 --> 01:03:24,320 and can do this conversion for us automatically. 1255 01:03:24,320 --> 01:03:27,670 And it seems to be doing it implicitly for us, so to speak. 1256 01:03:27,670 --> 01:03:31,000 Notice that c1, c2, and c3 are, obviously, chars, 1257 01:03:31,000 --> 01:03:34,420 but printf is able to tolerate printing them as integers.
1258 01:03:34,420 --> 01:03:38,870 If I really wanted to be pedantic, I could use this technique, again, 1259 01:03:38,870 --> 01:03:41,320 known as typecasting, where I can actually 1260 01:03:41,320 --> 01:03:46,610 convert one data type to another, if it makes logical sense to do so. 1261 01:03:46,610 --> 01:03:49,900 And we saw in week 0 that chars, or characters, 1262 01:03:49,900 --> 01:03:53,500 are just numbers, like 72, 73, and 33. 1263 01:03:53,500 --> 01:03:57,680 So I can use this parenthetical expression to convert, incorrectly, 1264 01:03:57,680 --> 01:04:02,623 [LAUGHTER] three chars to three integers, instead. 1265 01:04:02,623 --> 01:04:04,540 So that's what I meant to type the first time. 1266 01:04:04,540 --> 01:04:05,040 There we go. 1267 01:04:05,040 --> 01:04:05,800 Strike two, today. 1268 01:04:05,800 --> 01:04:09,280 So open parenthesis, int, close parenthesis says: 1269 01:04:09,280 --> 01:04:14,840 take whatever variable comes after this, c1, c2, or c3, and convert it to an int. 1270 01:04:14,840 --> 01:04:18,640 The effect is going to be no different, make hi, and then rerunning whoops-- 1271 01:04:18,640 --> 01:04:24,910 then running ./hi still works the same, but now I'm explicitly converting chars 1272 01:04:24,910 --> 01:04:25,660 to ints. 1273 01:04:25,660 --> 01:04:29,260 And we can do this all day long, chars to ints, floats to ints, 1274 01:04:29,260 --> 01:04:30,250 ints to floats. 1275 01:04:30,250 --> 01:04:31,888 Sometimes, it's equivalent. 1276 01:04:31,888 --> 01:04:33,805 Other times, you're going to lose information. 1277 01:04:33,805 --> 01:04:37,270 Taking a float to an int, just intuitively, 1278 01:04:37,270 --> 01:04:39,790 is going to throw away everything after the decimal point, 1279 01:04:39,790 --> 01:04:42,680 because an int has no decimal point.
1280 01:04:42,680 --> 01:04:45,100 But, for now, I'm going to rewind to the version of this 1281 01:04:45,100 --> 01:04:49,150 that just did implicit type conversion, or implicit casting, 1282 01:04:49,150 --> 01:04:53,350 just to demonstrate that we can, indeed, see the values underneath the hood. 1283 01:04:53,350 --> 01:04:53,950 All right. 1284 01:04:53,950 --> 01:04:56,370 Let me go ahead and do this, now, the week 1 way. 1285 01:04:56,370 --> 01:04:57,370 This was kind of stupid. 1286 01:04:57,370 --> 01:05:00,205 Let's just do printf, quote-unquote-- 1287 01:05:00,205 --> 01:05:04,630 Actually, let's do this, string s equals quote-unquote hi, 1288 01:05:04,630 --> 01:05:09,680 and then let's do a simple printf with %s, printing out s there. 1289 01:05:09,680 --> 01:05:12,520 So now I've rewound to last week, where we began this story, 1290 01:05:12,520 --> 01:05:16,660 but you'll notice that, if we keep playing around with this-- 1291 01:05:16,660 --> 01:05:18,860 whoops, what did I do here? 1292 01:05:18,860 --> 01:05:23,470 Oh, and let me include the CS50 library here, more on that before long. 1293 01:05:23,470 --> 01:05:26,260 Let me go ahead and recompile, rerun this. 1294 01:05:26,260 --> 01:05:28,268 We seem to be coding in circles, here. 1295 01:05:28,268 --> 01:05:30,810 Like, I've just done the same thing multiple different ways. 1296 01:05:30,810 --> 01:05:33,400 But there's clearly an equivalence, then, 1297 01:05:33,400 --> 01:05:36,978 between sequences of chars and strings. 1298 01:05:36,978 --> 01:05:38,770 And if you do it the really pedantic way, you 1299 01:05:38,770 --> 01:05:43,390 have three different variables, c1, c2, c3, representing H-I exclamation point, 1300 01:05:43,390 --> 01:05:47,870 or you can just treat them all together like this: H, I, exclamation point.
1301 01:05:47,870 --> 01:05:52,030 But it turns out that strings are actually 1302 01:05:52,030 --> 01:05:58,060 implemented by the computer in a now pretty familiar way. 1303 01:05:58,060 --> 01:06:04,382 What might a string actually be as of this point in the story? 1304 01:06:04,382 --> 01:06:05,590 Where are we going with this? 1305 01:06:05,590 --> 01:06:06,923 Let me try to look further back. 1306 01:06:06,923 --> 01:06:07,850 Yeah, in the way back? 1307 01:06:07,850 --> 01:06:08,350 Yeah? 1308 01:06:08,350 --> 01:06:10,600 AUDIENCE: Can a string like this be an array of chars? 1309 01:06:10,600 --> 01:06:13,410 DAVID MALAN: Yeah, a string might be, and indeed is, just 1310 01:06:13,410 --> 01:06:14,800 an array of characters. 1311 01:06:14,800 --> 01:06:17,190 So last week we took for granted that strings exist. 1312 01:06:17,190 --> 01:06:19,530 Technically, strings exist, but they're implemented 1313 01:06:19,530 --> 01:06:23,070 as arrays of characters, which actually opens up 1314 01:06:23,070 --> 01:06:25,770 some interesting possibilities for us. 1315 01:06:25,770 --> 01:06:28,300 Because, let me see, let me see if I can do this. 1316 01:06:28,300 --> 01:06:31,560 Let me try to print out, now, three integers again. 1317 01:06:31,560 --> 01:06:37,530 But if string s is but an array, as you propose, maybe I can do s bracket 0, 1318 01:06:37,530 --> 01:06:39,760 s bracket 1, and s bracket 2. 1319 01:06:39,760 --> 01:06:43,650 So maybe I can start poking around inside of strings, 1320 01:06:43,650 --> 01:06:45,630 even though we didn't do this last week, so I 1321 01:06:45,630 --> 01:06:47,260 can get at those individual values. 1322 01:06:47,260 --> 01:06:51,270 So make hi, ./hi and, voila, there we go again.
1323 01:06:51,270 --> 01:06:56,208 It's the same 72, 73, 33, but now, I'm sort of, hopefully, 1324 01:06:56,208 --> 01:06:58,500 like, wrapping my mind around the fact that, all right, 1325 01:06:58,500 --> 01:07:01,650 a string is just an array of characters, and arrays, you 1326 01:07:01,650 --> 01:07:04,960 can index into them using this new square bracket notation. 1327 01:07:04,960 --> 01:07:08,040 So I can get at any one of these individual characters, 1328 01:07:08,040 --> 01:07:14,055 and, heck, convert it to an integer like we did in week 0. 1329 01:07:14,055 --> 01:07:17,010 Let me get a little curious now. 1330 01:07:17,010 --> 01:07:20,020 What else might be in the computer's memory? 1331 01:07:20,020 --> 01:07:23,550 Well, let's-- I'll go back to the depiction of these same things. 1332 01:07:23,550 --> 01:07:25,860 Here might be how we originally implemented hi 1333 01:07:25,860 --> 01:07:28,800 with three variables, c1, c2, c3. 1334 01:07:28,800 --> 01:07:31,500 Of course, those map to these decimal numbers or, equivalently, 1335 01:07:31,500 --> 01:07:32,880 these binary values. 1336 01:07:32,880 --> 01:07:35,310 But what was this looking like in memory? 1337 01:07:35,310 --> 01:07:38,250 Literally, when you create a string in memory, like this, 1338 01:07:38,250 --> 01:07:41,240 string s equals quote-unquote hi, let's consider what's going on 1339 01:07:41,240 --> 01:07:42,615 underneath the hood, so to speak. 1340 01:07:42,615 --> 01:07:47,490 Well, as an abstraction, a string, it's H-I exclamation point taking up, 1341 01:07:47,490 --> 01:07:48,917 it would seem, 3 bytes, right? 1342 01:07:48,917 --> 01:07:51,000 I've gotten rid of the bars, there, because if you 1343 01:07:51,000 --> 01:07:55,650 think of a string as a type, I'm just going to use one big box of size 3. 1344 01:07:55,650 --> 01:08:00,210 But technically, a string, we've just revealed, is an array, 1345 01:08:00,210 --> 01:08:01,830 and the array is of size 3.
1346 01:08:01,830 --> 01:08:03,750 So technically, if the string is called s, 1347 01:08:03,750 --> 01:08:05,970 s bracket 0 will give you the first character, 1348 01:08:05,970 --> 01:08:09,810 s bracket 1, the second, and s bracket 2, the third. 1349 01:08:09,810 --> 01:08:13,290 But let me ask this question now: if this, at the end of the day, 1350 01:08:13,290 --> 01:08:16,560 is the only thing in your computer's memory 1351 01:08:16,560 --> 01:08:20,790 and the ability, like a canvas, to draw 0s and 1s, or numbers, or characters, 1352 01:08:20,790 --> 01:08:22,620 or whatever on it, but that's it, like this 1353 01:08:22,620 --> 01:08:25,770 is what your Mac, and PC, and phone ultimately reduce to. 1354 01:08:25,770 --> 01:08:29,730 Suppose that I'm running a piece of software, like a text messenger, 1355 01:08:29,730 --> 01:08:33,000 and now I write down bye exclamation point. 1356 01:08:33,000 --> 01:08:34,860 Well, where might that go in memory? 1357 01:08:34,860 --> 01:08:35,845 Well, it might go here. 1358 01:08:35,845 --> 01:08:39,333 B-Y-E. And then the next thing I type might go here, here, here, and so forth. 1359 01:08:39,333 --> 01:08:41,250 My memory just might get filled up, over time, 1360 01:08:41,250 --> 01:08:44,310 with things that you or someone else are typing. 1361 01:08:44,310 --> 01:08:50,580 But then how does the computer know if, potentially, B-Y-E exclamation point 1362 01:08:50,580 --> 01:08:56,150 is right after H-I exclamation point, where one string ends and the next one 1363 01:08:56,150 --> 01:08:56,650 begins? 1364 01:08:58,930 --> 01:08:59,430 Right? 1365 01:08:59,430 --> 01:09:03,070 All we have are bytes, or 0s and 1s. 1366 01:09:03,070 --> 01:09:05,730 So if you were designing this, how would you 1367 01:09:05,730 --> 01:09:08,280 implement some kind of delimiter between the two? 1368 01:09:08,280 --> 01:09:10,260 Or figure out what the length of a string is? 1369 01:09:10,260 --> 01:09:11,010 What do you think?
1370 01:09:11,010 --> 01:09:12,148 AUDIENCE: A nul character. 1371 01:09:12,148 --> 01:09:15,107 DAVID MALAN: OK, so the right answer is to use a nul character, 1372 01:09:15,107 --> 01:09:17,190 and for those who don't know, what does that mean? 1373 01:09:17,190 --> 01:09:19,492 AUDIENCE: It's special. 1374 01:09:19,492 --> 01:09:21,450 DAVID MALAN: Yeah, so it's a special character. 1375 01:09:21,450 --> 01:09:23,520 Let me describe it as a sentinel character. 1376 01:09:23,520 --> 01:09:25,575 Humans decided some time ago that, you know 1377 01:09:25,575 --> 01:09:28,560 what, if we want to delineate where one string ends 1378 01:09:28,560 --> 01:09:32,010 and where the next one begins, we just need some special symbol. 1379 01:09:32,010 --> 01:09:35,189 And the symbol they use is generally written as backslash 0. 1380 01:09:35,189 --> 01:09:39,555 This is just shorthand notation for literally eight 0 bits. 1381 01:09:39,555 --> 01:09:42,540 0, 0, 0, 0, 0, 0, 0, 0. 1382 01:09:42,540 --> 01:09:46,140 And the nickname for eight 0 bits, in this context, 1383 01:09:46,140 --> 01:09:48,930 is nul, N-U-L, so to speak. 1384 01:09:48,930 --> 01:09:51,910 And we can actually see this as follows. 1385 01:09:51,910 --> 01:09:53,913 If you look at the corresponding decimal numbers, 1386 01:09:53,913 --> 01:09:56,580 like you could do by doing out the math or doing the conversion, 1387 01:09:56,580 --> 01:10:01,560 like we've done in code, you would see, for storing hi, 72, 73, 33, 1388 01:10:01,560 --> 01:10:06,600 but then 1 extra byte that's sort of invisibly there, but that is all 0s. 1389 01:10:06,600 --> 01:10:09,120 And now I've just written it as the decimal number 0. 1390 01:10:09,120 --> 01:10:12,120 The implication of this is that the computer is apparently 1391 01:10:12,120 --> 01:10:16,695 using not 3 bytes to store a word like hi, but 4 bytes.
1392 01:10:16,695 --> 01:10:22,050 Whatever the length of the string is, plus 1 for this special sentinel value 1393 01:10:22,050 --> 01:10:24,640 that demarcates the end of the string. 1394 01:10:24,640 --> 01:10:26,680 So we might draw it like this instead. 1395 01:10:26,680 --> 01:10:31,350 And this character is, again, pronounced nul, or written N-U-L. 1396 01:10:31,350 --> 01:10:32,319 So that's all, right? 1397 01:10:32,319 --> 01:10:35,069 If humans, at the end of the day, just have this canvas of memory, 1398 01:10:35,069 --> 01:10:36,902 they just needed to decide, all right, well, 1399 01:10:36,902 --> 01:10:39,990 how do we distinguish one string from another? 1400 01:10:39,990 --> 01:10:42,660 It's a lot easier with chars, individually; it's 1401 01:10:42,660 --> 01:10:45,450 a lot easier with ints; it's even easier with floats. Why? 1402 01:10:45,450 --> 01:10:49,620 Because, per that chart earlier, every character is always 1 byte. 1403 01:10:49,620 --> 01:10:51,810 Every int is always 4 bytes. 1404 01:10:51,810 --> 01:10:54,750 Every long is always 8 bytes. 1405 01:10:54,750 --> 01:10:56,279 How long is a string? 1406 01:10:56,279 --> 01:10:59,760 Well, hi is 1, 2, 3 with an exclamation point. 1407 01:10:59,760 --> 01:11:03,029 Bye is 1, 2, 3, 4 with an exclamation point. 1408 01:11:03,029 --> 01:11:06,450 David is D-A-V-I-D, five without an exclamation point. 1409 01:11:06,450 --> 01:11:10,210 And so a string can be any number of bytes long, 1410 01:11:10,210 --> 01:11:12,700 so you somehow need to draw a line in the sand 1411 01:11:12,700 --> 01:11:16,706 to separate in memory one string from another. 1412 01:11:16,706 --> 01:11:19,412 So what's the implication of this? 1413 01:11:19,412 --> 01:11:20,870 Well, let me go back to code, here. 1414 01:11:20,870 --> 01:11:22,210 Let's actually poke around.
1415 01:11:22,210 --> 01:11:27,130 This is a bit dangerous, but I'm going to start looking at memory locations 1416 01:11:27,130 --> 01:11:29,210 past my string here. 1417 01:11:29,210 --> 01:11:33,250 So let me go ahead and recompile, make hi. 1418 01:11:33,250 --> 01:11:35,110 Whoops, what did I do here? 1419 01:11:35,110 --> 01:11:36,680 I forgot a format code. 1420 01:11:36,680 --> 01:11:38,620 Let me add one more %i. 1421 01:11:38,620 --> 01:11:42,550 Now let me go ahead and rerun make hi, ./hi, Enter. 1422 01:11:42,550 --> 01:11:43,580 There it is. 1423 01:11:43,580 --> 01:11:46,660 So you can actually see in the computer, unbeknownst to you 1424 01:11:46,660 --> 01:11:49,830 previously, that there's indeed something else going on there. 1425 01:11:49,830 --> 01:11:52,880 And if I were to make one other variant of this program-- 1426 01:11:52,880 --> 01:11:55,630 let's get rid of just this one word and let's have two. 1427 01:11:55,630 --> 01:11:57,550 So let me give myself another string called t, 1428 01:11:57,550 --> 01:12:01,810 for instance, just by common convention, with bye exclamation point. 1429 01:12:01,810 --> 01:12:04,900 Let me then print out with %s. 1430 01:12:04,900 --> 01:12:10,785 And let me also print out with %s, whoops, printf, print out t, as well. 1431 01:12:10,785 --> 01:12:14,320 Let me recompile this program, and obviously the out-- 1432 01:12:14,320 --> 01:12:17,470 ugh-- this is what happens when I go too fast. 1433 01:12:17,470 --> 01:12:20,740 All right, third mistake today, close quote, 1434 01:12:20,740 --> 01:12:22,030 which I was missing. 1435 01:12:22,030 --> 01:12:23,590 make hi. 1436 01:12:23,590 --> 01:12:25,000 Fourth mistake today. 1437 01:12:25,000 --> 01:12:26,200 make hi. 1438 01:12:26,200 --> 01:12:27,490 Dot slash hi. 1439 01:12:27,490 --> 01:12:28,210 OK, voila.
1440 01:12:28,210 --> 01:12:30,610 Now we have a program that's printing both hi and bye, 1441 01:12:30,610 --> 01:12:34,720 only so that we can consider what's going on in the computer's memory. 1442 01:12:34,720 --> 01:12:40,210 If s is storing hi and apparently one bonus byte that 1443 01:12:40,210 --> 01:12:43,240 demarcates the end of that string, bye is apparently 1444 01:12:43,240 --> 01:12:46,413 going to fit into the location directly after. 1445 01:12:46,413 --> 01:12:49,330 And it's wrapping around, but that's just an artist's rendition, here. 1446 01:12:49,330 --> 01:12:52,000 But bye, B-Y-E exclamation point, is taking up 1447 01:12:52,000 --> 01:12:58,948 1, 2, 3, 4, plus a fifth byte, as well. 1448 01:12:58,948 --> 01:13:03,580 All right, any questions on this underlying representation of strings? 1449 01:13:03,580 --> 01:13:05,560 And we'll contextualize this, before long, 1450 01:13:05,560 --> 01:13:07,840 so that this isn't just like, OK, who really cares? 1451 01:13:07,840 --> 01:13:10,730 This is going to be the source of actually implementing things. 1452 01:13:10,730 --> 01:13:13,510 In fact, for problem set 2, like cryptography, and encryption, 1453 01:13:13,510 --> 01:13:15,468 and scrambling actual human messages. 1454 01:13:15,468 --> 01:13:16,510 But some questions first. 1455 01:13:16,510 --> 01:13:20,650 AUDIENCE: So normally if you were to not use string, 1456 01:13:20,650 --> 01:13:23,480 you would just make a character array that would declare 1457 01:13:23,480 --> 01:13:26,580 how many characters there are so you know how many characters are 1458 01:13:26,580 --> 01:13:27,330 going to be there.
1459 01:13:27,330 --> 01:13:29,480 DAVID MALAN: A good question, too, and let 1460 01:13:29,480 --> 01:13:32,115 me summarize it as: if we were instead to use chars all the time, 1461 01:13:32,115 --> 01:13:35,240 we would indeed have to know in advance how many chars you want for a given 1462 01:13:35,240 --> 01:13:38,750 string that you're storing. How, then, does something like get string work, 1463 01:13:38,750 --> 01:13:41,000 because when we, CS50, wrote the get string function, 1464 01:13:41,000 --> 01:13:43,190 we obviously don't know how long the words are 1465 01:13:43,190 --> 01:13:45,020 going to be that you all are typing in. 1466 01:13:45,020 --> 01:13:48,560 It turns out, two weeks from now, we'll see that get string 1467 01:13:48,560 --> 01:13:51,320 uses a technique known as dynamic memory allocation. 1468 01:13:51,320 --> 01:13:55,770 And it's going to grow or shrink the array automatically for you. 1469 01:13:55,770 --> 01:13:57,050 But more on that soon. 1470 01:13:57,050 --> 01:13:57,920 Other questions? 1471 01:13:57,920 --> 01:14:01,450 AUDIENCE: Why are we using a nul value? 1472 01:14:01,450 --> 01:14:02,725 Isn't that wasting a byte? 1473 01:14:02,725 --> 01:14:03,850 DAVID MALAN: Good question. 1474 01:14:03,850 --> 01:14:06,880 Why are we using a nul value? Isn't it wasting a byte? 1475 01:14:06,880 --> 01:14:07,630 Yes. 1476 01:14:07,630 --> 01:14:13,210 But I claim there's really no other way to distinguish the end of one string 1477 01:14:13,210 --> 01:14:19,748 from the start of another, unless we make some sort of notation in memory. 1478 01:14:19,748 --> 01:14:22,540 All we have, at the end of the day, inside of a computer, are bits. 1479 01:14:22,540 --> 01:14:25,900 Therefore, all we can do is spin those bits in some creative way 1480 01:14:25,900 --> 01:14:27,520 to solve this problem. 1481 01:14:27,520 --> 01:14:30,710 So we're minimally going to spend 1 byte to solve this problem. 1482 01:14:30,710 --> 01:14:31,210 Yeah?
1483 01:14:31,210 --> 01:14:35,897 AUDIENCE: How does our memory device know to enter a line when you type 1484 01:14:35,897 --> 01:14:39,270 the \n if we don't have it stored as a char? 1485 01:14:39,270 --> 01:14:40,910 DAVID MALAN: If you don't-- 1486 01:14:40,910 --> 01:14:44,690 how does the computer know to move to the next line when you have a \n? 1487 01:14:44,690 --> 01:14:47,990 So \n, even though it looks like two characters, 1488 01:14:47,990 --> 01:14:51,890 it's actually stored as just 1 byte in the computer's memory. 1489 01:14:51,890 --> 01:14:54,357 There's a mapping between it and an actual number. 1490 01:14:54,357 --> 01:14:57,440 And you can see that, for instance, on the ASCII chart from the other day. 1491 01:14:57,440 --> 01:15:01,224 AUDIENCE: So with that being stored would be the [INAUDIBLE]. 1492 01:15:01,224 --> 01:15:02,420 DAVID MALAN: It would be. 1493 01:15:02,420 --> 01:15:08,210 If I had put a \n in my code here, right after the exclamation point here 1494 01:15:08,210 --> 01:15:11,840 and here, that would actually shift everything in memory because we would 1495 01:15:11,840 --> 01:15:16,740 need to make room for a \n here and another one over here. 1496 01:15:16,740 --> 01:15:18,913 So it would take two more bytes, exactly. 1497 01:15:18,913 --> 01:15:19,580 Other questions? 1498 01:15:19,580 --> 01:15:26,050 AUDIENCE: So if hi exclamation point is written in binary and ASCII 1499 01:15:26,050 --> 01:15:32,630 too as 72, 73, 33, if we are to write those numbers in the string, 1500 01:15:32,630 --> 01:15:39,090 and convert them into binary, how would the computer know what's 72 1501 01:15:39,090 --> 01:15:40,390 and what's 8? 1502 01:15:40,390 --> 01:15:42,390 DAVID MALAN: And what's the last thing you said? 1503 01:15:42,390 --> 01:15:43,806 AUDIENCE: 8, for example. 1504 01:15:43,806 --> 01:15:45,700 DAVID MALAN: It's context-sensitive.
1505 01:15:45,700 --> 01:15:48,450 So if, at the end of the day, all we're storing is these numbers, 1506 01:15:48,450 --> 01:15:52,380 like 72, 73, 33, recall that it's up to the program 1507 01:15:52,380 --> 01:15:55,470 to decide, based on context, how to interpret them. 1508 01:15:55,470 --> 01:15:59,310 And I simplified this story in week 0, saying that Photoshop interprets them 1509 01:15:59,310 --> 01:16:02,910 as RGB colors, and iMessage or a text messaging program 1510 01:16:02,910 --> 01:16:07,440 interprets them as letters, and Excel interprets them as numbers. 1511 01:16:07,440 --> 01:16:12,540 How those programs do it is by way of data types like string, and int, 1512 01:16:12,540 --> 01:16:13,080 and float. 1513 01:16:13,080 --> 01:16:14,872 And in fact, later this semester, we'll see 1514 01:16:14,872 --> 01:16:19,500 a data type via which you can represent a color as a triple of numbers, 1515 01:16:19,500 --> 01:16:22,240 a red value, a green value, and a blue value. 1516 01:16:22,240 --> 01:16:24,600 So we'll see other data types as well. 1517 01:16:24,600 --> 01:16:25,100 Yeah? 1518 01:16:25,100 --> 01:16:29,320 AUDIENCE: It seems easy enough to just add a nul thing at the end of the word, 1519 01:16:29,320 --> 01:16:32,190 so why do we have integers and long integers? 1520 01:16:32,190 --> 01:16:35,192 Why can't we make everything variable in its data size? 1521 01:16:35,192 --> 01:16:36,900 DAVID MALAN: Really interesting question. 1522 01:16:36,900 --> 01:16:40,110 Why could we not just make all data types variable in size? 1523 01:16:40,110 --> 01:16:43,560 And some languages, some libraries do exactly this. 1524 01:16:43,560 --> 01:16:47,100 C is an older language, and because memory was expensive, 1525 01:16:47,100 --> 01:16:48,300 memory was limited. 1526 01:16:48,300 --> 01:16:50,640 The reality was you gained benefits from just 1527 01:16:50,640 --> 01:16:53,010 standardizing the size of these things.
1528 01:16:53,010 --> 01:16:55,410 You also get performance increases in the sense 1529 01:16:55,410 --> 01:16:59,620 that if you know every int is 4 bytes, you can very quickly, 1530 01:16:59,620 --> 01:17:02,220 and we'll see this next week, jump from one integer to another, 1531 01:17:02,220 --> 01:17:06,600 to another in memory just by adding 4 inside of those square brackets. 1532 01:17:06,600 --> 01:17:08,430 You can very quickly poke around. 1533 01:17:08,430 --> 01:17:11,522 Whereas, if you had variable-length numbers, you would have to, 1534 01:17:11,522 --> 01:17:13,980 kind of, follow, follow, follow, looking for the end of it. 1535 01:17:13,980 --> 01:17:16,780 Follow, follow-- you would have to look at more locations in memory. 1536 01:17:16,780 --> 01:17:18,322 So that's a topic we'll come back to. 1537 01:17:18,322 --> 01:17:20,700 But it was generally for efficiency. 1538 01:17:20,700 --> 01:17:22,170 Another question, yeah? 1539 01:17:22,170 --> 01:17:27,942 AUDIENCE: Why not store the nul character [INAUDIBLE] 1540 01:17:27,942 --> 01:17:31,520 DAVID MALAN: Good question. Why not store the-- 1541 01:17:31,520 --> 01:17:35,540 why not store the nul character at the beginning? 1542 01:17:35,540 --> 01:17:41,890 You could-- let's see, why not store it at the beginning? 1543 01:17:41,890 --> 01:17:45,080 You could do that. 1544 01:17:45,080 --> 01:17:48,325 You could absolutely-- well, could you do this? 1545 01:17:51,580 --> 01:17:56,380 If you were to do that at the beginning-- 1546 01:17:56,380 --> 01:17:57,400 short answer, no. 1547 01:17:57,400 --> 01:17:58,420 OK, now I retract that. 1548 01:17:58,420 --> 01:18:00,628 No, because I finally thought of a problem with this.
1549 01:18:00,628 --> 01:18:02,483 If you store it at the beginning instead, 1550 01:18:02,483 --> 01:18:04,900 we'll see in just a moment how you can actually write code 1551 01:18:04,900 --> 01:18:07,150 to figure out where the end of a string is, 1552 01:18:07,150 --> 01:18:09,550 and the problem there is you wouldn't necessarily 1553 01:18:09,550 --> 01:18:13,000 know if you eventually hit a 0 at the end of the string, 1554 01:18:13,000 --> 01:18:16,810 because it's the number 0 in the context of Excel using some memory, 1555 01:18:16,810 --> 01:18:20,180 or if it's in the context of some other data type, altogether. 1556 01:18:20,180 --> 01:18:22,600 So the fact that we've standardized-- 1557 01:18:22,600 --> 01:18:26,560 the fact that we've standardized strings as ending with nul 1558 01:18:26,560 --> 01:18:30,655 means that we can reliably distinguish one variable from another in memory. 1559 01:18:30,655 --> 01:18:32,560 And that's actually a perfect segue, now, 1560 01:18:32,560 --> 01:18:35,693 to actually using this primitive to build up 1561 01:18:35,693 --> 01:18:38,360 our own code that manipulates these things that are lower level. 1562 01:18:38,360 --> 01:18:39,560 So let me do this. 1563 01:18:39,560 --> 01:18:41,650 Let me create a new file called length. 1564 01:18:41,650 --> 01:18:46,000 And let's use this basic idea to figure out what the length of a string 1565 01:18:46,000 --> 01:18:50,720 is after it's been stored in a variable. 1566 01:18:50,720 --> 01:18:51,860 So let's do this. 1567 01:18:51,860 --> 01:18:56,530 Let me include both the CS50 header and the standard I/O header, 1568 01:18:56,530 --> 01:19:01,250 give myself int main(void) again here, and inside of main, do this. 1569 01:19:01,250 --> 01:19:04,060 Let me prompt the user for a string s and I'll ask them 1570 01:19:04,060 --> 01:19:08,170 for a string like their name, here. 1571 01:19:08,170 --> 01:19:13,420 And then let me name it, more verbosely, name this time.
1572 01:19:13,420 --> 01:19:15,170 Now let me go ahead and do this. 1573 01:19:15,170 --> 01:19:20,260 Let me iterate over every character in this string 1574 01:19:20,260 --> 01:19:22,180 in order to figure out what its length is. 1575 01:19:22,180 --> 01:19:25,060 So initially, I'm going to go ahead and say this, 1576 01:19:25,060 --> 01:19:28,040 int length equals 0, because I don't know what it is yet. 1577 01:19:28,040 --> 01:19:29,290 So we're going to start at 0. 1578 01:19:29,290 --> 01:19:32,410 And then while the following is true-- 1579 01:19:32,410 --> 01:19:37,370 while-- let me-- do I want to do this? 1580 01:19:37,370 --> 01:19:40,060 Let me change this to i, just for clarity. Let me do 1581 01:19:40,060 --> 01:19:45,790 this, while name bracket i does not equal that special nul character. 1582 01:19:45,790 --> 01:19:49,180 So I typed it on the slide as N-U-L, but you don't write N-U-L in code, 1583 01:19:49,180 --> 01:19:53,665 you actually use its numeric equivalent, which is \0 in single quotes. 1584 01:19:53,665 --> 01:19:58,930 While name bracket i does not equal the nul character, I'm going to go ahead 1585 01:19:58,930 --> 01:20:02,470 and increment i with i plus plus. 1586 01:20:02,470 --> 01:20:05,470 And then down here I'm going to print out the value of i 1587 01:20:05,470 --> 01:20:09,270 to see what we actually get, printing out the value of i. 1588 01:20:09,270 --> 01:20:11,020 All right, so what's going to happen here? 1589 01:20:11,020 --> 01:20:13,420 Let me run make length. 1590 01:20:13,420 --> 01:20:14,740 Fortunately, no errors. 1591 01:20:14,740 --> 01:20:19,570 ./length, and let me type in something like H-I, exclamation point, Enter. 1592 01:20:19,570 --> 01:20:20,740 And I get 3. 1593 01:20:20,740 --> 01:20:23,950 Let me try bye, exclamation point, Enter. 1594 01:20:23,950 --> 01:20:25,870 And I get 4. 1595 01:20:25,870 --> 01:20:28,510 Let me try my own name, David, Enter. 1596 01:20:28,510 --> 01:20:29,970 5, and so forth.
1597 01:20:29,970 --> 01:20:31,880 So what's actually going on here? 1598 01:20:31,880 --> 01:20:34,490 Well, it seems that by way of this while loop, 1599 01:20:34,490 --> 01:20:36,622 we are specifying a local variable called 1600 01:20:36,622 --> 01:20:39,580 i initialized to 0, because we're figuring out the length of the string 1601 01:20:39,580 --> 01:20:40,580 as we go. 1602 01:20:40,580 --> 01:20:44,050 I'm then asking the question, does location 0, 1603 01:20:44,050 --> 01:20:49,300 that is i in the name string, which we now know is an array, 1604 01:20:49,300 --> 01:20:51,700 does it not equal \0? 1605 01:20:51,700 --> 01:20:55,645 Because if it doesn't, that means it's an actual character like H, or B, or D. 1606 01:20:55,645 --> 01:20:57,640 So let's increment i. 1607 01:20:57,640 --> 01:21:00,910 Then, let's come back around to line 9 and let's ask the question again. 1608 01:21:00,910 --> 01:21:02,590 Now i equals 1. 1609 01:21:02,590 --> 01:21:06,420 So does name bracket 1 not equal \0? 1610 01:21:06,420 --> 01:21:12,070 Well, if it doesn't, and it won't if it's an i, or a y, or an a, 1611 01:21:12,070 --> 01:21:15,490 based on what I typed in, we're going to increment i once more. 1612 01:21:15,490 --> 01:21:18,940 Fast-forward to the end of the story, once I get to the end of the string, 1613 01:21:18,940 --> 01:21:22,420 technically, one space past the end of the string, 1614 01:21:22,420 --> 01:21:25,510 name bracket i will equal \0. 1615 01:21:25,510 --> 01:21:29,960 So I don't increment i anymore; I end up just printing the result. 1616 01:21:29,960 --> 01:21:34,510 So what we seem to have here with some low-level C code, just this while loop, 1617 01:21:34,510 --> 01:21:39,070 is a program that figures out the length of a given string that's been typed in. 1618 01:21:39,070 --> 01:21:41,860 Let's practice our abstraction and decompose this into, 1619 01:21:41,860 --> 01:21:43,270 maybe, a helper function here.
1620 01:21:43,270 --> 01:21:47,110 Let me grab all of this code here, and assume, 1621 01:21:47,110 --> 01:21:51,580 for the sake of discussion for a moment, that I can call a function now called 1622 01:21:51,580 --> 01:21:53,740 string length. 1623 01:21:53,740 --> 01:21:56,830 And the string whose length I want to get is name, 1624 01:21:56,830 --> 01:22:01,000 and then I'll go ahead and print out, just as before with %i, 1625 01:22:01,000 --> 01:22:02,398 the length of that string. 1626 01:22:02,398 --> 01:22:04,690 So now I'm abstracting away this notion of figuring out 1627 01:22:04,690 --> 01:22:05,732 the length of the string. 1628 01:22:05,732 --> 01:22:08,470 That's an opportunity for me to create my own function. 1629 01:22:08,470 --> 01:22:11,515 If I want to create a function called string length, 1630 01:22:11,515 --> 01:22:15,610 I'll claim that I want to take a string as input, 1631 01:22:15,610 --> 01:22:20,860 and what should I have this function return as its return type? 1632 01:22:20,860 --> 01:22:26,090 What should string length presumably return? 1633 01:22:26,090 --> 01:22:26,590 Yeah? 1634 01:22:26,590 --> 01:22:27,430 AUDIENCE: Int. 1635 01:22:27,430 --> 01:22:28,270 DAVID MALAN: An int, right? 1636 01:22:28,270 --> 01:22:29,020 An int makes sense. 1637 01:22:29,020 --> 01:22:30,937 Float really wouldn't make sense because we're 1638 01:22:30,937 --> 01:22:33,377 measuring things that are integers. 1639 01:22:33,377 --> 01:22:34,960 In this case, the length of something. 1640 01:22:34,960 --> 01:22:36,640 So indeed, let's have it return an int. 1641 01:22:36,640 --> 01:22:39,380 I can use the same code as before, so I'm 1642 01:22:39,380 --> 01:22:42,175 going to paste what I cut earlier in the file. 1643 01:22:42,175 --> 01:22:46,660 The only thing I have to change is the name of the variable.
1644 01:22:46,660 --> 01:22:50,240 Because now this function, I decided arbitrarily 1645 01:22:50,240 --> 01:22:53,130 that I'm going to call it s, just to be more generic. 1646 01:22:53,130 --> 01:22:55,915 So I'm going to look at s bracket i at each location. 1647 01:22:55,915 --> 01:22:58,790 And I don't want to print it at the end; that would be a side effect. 1648 01:22:58,790 --> 01:23:01,250 What's the line of code I should include here if I actually 1649 01:23:01,250 --> 01:23:04,005 want to hand back the total length? 1650 01:23:04,005 --> 01:23:04,505 Yeah? 1651 01:23:04,505 --> 01:23:05,362 AUDIENCE: Return i. 1652 01:23:05,362 --> 01:23:06,320 DAVID MALAN: Say again? 1653 01:23:06,320 --> 01:23:07,112 AUDIENCE: Return i. 1654 01:23:07,112 --> 01:23:09,270 DAVID MALAN: Return i, in this case. 1655 01:23:09,270 --> 01:23:11,540 So I'm going to return i, not print it. 1656 01:23:11,540 --> 01:23:16,490 Because now, my main function can use the return value stored in length 1657 01:23:16,490 --> 01:23:18,530 and print it on the next line itself. 1658 01:23:18,530 --> 01:23:22,520 I just need a prototype, so that's my one forgivable copy-paste here. 1659 01:23:22,520 --> 01:23:24,170 I'm going to rerun make length. 1660 01:23:24,170 --> 01:23:25,640 Hopefully I didn't screw up. 1661 01:23:25,640 --> 01:23:29,330 I didn't. ./length, I'll type in hi-- oops-- 1662 01:23:29,330 --> 01:23:31,340 I'll type in hi, again. 1663 01:23:31,340 --> 01:23:31,880 That works. 1664 01:23:31,880 --> 01:23:34,970 I'll type in bye again, and so forth. 1665 01:23:34,970 --> 01:23:38,703 So now we have a function that determines the length of a string. 1666 01:23:38,703 --> 01:23:41,120 Well, it turns out we didn't actually need this all along. 1667 01:23:41,120 --> 01:23:46,042 It turns out that we can get rid of my own custom string length function here. 1668 01:23:46,042 --> 01:23:48,500 I can definitely delete the whole implementation down here.
1669 01:23:48,500 --> 01:23:52,160 Because it turns out, in a file called string.h, 1670 01:23:52,160 --> 01:23:55,520 which is a new header file today, we actually have access to a function 1671 01:23:55,520 --> 01:23:59,690 called, more succinctly, strlen, S-T-R-L-E-N, which 1672 01:23:59,690 --> 01:24:01,130 literally does that. 1673 01:24:01,130 --> 01:24:05,240 This is a function that comes with C, albeit in the string.h header file, 1674 01:24:05,240 --> 01:24:09,450 and it does what we just implemented manually. 1675 01:24:09,450 --> 01:24:13,340 So here's an example of, admittedly, a wheel we just reinvented, but no more. 1676 01:24:13,340 --> 01:24:14,480 We don't have to do that. 1677 01:24:14,480 --> 01:24:16,850 And how do you know what kinds of functions exist? 1678 01:24:16,850 --> 01:24:21,260 Well, let me pop over to my browser here, to a website that 1679 01:24:21,260 --> 01:24:24,455 is CS50's incarnation of what are called manual pages. 1680 01:24:24,455 --> 01:24:28,070 It turns out that in a lot of systems, Macs, and Unix, 1681 01:24:28,070 --> 01:24:31,100 and Linux systems, including the Visual Studio Code 1682 01:24:31,100 --> 01:24:33,020 instance that we have in the cloud, there 1683 01:24:33,020 --> 01:24:36,290 are publicly accessible manual pages for functions. 1684 01:24:36,290 --> 01:24:39,770 They tend to be written very expertly, in a way that's 1685 01:24:39,770 --> 01:24:41,160 not very beginner-friendly. 1686 01:24:41,160 --> 01:24:45,650 So what we have here at manual.cs50.io is CS50's version 1687 01:24:45,650 --> 01:24:48,740 of manual pages that have this less-comfortable mode that 1688 01:24:48,740 --> 01:24:51,290 gives you a, sort of, cheat sheet of very frequently used, 1689 01:24:51,290 --> 01:24:55,010 helpful functions in C. And we've translated the expert 1690 01:24:55,010 --> 01:24:58,075 notation to things that a beginner can understand.
1691 01:24:58,075 --> 01:25:02,190 So, for instance, let me go ahead and search for string up at the top here. 1692 01:25:02,190 --> 01:25:06,200 You'll see that there's documentation for our own get string function, 1693 01:25:06,200 --> 01:25:08,510 but more interestingly down here, there's 1694 01:25:08,510 --> 01:25:10,850 a whole bunch of string-related functions 1695 01:25:10,850 --> 01:25:12,620 that we haven't even seen most of, yet. 1696 01:25:12,620 --> 01:25:14,660 But there's indeed one here called strlen, 1697 01:25:14,660 --> 01:25:16,620 calculate the length of a string. 1698 01:25:16,620 --> 01:25:22,160 And so if I go to strlen here, I'll see some less-comfortable documentation 1699 01:25:22,160 --> 01:25:22,970 for this function. 1700 01:25:22,970 --> 01:25:25,400 And the way a manual page typically works, 1701 01:25:25,400 --> 01:25:28,310 whether in CS50's format or any other system, 1702 01:25:28,310 --> 01:25:30,950 is you see, typically, a synopsis of what header 1703 01:25:30,950 --> 01:25:33,330 files you need to use the function. 1704 01:25:33,330 --> 01:25:35,960 So you would copy-paste these couple of lines here. 1705 01:25:35,960 --> 01:25:39,530 You see what the prototype is of the function so 1706 01:25:39,530 --> 01:25:42,533 that you know what its inputs are, if any, and its outputs are, if any. 1707 01:25:42,533 --> 01:25:45,200 Then down below you might see a description, which in this case, 1708 01:25:45,200 --> 01:25:46,320 is pretty straightforward. 1709 01:25:46,320 --> 01:25:48,170 This function calculates the length of s. 1710 01:25:48,170 --> 01:25:51,110 Then you see what the return value is, if any, 1711 01:25:51,110 --> 01:25:54,310 and you might even see an example, like this one that we've whipped up here.
1712 01:25:54,310 --> 01:25:57,012 So these manual pages, which are, again, accessible 1713 01:25:57,012 --> 01:25:59,720 here, and we'll link to these in the problem sets moving forward, 1714 01:25:59,720 --> 01:26:02,510 are pretty much the place to start when you want to figure out: 1715 01:26:02,510 --> 01:26:05,210 has a wheel been invented already? 1716 01:26:05,210 --> 01:26:08,490 Is there a function that might help me solve some problem set problems 1717 01:26:08,490 --> 01:26:11,900 so that I don't have to really get into the weeds of doing all 1718 01:26:11,900 --> 01:26:13,712 of those lower-level steps as I've had to? 1719 01:26:13,712 --> 01:26:16,670 Sometimes the answer is going to be yes, sometimes it's going to be no. 1720 01:26:16,670 --> 01:26:19,160 But again, the point of our having just done this together 1721 01:26:19,160 --> 01:26:21,950 is to reveal that even the functions you start taking for 1722 01:26:21,950 --> 01:26:26,135 granted, they all reduce to some of these basic building blocks. 1723 01:26:26,135 --> 01:26:29,600 At the end of the day, all that's inside of your computer 1724 01:26:29,600 --> 01:26:30,950 is 0s and 1s. 1725 01:26:30,950 --> 01:26:33,060 We're just learning, now, how to harness those 1726 01:26:33,060 --> 01:26:37,220 and how to manipulate them ourselves. 1727 01:26:37,220 --> 01:26:41,510 Any questions here on this? 1728 01:26:41,510 --> 01:26:43,305 Any questions at all? 1729 01:26:43,305 --> 01:26:43,805 Yeah. 1730 01:26:43,805 --> 01:26:51,779 AUDIENCE: We did just see [INAUDIBLE] Is that so common 1731 01:26:51,779 --> 01:26:54,035 that we would have to specify it, or is it not? 1732 01:26:54,035 --> 01:26:55,160 DAVID MALAN: Good question. 1733 01:26:55,160 --> 01:26:57,920 Is it so common that you would have to specify it or not? 1734 01:26:57,920 --> 01:27:00,170 You do need to include its header files because that's 1735 01:27:00,170 --> 01:27:01,670 where all of those prototypes are.
1736 01:27:01,670 --> 01:27:05,190 You don't need to worry about linking it in with -l anything. 1737 01:27:05,190 --> 01:27:07,340 And in fact, moving forward, you do not ever 1738 01:27:07,340 --> 01:27:10,910 need to worry about linking in libraries when compiling your code. 1739 01:27:10,910 --> 01:27:14,940 We, the staff, have configured make to do all of that for you automatically. 1740 01:27:14,940 --> 01:27:17,030 We want you to understand that it is doing it, 1741 01:27:17,030 --> 01:27:19,340 but we'll take care of all of the -l's for you. 1742 01:27:19,340 --> 01:27:23,360 But the onus is on you for the prototypes and the header files. 1743 01:27:23,360 --> 01:27:27,150 Other questions on these representations or techniques? 1744 01:27:27,150 --> 01:27:27,650 Yeah? 1745 01:27:27,650 --> 01:27:35,920 AUDIENCE: [INAUDIBLE] exclamation mark. 1746 01:27:35,920 --> 01:27:40,524 How does it actually define the spaces [INAUDIBLE]? 1747 01:27:40,524 --> 01:27:41,920 DAVID MALAN: A good question. 1748 01:27:41,920 --> 01:27:45,700 If you were to have a string with actual spaces in it, that is, multiple words, 1749 01:27:45,700 --> 01:27:47,530 what would the computer actually do? 1750 01:27:47,530 --> 01:27:49,960 Well, for this, let me go to asciichart.com, 1751 01:27:49,960 --> 01:27:54,880 which is just a random website that's my go-to for the first 128 characters 1752 01:27:54,880 --> 01:27:55,930 of ASCII. 1753 01:27:55,930 --> 01:27:58,520 This is, in fact, what we had a screenshot of the other day. 1754 01:27:58,520 --> 01:28:02,088 And if you look here, it's a little non-obvious, but S-P is space. 1755 01:28:02,088 --> 01:28:05,380 If a computer were to store a space, it would actually store the decimal number 1756 01:28:05,380 --> 01:28:10,430 32, or technically, the pattern of 0s and 1s that represents the number 32.
1757 01:28:10,430 --> 01:28:13,240 All of the US English keys that you might type on a keyboard 1758 01:28:13,240 --> 01:28:16,390 can be represented with a number, and using Unicode, you 1759 01:28:16,390 --> 01:28:18,920 can express even things like emojis and other languages. 1760 01:28:18,920 --> 01:28:19,420 Yeah? 1761 01:28:19,420 --> 01:28:23,130 AUDIENCE: Are only strings followed by nul number, 1762 01:28:23,130 --> 01:28:26,516 or let's say we had a series of numbers, would each one of them 1763 01:28:26,516 --> 01:28:27,845 be accompanied by nuls? 1764 01:28:27,845 --> 01:28:28,970 DAVID MALAN: Good question. 1765 01:28:28,970 --> 01:28:31,790 Only strings are accompanied by nuls at the end 1766 01:28:31,790 --> 01:28:34,760 because every other data type we've talked about thus far 1767 01:28:34,760 --> 01:28:37,130 is of well-defined finite length. 1768 01:28:37,130 --> 01:28:40,190 1 byte for char, 4 bytes for ints, and so forth. 1769 01:28:40,190 --> 01:28:44,240 If we think back to last week, we did end the week with a couple of problems. 1770 01:28:44,240 --> 01:28:48,080 Integer overflow, because 4 bytes, heck, even 8 bytes is sometimes not enough. 1771 01:28:48,080 --> 01:28:50,270 We also talked about floating-point imprecision. 1772 01:28:50,270 --> 01:28:53,480 Thankfully, in the world of scientific computing and financial computing, 1773 01:28:53,480 --> 01:28:56,930 there are libraries you can use that draw inspiration 1774 01:28:56,930 --> 01:28:58,820 from this idea of a string, and they might 1775 01:28:58,820 --> 01:29:02,640 use 9 bytes for an integer value or maybe 20 bytes 1776 01:29:02,640 --> 01:29:04,170 so that you can count really high. 1777 01:29:04,170 --> 01:29:06,680 But they will then start to manage that memory for you, 1778 01:29:06,680 --> 01:29:09,960 and what they're really probably doing is just grabbing a whole bunch of bytes 1779 01:29:09,960 --> 01:29:13,070 and somehow remembering how long the sequence of bytes is.
1780 01:29:13,070 --> 01:29:16,190 That's how these higher-level libraries work, too. 1781 01:29:16,190 --> 01:29:17,700 All right, this has been a lot. 1782 01:29:17,700 --> 01:29:19,080 Let's take one more break here. 1783 01:29:19,080 --> 01:29:20,670 We'll do a seven-minute break here. 1784 01:29:20,670 --> 01:29:23,465 And when we come back, we'll flesh out a few more details. 1785 01:29:23,465 --> 01:29:26,390 All right. 1786 01:29:26,390 --> 01:29:31,400 So we just saw strlen as an example of a function that 1787 01:29:31,400 --> 01:29:32,898 comes in the string library. 1788 01:29:32,898 --> 01:29:35,690 Let's start to take more of these library functions out for a spin. 1789 01:29:35,690 --> 01:29:39,530 So we're not relying only on the built-ins that we saw last week. 1790 01:29:39,530 --> 01:29:41,660 Let me switch over to VS Code 1791 01:29:41,660 --> 01:29:46,040 and create a file called, say, string.c, 1792 01:29:46,040 --> 01:29:48,115 to apply this lesson learned, as follows. 1793 01:29:48,115 --> 01:29:54,770 Let me include cs50.h, stdio.h, and this new thing, 1794 01:29:54,770 --> 01:29:57,260 string.h as well, at the top. 1795 01:29:57,260 --> 01:29:59,698 I'm going to do the usual int main(void) here. 1796 01:29:59,698 --> 01:30:02,240 And then in this program, suppose, for the sake of discussion, 1797 01:30:02,240 --> 01:30:05,540 that I didn't know about %s for printf or, heck, 1798 01:30:05,540 --> 01:30:09,300 maybe early on there was no %s format code. 1799 01:30:09,300 --> 01:30:12,420 And so there was no easy way to print strings. 1800 01:30:12,420 --> 01:30:15,830 Well, at least if we know that strings are just arrays of characters, 1801 01:30:15,830 --> 01:30:19,820 we could use %c as a workaround, a solution to that, 1802 01:30:19,820 --> 01:30:21,420 sort of, contrived problem.
1803 01:30:21,420 --> 01:30:24,920 So let me ask myself for a string s by using get string here, 1804 01:30:24,920 --> 01:30:27,500 and I'll ask the user for some input. 1805 01:30:27,500 --> 01:30:33,260 And then, let me print out, say, output, and all I want to do is print back out 1806 01:30:33,260 --> 01:30:34,460 what the user typed. 1807 01:30:34,460 --> 01:30:38,000 Now, the simplest way to do this, of course, is going to be like last week, 1808 01:30:38,000 --> 01:30:40,960 printf %s, and plug in the s, and we're done. 1809 01:30:40,960 --> 01:30:43,730 But again, for the sake of discussion, I forgot about, 1810 01:30:43,730 --> 01:30:47,820 or someone didn't implement %s, so how else could we do this? 1811 01:30:47,820 --> 01:30:51,800 Well, in pseudocode, or in English, what's the gist of how we could solve 1812 01:30:51,800 --> 01:30:58,910 this problem, printing out the string s on the screen without using %s? 1813 01:30:58,910 --> 01:31:02,420 How might we go about solving this? 1814 01:31:02,420 --> 01:31:04,147 Just in English, high-level? 1815 01:31:04,147 --> 01:31:05,730 What would your pseudocode look like? 1816 01:31:05,730 --> 01:31:06,230 Yeah? 1817 01:31:06,230 --> 01:31:09,568 AUDIENCE: You could just print each letter. 1818 01:31:09,568 --> 01:31:11,360 DAVID MALAN: OK, so just print each letter. 1819 01:31:11,360 --> 01:31:13,490 And maybe, more precisely, some kind of loop. 1820 01:31:13,490 --> 01:31:17,030 Like, let's iterate over all of the characters in s 1821 01:31:17,030 --> 01:31:18,150 and print one at a time. 1822 01:31:18,150 --> 01:31:19,290 So how can I do that? 1823 01:31:19,290 --> 01:31:24,050 Well, for int i gets 0 is kind of the go-to starting point for most loops, 1824 01:31:24,050 --> 01:31:25,580 i is less than-- 1825 01:31:25,580 --> 01:31:27,365 OK, how long do I want to iterate?
1826 01:31:27,365 --> 01:31:29,240 Well, it's going to depend on what I type in, 1827 01:31:29,240 --> 01:31:31,300 but that's why we have strlen now. 1828 01:31:31,300 --> 01:31:36,080 So iterate up to the length of s, and then increment i with plus 1829 01:31:36,080 --> 01:31:37,075 plus on each iteration. 1830 01:31:37,075 --> 01:31:40,670 And then let's just print out %c with no new line, 1831 01:31:40,670 --> 01:31:43,010 because I want everything on the same line, 1832 01:31:43,010 --> 01:31:47,780 whatever the character is at s bracket i. 1833 01:31:47,780 --> 01:31:49,790 And then at the very end, I'll give myself 1834 01:31:49,790 --> 01:31:52,350 that new line, just to move the cursor down to the next line 1835 01:31:52,350 --> 01:31:54,350 so the dollar sign is not in a weird place. 1836 01:31:54,350 --> 01:31:57,230 All right, so let's see if I didn't screw up any of the code, 1837 01:31:57,230 --> 01:32:02,690 make string, Enter, so far, so good, ./string, and let me type in something 1838 01:32:02,690 --> 01:32:04,520 like, hi, Enter. 1839 01:32:04,520 --> 01:32:06,020 And I see output of hi, too. 1840 01:32:06,020 --> 01:32:09,680 Let me do it once more with bye, Enter, and that works, too. 1841 01:32:09,680 --> 01:32:12,410 Notice I very deliberately and quickly gave myself 1842 01:32:12,410 --> 01:32:15,260 two spaces here and one space here just because I, literally, 1843 01:32:15,260 --> 01:32:18,620 wanted these things to line up properly, and input is shorter than output. 1844 01:32:18,620 --> 01:32:21,830 But that was just a deliberate formatting detail. 1845 01:32:21,830 --> 01:32:23,520 So this code is correct, 1846 01:32:23,520 --> 01:32:29,240 which is a claim I've made before, but it's not well-designed.
1847 01:32:29,240 --> 01:32:33,170 It is well-designed in that I'm using someone else's library function, 1848 01:32:33,170 --> 01:32:35,660 like, I've not reinvented a wheel; there's no line 15 1849 01:32:35,660 --> 01:32:38,270 or below; I didn't implement string length myself. 1850 01:32:38,270 --> 01:32:43,640 So I'm at least practicing what I've preached. 1851 01:32:43,640 --> 01:32:48,360 But there's still an imperfection, a suboptimality. 1852 01:32:48,360 --> 01:32:50,910 This one's really subtle, though. 1853 01:32:50,910 --> 01:32:54,330 And you have to think about how loops work. 1854 01:32:54,330 --> 01:32:58,640 What am I doing that's not super efficient? 1855 01:32:58,640 --> 01:32:59,870 Yeah, in back? 1856 01:32:59,870 --> 01:33:03,178 AUDIENCE: [INAUDIBLE] over and over again. 1857 01:33:03,178 --> 01:33:04,970 DAVID MALAN: Yeah, this is a little subtle. 1858 01:33:04,970 --> 01:33:07,460 But if you think back to the basic definition of a for loop 1859 01:33:07,460 --> 01:33:10,070 and recall when I highlighted things last week, what happens? 1860 01:33:10,070 --> 01:33:12,830 Well, the first thing is that i gets set to 0. 1861 01:33:12,830 --> 01:33:14,310 Then we check the condition. 1862 01:33:14,310 --> 01:33:15,560 How do we check the condition? 1863 01:33:15,560 --> 01:33:18,380 We call strlen on s, we get back an answer 1864 01:33:18,380 --> 01:33:24,810 like 3 if it's H-I exclamation point, and 0 is less than 3, so that's fine, 1865 01:33:24,810 --> 01:33:26,570 and then we print out the character. 1866 01:33:26,570 --> 01:33:29,060 Then we increment i from 0 to 1. 1867 01:33:29,060 --> 01:33:30,468 We recheck the condition. 1868 01:33:30,468 --> 01:33:31,760 How do I recheck the condition? 1869 01:33:31,760 --> 01:33:34,100 I call strlen of s. 1870 01:33:34,100 --> 01:33:36,890 Get back the same answer, 3. 1871 01:33:36,890 --> 01:33:38,720 Compare 3 against 1. 1872 01:33:38,720 --> 01:33:39,800 We're still good.
1873 01:33:39,800 --> 01:33:44,690 So we print out another character. i gets incremented again; i is now 2. 1874 01:33:44,690 --> 01:33:46,035 We check the condition. 1875 01:33:46,035 --> 01:33:46,910 What's the condition? 1876 01:33:46,910 --> 01:33:47,960 Well, what's the string length of s? 1877 01:33:47,960 --> 01:33:48,980 It's still 3. 1878 01:33:48,980 --> 01:33:51,860 2 is still less than 3. 1879 01:33:51,860 --> 01:33:55,430 So I keep asking the same question, sort of stupidly, 1880 01:33:55,430 --> 01:33:58,220 because the string is, presumably, never changing in length. 1881 01:33:58,220 --> 01:34:00,158 And indeed, every time I check that condition, 1882 01:34:00,158 --> 01:34:01,700 that function is going to get called. 1883 01:34:01,700 --> 01:34:04,380 And every time, the answer for hi is going to be 3. 1884 01:34:04,380 --> 01:34:04,880 3. 1885 01:34:04,880 --> 01:34:06,095 3. 1886 01:34:06,095 --> 01:34:10,850 So it's a marginal suboptimality, but I could do better, right? 1887 01:34:10,850 --> 01:34:15,560 Don't ask multiple times questions that you can remember the answer to. 1888 01:34:15,560 --> 01:34:20,960 So how could I remember the answer to this question and ask it just once? 1889 01:34:20,960 --> 01:34:24,750 How could I remember the answer to this question? 1890 01:34:24,750 --> 01:34:25,250 Let me see. 1891 01:34:25,250 --> 01:34:26,030 Yeah, back there? 1892 01:34:26,030 --> 01:34:27,446 AUDIENCE: Store it in a variable. 1893 01:34:27,446 --> 01:34:29,180 DAVID MALAN: So store it in a variable, right? 1894 01:34:29,180 --> 01:34:32,097 That's been our answer most any time we want to keep something around. 1895 01:34:32,097 --> 01:34:33,120 So how could I do this? 1896 01:34:33,120 --> 01:34:37,880 Well, I could do something like this, int, maybe, length equals strlen of s. 1897 01:34:37,880 --> 01:34:41,200 Then I can just change this function call. 1898 01:34:41,200 --> 01:34:43,160 Let me fix my spelling here.
1899 01:34:43,160 --> 01:34:47,360 Let me fix this to be comparing against length, and this is now OK, 1900 01:34:47,360 --> 01:34:50,240 because now strlen is only called once on line 9. 1901 01:34:50,240 --> 01:34:52,740 And I'm reusing the value of that variable, a.k.a. 1902 01:34:52,740 --> 01:34:54,240 length, again, and again, and again. 1903 01:34:54,240 --> 01:34:55,282 So that's more efficient. 1904 01:34:55,282 --> 01:34:59,760 It turns out that for loops let you declare multiple variables at once, 1905 01:34:59,760 --> 01:35:04,020 so we can do this a little more elegantly all in one line. 1906 01:35:04,020 --> 01:35:06,770 And this is just some syntactic improvement. 1907 01:35:06,770 --> 01:35:11,930 I could actually do something like this, n equals strlen of s, 1908 01:35:11,930 --> 01:35:14,750 and then I could just say n here or I could call it length. 1909 01:35:14,750 --> 01:35:17,667 But heck, while I'm being succinct, I'm just going to use n for number. 1910 01:35:17,667 --> 01:35:22,100 So now it's just a marginal change, but I've now 1911 01:35:22,100 --> 01:35:26,030 declared two variables inside of my loop, i and n. 1912 01:35:26,030 --> 01:35:29,300 i is set to 0. n is set to the string length of s. 1913 01:35:29,300 --> 01:35:33,380 But now, hereafter, all of my condition checks are just i less than n, 1914 01:35:33,380 --> 01:35:36,170 i less than n, and n is never changing. 1915 01:35:36,170 --> 01:35:38,008 All right, so a marginal improvement there. 1916 01:35:38,008 --> 01:35:39,800 Now that I've used this new function, let's 1917 01:35:39,800 --> 01:35:41,925 use some other functions that might be of interest. 1918 01:35:41,925 --> 01:35:48,680 Let me write a quick program here that capitalizes the beginning of-- 1919 01:35:48,680 --> 01:35:51,810 changes to uppercase some string that the user types in. 1920 01:35:51,810 --> 01:35:55,490 So let me code a file called uppercase.c.
1921 01:35:55,490 --> 01:36:01,520 Up here I'll use my new friends, cs50.h, and standard I/O, and string.h. 1922 01:36:01,520 --> 01:36:07,070 So standard I/O, and string.h. So, just as before, int main(void). 1923 01:36:07,070 --> 01:36:09,620 And then inside of main, what I'm going to do this time 1924 01:36:09,620 --> 01:36:14,390 is ask the user for a string s using get string, asking them 1925 01:36:14,390 --> 01:36:15,680 for the before value. 1926 01:36:15,680 --> 01:36:20,130 And then let me print out something like after. 1927 01:36:20,130 --> 01:36:24,410 So that it-- just so I can see what the uppercase version thereof is. 1928 01:36:24,410 --> 01:36:28,610 And then after this, let me do the following, for int, i 1929 01:36:28,610 --> 01:36:32,030 equals 0, oh, let's practice that same lesson, 1930 01:36:32,030 --> 01:36:37,790 so n equals the string length of s, i is less than n, i plus plus. 1931 01:36:37,790 --> 01:36:41,600 So really, nothing fundamentally new yet. 1932 01:36:41,600 --> 01:36:47,270 How do I now convert characters from lowercase, if they are, to uppercase? 1933 01:36:47,270 --> 01:36:50,000 In other words, if I type in hi, H-I in lowercase, 1934 01:36:50,000 --> 01:36:55,490 I want my program, now, to uppercase everything to capital H, capital I. 1935 01:36:55,490 --> 01:36:58,770 Well, how can I go about doing this? 1936 01:36:58,770 --> 01:37:01,010 Well, you might recall that there is this-- 1937 01:37:01,010 --> 01:37:03,900 you might recall that there is this ASCII chart. 1938 01:37:03,900 --> 01:37:06,855 So let's just consult this real quick on asciichart.com. 1939 01:37:06,855 --> 01:37:11,510 We looked at this last week. Notice that capital A is 65, 1940 01:37:11,510 --> 01:37:15,440 capital B is 66, capital C is 67, and heck, here's 1941 01:37:15,440 --> 01:37:19,640 lowercase a, lowercase b, lowercase c, and that's 97, 98, 99.
1942 01:37:19,640 --> 01:37:22,980 And if I actually do some math, there's a distance of 32. 1943 01:37:22,980 --> 01:37:23,480 Right? 1944 01:37:23,480 --> 01:37:25,640 So if I want to go from uppercase to lowercase, 1945 01:37:25,640 --> 01:37:30,788 I can do 65 plus 32, which gives me 97, and that actually works out 1946 01:37:30,788 --> 01:37:32,330 across the board for everything else. 1947 01:37:32,330 --> 01:37:36,020 66 plus 32 gets me to 98, or lowercase b. 1948 01:37:36,020 --> 01:37:40,640 Or conversely, if you have a lowercase a, and its value is 97, 1949 01:37:40,640 --> 01:37:46,850 subtract 32 and boom, you have capital A. So there's some arithmetic involved. 1950 01:37:46,850 --> 01:37:49,460 But now that we know that strings are just arrays, 1951 01:37:49,460 --> 01:37:53,330 and we know that characters, which are in those arrays, 1952 01:37:53,330 --> 01:37:56,450 are just binary representations of numbers, 1953 01:37:56,450 --> 01:37:59,297 I think we can manipulate a few of these things as follows. 1954 01:37:59,297 --> 01:38:01,130 Let me go back to my program here, and first 1955 01:38:01,130 --> 01:38:05,360 ask the question: if the current character in the array during this loop 1956 01:38:05,360 --> 01:38:08,930 is lowercase, let's force it to uppercase. 1957 01:38:08,930 --> 01:38:10,250 So how am I going to do that?
1958 01:38:10,250 --> 01:38:16,460 If the character at s bracket i, the current location in the array, 1959 01:38:16,460 --> 01:38:21,320 is greater than or equal to lowercase a, and s bracket 1960 01:38:21,320 --> 01:38:26,660 i is less than or equal to lowercase z, kind of a weird Boolean 1961 01:38:26,660 --> 01:38:31,460 expression, but it's completely legitimate, because in this array 1962 01:38:31,460 --> 01:38:34,230 s is a whole bunch of characters that the human typed in, 1963 01:38:34,230 --> 01:38:37,520 because that's what a string is. Greater than or equal to a might 1964 01:38:37,520 --> 01:38:39,680 be a little nonsensical, because when have you ever 1965 01:38:39,680 --> 01:38:41,330 compared numbers to letters? 1966 01:38:41,330 --> 01:38:47,568 But we know from week 0 lowercase a is 97, lowercase z is, what is it, 1...? 1967 01:38:47,568 --> 01:38:48,485 I don't even remember. 1968 01:38:48,485 --> 01:38:49,065 AUDIENCE: 122. 1969 01:38:49,065 --> 01:38:49,850 DAVID MALAN: What's that? 1970 01:38:49,850 --> 01:38:50,590 AUDIENCE: 122? 1971 01:38:50,590 --> 01:38:52,590 DAVID MALAN: 132, we know. 1972 01:38:52,590 --> 01:38:56,390 And so that would allow us to answer the question: is the current letter 1973 01:38:56,390 --> 01:38:57,410 lowercase? 1974 01:38:57,410 --> 01:39:00,530 All right, so let me answer that question. 1975 01:39:00,530 --> 01:39:03,140 If it is, what do I want to print out? 1976 01:39:03,140 --> 01:39:05,870 I don't want to print out the letter itself, 1977 01:39:05,870 --> 01:39:09,290 I want to print out the letter minus 32, right? 1978 01:39:09,290 --> 01:39:13,160 Because if it happens to be a lowercase a, 97, 97 minus 32 1979 01:39:13,160 --> 01:39:15,530 gives me 65, which is uppercase A, and I know that 1980 01:39:15,530 --> 01:39:18,860 just from having stared at that chart in the past.
1981 01:39:18,860 --> 01:39:24,172 Else, if the character is not between little a and little z, 1982 01:39:24,172 --> 01:39:25,880 I'm just going to print out the character 1983 01:39:25,880 --> 01:39:28,550 itself by printing s bracket i. 1984 01:39:28,550 --> 01:39:31,580 And at the very end of this, I'm going to print out a new line just 1985 01:39:31,580 --> 01:39:33,480 to move the cursor to the next line. 1986 01:39:33,480 --> 01:39:34,930 So again, it's a little wordy. 1987 01:39:34,930 --> 01:39:39,020 But this loop here, which I borrowed from our code previously, 1988 01:39:39,020 --> 01:39:41,510 just iterates over the string, a.k.a. 1989 01:39:41,510 --> 01:39:44,630 array, character-by-character, through its length. 1990 01:39:44,630 --> 01:39:47,360 This line 11 here is just asking the question 1991 01:39:47,360 --> 01:39:50,870 if that current character, the i-th character of s, 1992 01:39:50,870 --> 01:39:53,900 is greater than or equal to little a and less 1993 01:39:53,900 --> 01:39:59,240 than or equal to little z, that is between 97 and 132, then 1994 01:39:59,240 --> 01:40:04,940 we're going to go ahead and force it to uppercase instead. 1995 01:40:04,940 --> 01:40:09,290 All right, and let me zoom out here for just a second. 1996 01:40:09,290 --> 01:40:14,270 And sorry, I misspoke: it's 122, which is what you might have said. 1997 01:40:14,270 --> 01:40:15,630 There are only 26 letters. 1998 01:40:15,630 --> 01:40:17,270 So 122 is little z. 1999 01:40:17,270 --> 01:40:20,280 Let me go ahead now and compile and run this program. 2000 01:40:20,280 --> 01:40:26,210 So make uppercase, ./uppercase, and let me type in hi in lowercase, Enter. 2001 01:40:26,210 --> 01:40:28,520 And there's the capitalized version thereof. 2002 01:40:28,520 --> 01:40:30,920 Let me do it again, with my own name in lowercase, 2003 01:40:30,920 --> 01:40:33,100 and now it's capitalized as well. 2004 01:40:33,100 --> 01:40:34,860 Well, what could we do to improve this?
2005 01:40:34,860 --> 01:40:35,360 Well. 2006 01:40:35,360 --> 01:40:35,960 You know what? 2007 01:40:35,960 --> 01:40:37,640 Let's stop reinventing wheels. 2008 01:40:37,640 --> 01:40:39,840 Let's go to the manual pages. 2009 01:40:39,840 --> 01:40:43,490 So let me go here and search for something like, I don't know, 2010 01:40:43,490 --> 01:40:44,540 lowercase. 2011 01:40:44,540 --> 01:40:45,620 And there I go. 2012 01:40:45,620 --> 01:40:48,470 I did some autocomplete here, and our little search box 2013 01:40:48,470 --> 01:40:50,720 is saying that, OK, there's an is-lower function 2014 01:40:50,720 --> 01:40:52,550 to check whether a character is lowercase. 2015 01:40:52,550 --> 01:40:53,640 Well, how do I use this? 2016 01:40:53,640 --> 01:40:59,150 Well, let me check is-lower, and now I see the actual man page for this function. 2017 01:40:59,150 --> 01:41:01,850 Now we see, include ctype.h. 2018 01:41:01,850 --> 01:41:02,902 So that's the protot-- 2019 01:41:02,902 --> 01:41:04,610 that's the header file I need to include. 2020 01:41:04,610 --> 01:41:08,570 This is the prototype for is-lower; it apparently takes a char as input 2021 01:41:08,570 --> 01:41:10,330 and returns an int. 2022 01:41:10,330 --> 01:41:11,330 Which is a little weird. 2023 01:41:11,330 --> 01:41:14,400 I feel like is-lower should return true or false. 2024 01:41:14,400 --> 01:41:18,680 So let's scroll down to the description and return value. 2025 01:41:18,680 --> 01:41:20,810 It returns, oh, this is interesting. 2026 01:41:20,810 --> 01:41:25,370 And this is a convention in C. This function returns a non-zero int 2027 01:41:25,370 --> 01:41:30,820 if c is a lowercase letter and 0 if c is not a lowercase letter. 2028 01:41:30,820 --> 01:41:33,230 So it returns non-zero. 2029 01:41:33,230 --> 01:41:38,330 So like 1, negative 1, something that's not 0, if c is a lowercase letter, 2030 01:41:38,330 --> 01:41:41,400 and 0 if it is not a lowercase letter.
2031 01:41:41,400 --> 01:41:43,160 So how can we use this building block? 2032 01:41:43,160 --> 01:41:45,230 Let me go back to my code here. 2033 01:41:45,230 --> 01:41:49,610 Let me add this header file, include ctype.h. 2034 01:41:49,610 --> 01:41:53,120 And down here, let me get rid of this cryptic expression, which 2035 01:41:53,120 --> 01:41:59,060 was kind of painful to come up with, and just ask this: is-lower s bracket i? 2036 01:42:01,970 --> 01:42:05,390 That should actually work, but why? 2037 01:42:05,390 --> 01:42:10,520 Well, is-lower, again, returns a non-zero value if the letter is lowercase. 2038 01:42:10,520 --> 01:42:12,150 Well, what does that mean? 2039 01:42:12,150 --> 01:42:13,415 That means it could return 1. 2040 01:42:13,415 --> 01:42:14,540 It could return negative 1. 2041 01:42:14,540 --> 01:42:16,370 It could return 50 or negative 50. 2042 01:42:16,370 --> 01:42:18,650 It's actually not precisely defined. Why? 2043 01:42:18,650 --> 01:42:19,700 Just because. 2044 01:42:19,700 --> 01:42:23,750 It was a common convention to use 0 to represent false and use 2045 01:42:23,750 --> 01:42:26,120 any other value to represent true. 2046 01:42:26,120 --> 01:42:30,140 And so it turns out that inside of Boolean expressions, 2047 01:42:30,140 --> 01:42:34,755 if you put a value, like a function call like this, that returns 0, 2048 01:42:34,755 --> 01:42:36,380 that's going to be equivalent to false. 2049 01:42:36,380 --> 01:42:38,975 It's like the answer being no, it is not lower. 2050 01:42:38,975 --> 01:42:41,990 But you can also, in parentheses, put the name 2051 01:42:41,990 --> 01:42:45,920 of the function and its arguments, and not compare it against anything. 2052 01:42:45,920 --> 01:42:51,230 Because we could do something like this: well, if it's not equal to 0, then 2053 01:42:51,230 --> 01:42:52,247 it must be lowercase.
2054 01:42:52,247 --> 01:42:54,830 Because that's the definition: if it returns a non-zero value, 2055 01:42:54,830 --> 01:42:55,760 it's lowercase. 2056 01:42:55,760 --> 01:42:59,210 But a more succinct way to do that is just a bit more like English. 2057 01:42:59,210 --> 01:43:04,110 If it is lower, then print out the character minus 32. 2058 01:43:04,110 --> 01:43:06,590 So this would be the common way of using one of these 2059 01:43:06,590 --> 01:43:10,025 is- functions to check if the answer is true or false. 2060 01:43:10,025 --> 01:43:12,810 AUDIENCE: [INAUDIBLE] 2061 01:43:12,810 --> 01:43:14,670 DAVID MALAN: OK, well, we might be done. 2062 01:43:14,670 --> 01:43:15,170 OK. 2063 01:43:15,170 --> 01:43:16,922 AUDIENCE: [INAUDIBLE] 2064 01:43:16,922 --> 01:43:17,900 DAVID MALAN: No. 2065 01:43:17,900 --> 01:43:19,520 So it's not necessarily 1. 2066 01:43:19,520 --> 01:43:23,180 It would be incorrect to check for 1, or negative 1, or anything else. 2067 01:43:23,180 --> 01:43:25,550 You want to check for the opposite of 0. 2068 01:43:25,550 --> 01:43:26,870 So, not equal to 0. 2069 01:43:26,870 --> 01:43:31,820 Or more succinctly, like I did, by just putting it into parentheses. 2070 01:43:31,820 --> 01:43:34,560 Let me see what happens here. 2071 01:43:34,560 --> 01:43:38,690 So this is great, but some of you might have spotted a better solution 2072 01:43:38,690 --> 01:43:39,680 to this problem. 2073 01:43:39,680 --> 01:43:42,230 A moment ago, when we were on the manual pages searching 2074 01:43:42,230 --> 01:43:45,380 for things related to lowercase, what might be another building 2075 01:43:45,380 --> 01:43:46,475 block we can employ here? 2076 01:43:49,160 --> 01:43:50,700 Based on what's on the screen here? 2077 01:43:50,700 --> 01:43:51,200 Yeah? 2078 01:43:51,200 --> 01:43:52,888 AUDIENCE: To-upper. 2079 01:43:52,888 --> 01:43:54,140 DAVID MALAN: So to-upper.
2080 01:43:54,140 --> 01:43:57,098 There's a function that would literally do the uppercasing thing for me, 2081 01:43:57,098 --> 01:44:00,032 so I don't have to get into the weeds of negative 32, plus 32. 2082 01:44:00,032 --> 01:44:01,490 I don't have to consult that chart. 2083 01:44:01,490 --> 01:44:05,120 Someone has solved this problem for me in the past. 2084 01:44:05,120 --> 01:44:09,680 And let's see if I can actually get back to it. 2085 01:44:09,680 --> 01:44:10,520 There we go. 2086 01:44:10,520 --> 01:44:12,540 Let me go ahead, now, and use this. 2087 01:44:12,540 --> 01:44:15,230 So instead of doing s bracket i minus 32, 2088 01:44:15,230 --> 01:44:19,880 let's use a function that someone else wrote, and just say to-upper, s bracket 2089 01:44:19,880 --> 01:44:20,420 i. 2090 01:44:20,420 --> 01:44:23,250 And now it's going to do the solution for me. 2091 01:44:23,250 --> 01:44:30,530 So if I rerun make uppercase, and then do, slowly, ./uppercase, type in hi, 2092 01:44:30,530 --> 01:44:32,120 now it's working as expected. 2093 01:44:32,120 --> 01:44:35,870 And honestly, if I read the documentation for to-upper 2094 01:44:35,870 --> 01:44:39,170 by going back to its man page, or manual page, what you'll see 2095 01:44:39,170 --> 01:44:44,420 is that it says if it's lowercase, it will return the uppercase version 2096 01:44:44,420 --> 01:44:45,050 thereof. 2097 01:44:45,050 --> 01:44:48,913 If it's not lowercase, say it's already uppercase or it's punctuation, 2098 01:44:48,913 --> 01:44:50,705 it will just return the original character.
2099 01:44:50,705 --> 01:44:53,900 Which means, thanks to this function, I can actually 2100 01:44:53,900 --> 01:44:57,650 tighten this up significantly, get rid of all of my conditionals 2101 01:44:57,650 --> 01:45:02,030 there, and just print out the to-upper return value, 2102 01:45:02,030 --> 01:45:05,060 and leave it to whoever wrote that function to figure out 2103 01:45:05,060 --> 01:45:09,470 if something's uppercase or lowercase. 2104 01:45:09,470 --> 01:45:13,820 All right, questions on these kinds of tricks? 2105 01:45:13,820 --> 01:45:17,090 Again, it all reduces to week 0 basics, but we're just 2106 01:45:17,090 --> 01:45:18,750 building these abstractions on top. 2107 01:45:18,750 --> 01:45:19,250 Yeah? 2108 01:45:19,250 --> 01:45:21,208 AUDIENCE: I'm wondering if there's any way just 2109 01:45:21,208 --> 01:45:25,110 to import all packages under a certain subdomain instead 2110 01:45:25,110 --> 01:45:27,120 of having to do multiple [INAUDIBLE] statements, 2111 01:45:27,120 --> 01:45:28,412 kind of like a star [INAUDIBLE] 2112 01:45:28,412 --> 01:45:29,340 DAVID MALAN: Yes. 2113 01:45:29,340 --> 01:45:30,180 Unfortunately, no. 2114 01:45:30,180 --> 01:45:33,120 There is no easy way in C to say, give me everything. 2115 01:45:33,120 --> 01:45:35,670 That was, historically, for performance reasons. 2116 01:45:35,670 --> 01:45:38,940 They want you to be explicit as to what you want to include. 2117 01:45:38,940 --> 01:45:41,730 In other languages like Python or Java, one of which 2118 01:45:41,730 --> 01:45:44,513 we'll see later this term, you can say, give me everything. 2119 01:45:44,513 --> 01:45:47,430 But that, actually, tends not to be best practice, because it can slow down 2120 01:45:47,430 --> 01:45:50,000 execution or compilation of your code. 2121 01:45:50,000 --> 01:45:50,500 Yeah? 2122 01:45:50,500 --> 01:45:52,845 AUDIENCE: Does to-upper accommodate special characters? 2123 01:45:52,845 --> 01:45:53,340 DAVID MALAN: Ah.
2124 01:45:53,340 --> 01:45:55,980 Does to-upper accommodate special characters like punctuation? 2125 01:45:55,980 --> 01:45:56,480 Yes. 2126 01:45:56,480 --> 01:45:58,440 If I read the documentation more pedantically, 2127 01:45:58,440 --> 01:45:59,710 we would see exactly that. 2128 01:45:59,710 --> 01:46:02,940 It will properly hand me back an exclamation point, 2129 01:46:02,940 --> 01:46:04,600 if I pass one in. 2130 01:46:04,600 --> 01:46:08,970 So if I do make uppercase here, and let me do ./upper, sorry-- 2131 01:46:08,970 --> 01:46:13,620 ./uppercase, hi with an exclamation point, it's going to handle that, too, 2132 01:46:13,620 --> 01:46:15,810 and pass it through unchanged. Yeah? 2133 01:46:15,810 --> 01:46:19,200 AUDIENCE: Do we have access to a function that would do all of that 2134 01:46:19,200 --> 01:46:21,590 but just to the screen rather than to [INAUDIBLE] 2135 01:46:21,590 --> 01:46:23,550 DAVID MALAN: Really good question, too. 2136 01:46:23,550 --> 01:46:28,110 No, we do not have access to a function, at least one that comes with C or comes 2137 01:46:28,110 --> 01:46:31,740 with CS50's library, that will just force the whole thing to uppercase. 2138 01:46:31,740 --> 01:46:34,170 In C, that's actually easier said than done. 2139 01:46:34,170 --> 01:46:35,550 In Python, it's trivial. 2140 01:46:35,550 --> 01:46:39,810 So stay tuned for another language that will let us do exactly that. 2141 01:46:39,810 --> 01:46:42,510 All right, so what does this leave us with? 2142 01:46:42,510 --> 01:46:44,520 There's just a-- let's come full circle now, 2143 01:46:44,520 --> 01:46:47,490 to where we began today, where we were talking about those command line 2144 01:46:47,490 --> 01:46:48,090 arguments. 2145 01:46:48,090 --> 01:46:51,810 Recall that we talked about rm taking a command line argument.
2146 01:46:51,810 --> 01:46:54,470 The file you want to delete. We talked about clang 2147 01:46:54,470 --> 01:46:56,220 taking command line arguments that, again, 2148 01:46:56,220 --> 01:46:58,140 modify the behavior of the program. 2149 01:46:58,140 --> 01:47:01,680 How is it that maybe you and I can start to write programs that 2150 01:47:01,680 --> 01:47:03,840 actually take command line arguments? 2151 01:47:03,840 --> 01:47:07,620 Well, here is where I can finally explain why 2152 01:47:07,620 --> 01:47:10,740 we've been typing int main(void) for the past week 2153 01:47:10,740 --> 01:47:14,490 and just asking that you take on faith that it's just the way you do things. 2154 01:47:14,490 --> 01:47:20,820 Well, by default in C, at least the most recent versions thereof, 2155 01:47:20,820 --> 01:47:24,010 there are only two official ways to write main functions. 2156 01:47:24,010 --> 01:47:26,460 You might see other formats online, but they're generally 2157 01:47:26,460 --> 01:47:28,870 not consistent with the current specification. 2158 01:47:28,870 --> 01:47:32,160 This, again, was sort of the boilerplate for the simplest 2159 01:47:32,160 --> 01:47:34,770 function we might write last week, and recall that we've 2160 01:47:34,770 --> 01:47:36,210 been doing this the whole time. 2161 01:47:36,210 --> 01:47:40,990 (void). What that (void) means, for all of the programs I have written thus far 2162 01:47:40,990 --> 01:47:43,890 and you have written thus far, is that none of our programs 2163 01:47:43,890 --> 01:47:47,040 that we've written take command line arguments. 2164 01:47:47,040 --> 01:47:49,110 That's what the void there means. 2165 01:47:49,110 --> 01:47:53,950 It turns out that main is the way you can specify that your program does, 2166 01:47:53,950 --> 01:47:55,740 in fact, take command line arguments, that 2167 01:47:55,740 --> 01:47:59,760 is, words after the command in your terminal window.
2168 01:47:59,760 --> 01:48:02,220 If you actually don't want to use get int or get string, 2169 01:48:02,220 --> 01:48:05,970 and you want the human to be able to say something like hello, David 2170 01:48:05,970 --> 01:48:06,840 and hit Enter, 2171 01:48:06,840 --> 01:48:09,940 and just run-- print hello, David on the screen, 2172 01:48:09,940 --> 01:48:14,460 you can use command line arguments, words after the program name 2173 01:48:14,460 --> 01:48:16,750 on your command line. 2174 01:48:16,750 --> 01:48:20,460 So we're going to change this in a moment to be something more verbose, 2175 01:48:20,460 --> 01:48:23,930 but something that's now a bit more familiar syntactically. 2176 01:48:23,930 --> 01:48:28,440 If you change that (void) in main to be this incantation instead, 2177 01:48:28,440 --> 01:48:33,480 int, argc, comma, string, argv, open bracket, close bracket, 2178 01:48:33,480 --> 01:48:36,630 you are now giving yourself access to writing programs 2179 01:48:36,630 --> 01:48:38,910 that take command line arguments. 2180 01:48:38,910 --> 01:48:42,120 argc, which stands for argument count, is going 2181 01:48:42,120 --> 01:48:46,410 to be an integer that stores how many words the human typed at the prompt. 2182 01:48:46,410 --> 01:48:49,050 C automatically gives that to you. 2183 01:48:49,050 --> 01:48:52,710 string argv, where argv stands for argument vector, is 2184 01:48:52,710 --> 01:48:57,100 going to be an array of all of the words that the human typed at the prompt. 2185 01:48:57,100 --> 01:48:59,130 So with today's building block of an array, 2186 01:48:59,130 --> 01:49:01,980 we have the ability now to let the human type as many words, 2187 01:49:01,980 --> 01:49:03,900 or as few words, as they want at the prompt. 2188 01:49:03,900 --> 01:49:06,900 C is going to automatically put them in an array called argv, 2189 01:49:06,900 --> 01:49:12,360 and it's going to tell us how many words there are in an int called argc.
2190 01:49:12,360 --> 01:49:16,060 The int, as the return type here, we'll come back to in just a moment. 2191 01:49:16,060 --> 01:49:19,350 Let's use this definition to make, maybe, 2192 01:49:19,350 --> 01:49:20,970 just a couple of simple programs. 2193 01:49:20,970 --> 01:49:23,070 But in problem set 2, we will actually use 2194 01:49:23,070 --> 01:49:26,470 this to control the behavior of your own code. 2195 01:49:26,470 --> 01:49:33,120 Let me code up a file called argv.0 just to keep it aptly named. 2196 01:49:33,120 --> 01:49:35,700 Let me include cs50.h. 2197 01:49:35,700 --> 01:49:37,240 Let me go ahead and include-- 2198 01:49:37,240 --> 01:49:37,740 oops. 2199 01:49:37,740 --> 01:49:40,950 That is not the right name for a file; let's start that over. 2200 01:49:40,950 --> 01:49:45,450 Let's go ahead and code up argv.c. 2201 01:49:45,450 --> 01:49:46,800 And here we have-- 2202 01:49:46,800 --> 01:49:52,890 include cs50.h, include stdio.h, int, main, not void, 2203 01:49:52,890 --> 01:50:00,025 let's actually say int, argc, string, argv, open bracket, close bracket. 2204 01:50:00,025 --> 01:50:02,400 No numbers in between, because you don't know, in advance, 2205 01:50:02,400 --> 01:50:05,310 how many words the human's going to type at their prompt. 2206 01:50:05,310 --> 01:50:06,760 Now let's go ahead and do this. 2207 01:50:06,760 --> 01:50:10,800 Let's write a very simple program that just says, hello, David, hello, Carter, 2208 01:50:10,800 --> 01:50:12,660 whatever the name is that gets typed. 2209 01:50:12,660 --> 01:50:16,260 But rather than using get string, let's instead have the human just 2210 01:50:16,260 --> 01:50:19,890 type their name at the prompt, just like rm, just like clang, just like make, 2211 01:50:19,890 --> 01:50:22,170 so it's just one and done when you hit Enter. 2212 01:50:22,170 --> 01:50:23,610 No additional prompts.
2213 01:50:23,610 --> 01:50:28,380 Let me go ahead then and do this: printf, quote-unquote, hello, 2214 01:50:28,380 --> 01:50:31,500 comma, and instead of world today, I want to print out 2215 01:50:31,500 --> 01:50:33,370 whatever the human typed in. 2216 01:50:33,370 --> 01:50:38,850 So let's go ahead and do this: argv, bracket 0, for now. 2217 01:50:38,850 --> 01:50:43,080 But I don't think this is quite what I want because, of course, 2218 01:50:43,080 --> 01:50:48,370 that's going to literally print out argv, bracket, 0, bracket. 2219 01:50:48,370 --> 01:50:52,510 I need a placeholder, so let me put %s here and then put that here. 2220 01:50:52,510 --> 01:50:56,520 So if argv is an array, but it's an array of strings, 2221 01:50:56,520 --> 01:51:00,480 then argv bracket 0 is itself a single string. 2222 01:51:00,480 --> 01:51:03,450 And so it can be plugged into that %s placeholder. 2223 01:51:03,450 --> 01:51:05,740 Let me go ahead and save my program. 2224 01:51:05,740 --> 01:51:09,340 And compile argv; so far, so good. 2225 01:51:09,340 --> 01:51:13,170 Let me now type in my name after the name of the program. 2226 01:51:13,170 --> 01:51:13,980 So no get string. 2227 01:51:13,980 --> 01:51:18,280 I'm literally typing an extra word, my own name, at the prompt, Enter. 2228 01:51:18,280 --> 01:51:21,290 OK, it's apparently a little buggy in a couple of ways. 2229 01:51:21,290 --> 01:51:24,500 I forgot my \n, but that's not a huge deal. 2230 01:51:24,500 --> 01:51:28,960 But apparently, inside of argv is literally everything 2231 01:51:28,960 --> 01:51:31,270 that the human typed in, including the name of the program. 2232 01:51:31,270 --> 01:51:36,250 So logically, how do I print out hello, David, or hello, so-and-so, and not 2233 01:51:36,250 --> 01:51:37,720 the actual name of the program? 2234 01:51:37,720 --> 01:51:38,960 What needs to change here? 2235 01:51:38,960 --> 01:51:39,460 Yeah? 2236 01:51:39,460 --> 01:51:41,050 AUDIENCE: Change the index to 1.
2237 01:51:41,050 --> 01:51:41,800 DAVID MALAN: Yeah. 2238 01:51:41,800 --> 01:51:45,940 So, presumably, index 1, if that's the second thing I, or whichever human, 2239 01:51:45,940 --> 01:51:46,940 has typed at the prompt. 2240 01:51:46,940 --> 01:51:51,410 So let's do make argv again, ./argv, Enter. 2241 01:51:51,410 --> 01:51:52,090 Huh. 2242 01:51:52,090 --> 01:51:53,630 hello, (null). 2243 01:51:53,630 --> 01:51:55,690 So this is another form of null. 2244 01:51:55,690 --> 01:51:59,320 But this is user error, now, on my part. 2245 01:51:59,320 --> 01:52:01,070 I didn't do exactly what I said I would. 2246 01:52:01,070 --> 01:52:01,570 Yeah? 2247 01:52:01,570 --> 01:52:02,530 AUDIENCE: You forgot the parameter. 2248 01:52:02,530 --> 01:52:04,430 DAVID MALAN: Yeah, I forgot the parameter. 2249 01:52:04,430 --> 01:52:05,700 So that's actually, hm. 2250 01:52:05,700 --> 01:52:07,450 I should probably deal with that, somehow, 2251 01:52:07,450 --> 01:52:09,292 so that people aren't breaking my program 2252 01:52:09,292 --> 01:52:11,000 and printing out random things, like (null). 2253 01:52:11,000 --> 01:52:14,770 But if I do ./argv David, now you see hello, David. 2254 01:52:14,770 --> 01:52:18,070 I can get a little curious, like what's at location 2? 2255 01:52:18,070 --> 01:52:23,410 Well, we can see: make argv, ./argv David, Enter. 2256 01:52:23,410 --> 01:52:24,910 All right, so just nothing is there. 2257 01:52:24,910 --> 01:52:28,202 But it turns out, in a couple of weeks, we'll start really poking around memory 2258 01:52:28,202 --> 01:52:30,310 and see if we can't crash programs deliberately, 2259 01:52:30,310 --> 01:52:32,800 because nothing is stopping me from saying, 2260 01:52:32,800 --> 01:52:36,470 oh, what's at location 2 million, for instance? 2261 01:52:36,470 --> 01:52:38,350 We could really start to get curious. 2262 01:52:38,350 --> 01:52:40,420 But for now, we'll do the right thing.
2263 01:52:40,420 --> 01:52:44,360 But let's now make sure the human has typed in the right number of words. 2264 01:52:44,360 --> 01:52:50,920 So let's say this: if argc equals 2, that is, the name of the program 2265 01:52:50,920 --> 01:52:54,760 and one more word after that, go ahead and trust that argv bracket 1, 2266 01:52:54,760 --> 01:52:56,980 as you proposed, is the person's name. 2267 01:52:56,980 --> 01:53:01,810 Else, let's go ahead and default here to something simple and basic, 2268 01:53:01,810 --> 01:53:05,860 like, well, if we don't get a name from the user, just say hello, world, 2269 01:53:05,860 --> 01:53:07,300 like always. 2270 01:53:07,300 --> 01:53:10,045 So now we're programming defensively. 2271 01:53:10,045 --> 01:53:13,090 This time, even if the human screws up, if they don't give us a name 2272 01:53:13,090 --> 01:53:15,965 or they give us too many names, we're just going to say hello, world, 2273 01:53:15,965 --> 01:53:17,890 because I now have some error handling here. 2274 01:53:17,890 --> 01:53:22,030 Because, again, argc is the argument count, the number of words, total, 2275 01:53:22,030 --> 01:53:23,990 typed at the command line. 2276 01:53:23,990 --> 01:53:26,740 So make argv, ./argv. 2277 01:53:26,740 --> 01:53:28,540 Let me make the same mistake as before. 2278 01:53:28,540 --> 01:53:29,050 OK. 2279 01:53:29,050 --> 01:53:30,910 I don't get this weird null behavior. 2280 01:53:30,910 --> 01:53:32,350 I get something well-defined. 2281 01:53:32,350 --> 01:53:33,610 I could now do David. 2282 01:53:33,610 --> 01:53:36,850 I could do David Malan, but that's not currently supported. 2283 01:53:36,850 --> 01:53:41,290 I would need to alter my logic to support more than just two words 2284 01:53:41,290 --> 01:53:42,345 at the prompt. 2285 01:53:42,345 --> 01:53:43,770 So what's the point of this?
2286 01:53:43,770 --> 01:53:45,520 At the moment, it's just a simple exercise 2287 01:53:45,520 --> 01:53:50,702 to actually give myself a way of taking user input when they run the program. 2288 01:53:50,702 --> 01:53:52,660 Because, consider, it's just more convenient in 2289 01:53:52,660 --> 01:53:54,670 this new, command-line-interface world. 2290 01:53:54,670 --> 01:53:58,857 If you had to use get string every time you compile your code, 2291 01:53:58,857 --> 01:54:00,190 it'd be kind of annoying, right? 2292 01:54:00,190 --> 01:54:03,940 You type make, then you might get a prompt, what would you like to make? 2293 01:54:03,940 --> 01:54:07,690 Then you type in hello, or cash, or something else, then you hit Enter, 2294 01:54:07,690 --> 01:54:09,330 it just really slows the process. 2295 01:54:09,330 --> 01:54:11,440 But in this command-line-interface world, 2296 01:54:11,440 --> 01:54:14,770 if you support command line arguments, then you can use these little tricks. 2297 01:54:14,770 --> 01:54:18,170 Like, scrolling up and down in your history with your arrow keys. 2298 01:54:18,170 --> 01:54:22,430 You can just type commands more quickly because you can do it all at once. 2299 01:54:22,430 --> 01:54:25,000 And you don't have to keep prompting the user, more 2300 01:54:25,000 --> 01:54:27,760 pedantically, for more and more info. 2301 01:54:27,760 --> 01:54:30,280 So any questions then on command line arguments? 2302 01:54:30,280 --> 01:54:34,000 Which, finally, reveals why we had (void) initially, 2303 01:54:34,000 --> 01:54:36,610 but what more we can now put in main. 2304 01:54:36,610 --> 01:54:39,070 That's how you take command line arguments. 2305 01:54:39,070 --> 01:54:40,500 Yeah? 2306 01:54:40,500 --> 01:54:42,610 AUDIENCE: If you were to put-- 2307 01:54:42,610 --> 01:54:47,320 if you were to use argv, and you were to put integers inside of it, 2308 01:54:47,320 --> 01:54:49,923 would it still give you, like, a string? 
2309 01:54:49,923 --> 01:54:51,506 Would that still be considered a string? 2310 01:54:51,506 --> 01:54:52,923 Or would you consider [INAUDIBLE]? 2311 01:54:52,923 --> 01:54:53,760 DAVID MALAN: Yes. 2312 01:54:53,760 --> 01:54:56,550 If you were to type at the command line something 2313 01:54:56,550 --> 01:55:00,660 like, not a word, but something like the number 42, 2314 01:55:00,660 --> 01:55:03,450 that would actually be treated as a string. 2315 01:55:03,450 --> 01:55:04,290 Why? 2316 01:55:04,290 --> 01:55:06,220 Because, again, context matters. 2317 01:55:06,220 --> 01:55:08,940 So if your program is currently manipulating memory 2318 01:55:08,940 --> 01:55:12,510 as though it holds characters or strings, whatever those patterns of 0s and 1s 2319 01:55:12,510 --> 01:55:16,800 are, they will be interpreted as ASCII text, or Unicode text. 2320 01:55:16,800 --> 01:55:20,640 If we therefore go to the chart here, that might make you wonder, well, 2321 01:55:20,640 --> 01:55:24,510 then how do you distinguish numbers from letters in the context of something 2322 01:55:24,510 --> 01:55:25,890 like chars and strings? 2323 01:55:25,890 --> 01:55:34,380 Well, notice 65 is capital A, 97 is lowercase a, but also 49 is the character 1, and 50 is the character 2. 2324 01:55:34,380 --> 01:55:37,500 So the designers of ASCII, and then later Unicode, 2325 01:55:37,500 --> 01:55:40,680 realized, well, wait a minute, if we want to support programs 2326 01:55:40,680 --> 01:55:43,440 that let you type things that look like numbers, 2327 01:55:43,440 --> 01:55:46,350 even though they're not technically ints or floats, 2328 01:55:46,350 --> 01:55:50,620 we need a way in ASCII and Unicode to represent even numbers. 2329 01:55:50,620 --> 01:55:51,870 So here are your numbers. 2330 01:55:51,870 --> 01:55:55,210 And it's a little silly that we have numbers representing other numbers.
2331 01:55:55,210 --> 01:55:57,863 But again, if you're in the world of letters and characters, 2332 01:55:57,863 --> 01:56:00,030 you've got to come up with a mapping for everything. 2333 01:56:00,030 --> 01:56:01,790 And notice here, here's the dot. 2334 01:56:01,790 --> 01:56:06,390 Even if you were to represent 1.23 as a string, or as characters, 2335 01:56:06,390 --> 01:56:10,840 even the dot now is going to be represented as an ASCII character. 2336 01:56:10,840 --> 01:56:12,930 So again, context here matters. 2337 01:56:12,930 --> 01:56:17,370 All right, one final example to tease apart what this int is 2338 01:56:17,370 --> 01:56:19,840 and what it's been doing here for so long. 2339 01:56:19,840 --> 01:56:24,780 So I'm going to add one bit of logic to a new file 2340 01:56:24,780 --> 01:56:27,750 that I'm going to call exit.c. 2341 01:56:27,750 --> 01:56:29,130 So, in exit.c, 2342 01:56:29,130 --> 01:56:32,880 we're going to introduce something that's generally known as an exit status. 2343 01:56:32,880 --> 01:56:34,980 It turns out this is not a feature we've used yet, 2344 01:56:34,980 --> 01:56:37,240 but it's just useful to know about, 2345 01:56:37,240 --> 01:56:40,350 especially when automating tests of your own code, 2346 01:56:40,350 --> 01:56:44,115 when it comes to figuring out if a program succeeded or failed. 2347 01:56:44,115 --> 01:56:48,870 It turns out that main has one more feature we haven't leveraged: 2348 01:56:48,870 --> 01:56:54,330 an ability to signal to the user whether something was successful or not. 2349 01:56:54,330 --> 01:56:57,760 And that's by way of main's return value. 2350 01:56:57,760 --> 01:57:02,060 So I'm going to modify this program as follows, like this. 2351 01:57:02,060 --> 01:57:04,920 Suppose I want to write a similar program that 2352 01:57:04,920 --> 01:57:07,900 requires that the user type a word at the prompt, 2353 01:57:07,900 --> 01:57:12,450 so that argc has to be 2 for whatever design purpose.
2354 01:57:12,450 --> 01:57:18,990 If argc does not equal 2, I want to quit out of my program prematurely. 2355 01:57:18,990 --> 01:57:22,590 I want to insist that the user operate the program correctly. 2356 01:57:22,590 --> 01:57:28,800 So I might give them an error message like, missing command line argument \n. 2357 01:57:28,800 --> 01:57:31,180 But now I want to quit out of the program. 2358 01:57:31,180 --> 01:57:32,310 Now how can I do that? 2359 01:57:32,310 --> 01:57:37,260 The right way, quote-unquote, to do that is to return a value from main. 2360 01:57:37,260 --> 01:57:40,590 Now it's a little weird because no one called main yet, 2361 01:57:40,590 --> 01:57:42,990 right? Main just gets called automatically, 2362 01:57:42,990 --> 01:57:45,300 but the convention is anytime something goes 2363 01:57:45,300 --> 01:57:50,100 wrong in a program, you should return a non-zero value from main. 2364 01:57:50,100 --> 01:57:51,780 1 is fine as a go-to. 2365 01:57:51,780 --> 01:57:55,470 We don't need to get into the weeds of having many different exit statuses, 2366 01:57:55,470 --> 01:57:56,220 so to speak. 2367 01:57:56,220 --> 01:58:01,770 But if you return 1, that is a clue to the system, the Mac, the PC, the cloud 2368 01:58:01,770 --> 01:58:03,430 device, that something went wrong. 2369 01:58:03,430 --> 01:58:03,930 Why? 2370 01:58:03,930 --> 01:58:05,670 Because 1 is not 0. 2371 01:58:05,670 --> 01:58:11,460 If everything works fine, let's go ahead and print out hello comma %s like 2372 01:58:11,460 --> 01:58:16,620 before, quote-unquote argv bracket 1. 2373 01:58:16,620 --> 01:58:19,080 So this is just a version of the program without an else. 2374 01:58:19,080 --> 01:58:21,390 So this is the same as doing, essentially, 2375 01:58:21,390 --> 01:58:23,580 an else here like I did earlier. 2376 01:58:23,580 --> 01:58:26,740 I want to signal to the computer that all is well. 2377 01:58:26,740 --> 01:58:28,290 And so I return 0.
2378 01:58:28,290 --> 01:58:31,650 But strictly speaking, if I'm already returning here, 2379 01:58:31,650 --> 01:58:34,560 I don't technically need, if I really want to be nitpicky, 2380 01:58:34,560 --> 01:58:36,870 I don't technically need the else, because the only way 2381 01:58:36,870 --> 01:58:41,486 I'm going to get to line 11 is if I didn't already return. 2382 01:58:41,486 --> 01:58:43,180 So what's going on here? 2383 01:58:43,180 --> 01:58:46,530 The only new thing here, logically, is that for the first time ever, 2384 01:58:46,530 --> 01:58:48,810 I'm returning a value from main. 2385 01:58:48,810 --> 01:58:50,730 That's something I could always have done 2386 01:58:50,730 --> 01:58:55,290 because main has always been defined by us as returning an int. 2387 01:58:55,290 --> 01:58:59,880 By default, main automatically, sort of secretly, returns 0 for you. 2388 01:58:59,880 --> 01:59:02,850 If you've never once used the return keyword, which you probably 2389 01:59:02,850 --> 01:59:05,370 haven't in main, it just automatically returns 0 2390 01:59:05,370 --> 01:59:07,295 and the system assumes that all went well. 2391 01:59:07,295 --> 01:59:09,390 But now that we're starting to get a little more 2392 01:59:09,390 --> 01:59:11,520 sophisticated with our code, and you, 2393 01:59:11,520 --> 01:59:15,480 the programmer, know something went wrong, you can abort programs early. 2394 01:59:15,480 --> 01:59:20,610 You can exit out of them by returning some value other than 0 from main. 2395 01:59:20,610 --> 01:59:23,040 And it's fortuitous that it's an int, right? 2396 01:59:23,040 --> 01:59:25,110 0 means everything worked. 2397 01:59:25,110 --> 01:59:29,250 Unfortunately, in programming, there are seemingly an infinite number of things 2398 01:59:29,250 --> 01:59:30,240 that can go wrong. 2399 01:59:30,240 --> 01:59:33,210 And an int gives you roughly 4 billion possible codes 2400 01:59:33,210 --> 01:59:36,455 that you can use, a.k.a.
exit statuses, to signify errors. 2401 01:59:36,455 --> 01:59:39,930 So if you've ever, on your Mac or PC, gotten some weird pop-up 2402 01:59:39,930 --> 01:59:43,320 saying that an error happened, sometimes there's a cryptic number in it. 2403 01:59:43,320 --> 01:59:45,420 Maybe it's positive, maybe it's negative. 2404 01:59:45,420 --> 01:59:50,170 It might say error code 123, or negative 49, or something like that. 2405 01:59:50,170 --> 01:59:54,310 What you're generally seeing are these exit statuses, these return 2406 01:59:54,310 --> 01:59:57,610 values from main in a program that someone at Microsoft, 2407 01:59:57,610 --> 02:00:01,120 or Apple, or somewhere else wrote. Something went wrong, 2408 02:00:01,120 --> 02:00:05,980 and they are unnecessarily showing you, the user, what the error code is, 2409 02:00:05,980 --> 02:00:09,100 if only so that when you call customer support or submit a ticket, 2410 02:00:09,100 --> 02:00:12,190 you can tell them what exit status you encountered, 2411 02:00:12,190 --> 02:00:15,070 what error code you encountered. 2412 02:00:15,070 --> 02:00:19,390 All right, any questions on exit statuses, 2413 02:00:19,390 --> 02:00:24,580 which are the last of our new building blocks, for now? 2414 02:00:24,580 --> 02:00:25,540 Any questions at all? 2415 02:00:25,540 --> 02:00:26,040 Yeah? 2416 02:00:26,040 --> 02:00:33,540 AUDIENCE: [INAUDIBLE] You know how if you have get string or get int, 2417 02:00:33,540 --> 02:00:35,418 if you want to make [INAUDIBLE] 2418 02:00:35,418 --> 02:00:36,085 DAVID MALAN: No. 2419 02:00:36,085 --> 02:00:39,265 The question is, can you do things again and again 2420 02:00:39,265 --> 02:00:41,890 at the command line like you could with get string and get int.
2421 02:00:41,890 --> 02:00:43,870 Which, by default, recall, are automatically 2422 02:00:43,870 --> 02:00:46,420 designed to keep prompting the user in their own loop 2423 02:00:46,420 --> 02:00:49,960 until they give you an int, or a float, or the like. With command line 2424 02:00:49,960 --> 02:00:50,740 arguments, no. 2425 02:00:50,740 --> 02:00:52,210 You're going to get an error message, but then 2426 02:00:52,210 --> 02:00:54,002 you're going to be returned to your prompt. 2427 02:00:54,002 --> 02:00:57,387 And it's up to you to type it correctly the next time. 2428 02:00:57,387 --> 02:00:57,970 Good question. 2429 02:00:57,970 --> 02:00:58,470 Yeah? 2430 02:00:58,470 --> 02:01:03,435 AUDIENCE: [INAUDIBLE] automatically for you. 2431 02:01:03,435 --> 02:01:05,310 DAVID MALAN: If you do not return a value 2432 02:01:05,310 --> 02:01:08,730 explicitly, main will automatically return 0 for you; 2433 02:01:08,730 --> 02:01:12,640 that is the way C simply works, so it's not strictly necessary. 2434 02:01:12,640 --> 02:01:15,510 But now that we're starting to return values explicitly 2435 02:01:15,510 --> 02:01:18,090 if something goes wrong, it would be good practice 2436 02:01:18,090 --> 02:01:21,480 to also start returning a value from main when something goes right 2437 02:01:21,480 --> 02:01:23,775 and there are no errors. 2438 02:01:23,775 --> 02:01:27,810 So let's now get out of the weeds and contextualize 2439 02:01:27,810 --> 02:01:31,200 this for some actual problems that we'll be solving in the coming days 2440 02:01:31,200 --> 02:01:33,130 by way of problem set 2 and beyond. 2441 02:01:33,130 --> 02:01:35,740 So here, for instance-- 2442 02:01:35,740 --> 02:01:39,990 So here, for instance, is a problem that might take you back 2443 02:01:39,990 --> 02:01:43,980 to when you were a kid: the readability of some text or some book, 2444 02:01:43,980 --> 02:01:46,230 the grade level in which some book is written.
2445 02:01:46,230 --> 02:01:49,740 If you're a young student, you might read at first-grade level 2446 02:01:49,740 --> 02:01:51,240 or third-grade level in the US. 2447 02:01:51,240 --> 02:01:53,032 Or, if you're in college, presumably you're 2448 02:01:53,032 --> 02:01:54,945 reading at a university level of text. 2449 02:01:54,945 --> 02:01:58,073 But what does it mean for text, like in a book, 2450 02:01:58,073 --> 02:02:00,240 or in an essay, or something like that, to correspond 2451 02:02:00,240 --> 02:02:01,590 to some kind of grade level? 2452 02:02:01,590 --> 02:02:04,950 Well, here's a quote-- a title of a childhood book. 2453 02:02:04,950 --> 02:02:07,590 One Fish, Two Fish, Red Fish, Blue Fish. 2454 02:02:07,590 --> 02:02:10,840 What might the grade level be for a book that has words like this? 2455 02:02:10,840 --> 02:02:13,590 Maybe, when you were a kid, or if you have a sibling still reading 2456 02:02:13,590 --> 02:02:16,260 these things, what might the grade level of this thing be? 2457 02:02:18,800 --> 02:02:19,590 Any guesses? 2458 02:02:19,590 --> 02:02:20,090 Yeah? 2459 02:02:20,090 --> 02:02:21,257 AUDIENCE: Before grade 1. 2460 02:02:21,257 --> 02:02:22,340 DAVID MALAN: Sorry, again? 2461 02:02:22,340 --> 02:02:23,382 AUDIENCE: Before grade 1. 2462 02:02:23,382 --> 02:02:25,650 DAVID MALAN: Before grade 1 is, in fact, correct. 2463 02:02:25,650 --> 02:02:27,290 So that's for really young kids. 2464 02:02:27,290 --> 02:02:28,230 Why is that? 2465 02:02:28,230 --> 02:02:29,180 Well, let's consider. 2466 02:02:29,180 --> 02:02:32,210 These are pretty simple phrases, right? 2467 02:02:32,210 --> 02:02:33,500 One fish, two fish, red-- 2468 02:02:33,500 --> 02:02:35,960 I mean, there aren't even verbs in these sentences; 2469 02:02:35,960 --> 02:02:40,040 they're just nouns and adjectives, and very short sentences. 2470 02:02:40,040 --> 02:02:42,200 And so that might be a heuristic we could use.
2471 02:02:42,200 --> 02:02:44,810 When analyzing text, well, if the words are kind of short, 2472 02:02:44,810 --> 02:02:47,240 the sentences are kind of short, everything's very simple, 2473 02:02:47,240 --> 02:02:50,250 that's probably a very young, or early, grade level. 2474 02:02:50,250 --> 02:02:53,665 And so by one formulation, it might indeed be even before grade 1, 2475 02:02:53,665 --> 02:02:54,665 for someone quite young. 2476 02:02:54,665 --> 02:02:55,670 How about this? 2477 02:02:55,670 --> 02:02:58,022 Mr. and Mrs. Dursley, of number 4, Privet Drive, 2478 02:02:58,022 --> 02:03:00,980 were proud to say that they were perfectly normal, thank you very much. 2479 02:03:00,980 --> 02:03:02,960 They were the last people you would expect 2480 02:03:02,960 --> 02:03:05,120 to be involved in anything strange or mysterious 2481 02:03:05,120 --> 02:03:07,850 because they just didn't hold with such nonsense. 2482 02:03:07,850 --> 02:03:08,782 And, onward. 2483 02:03:08,782 --> 02:03:10,490 All right, what grade level is this book? 2484 02:03:10,490 --> 02:03:11,778 AUDIENCE: Third. 2485 02:03:11,778 --> 02:03:13,070 DAVID MALAN: OK, I heard third. 2486 02:03:13,070 --> 02:03:14,585 AUDIENCE: What? 2487 02:03:14,585 --> 02:03:15,980 DAVID MALAN: Seventh, fifth. 2488 02:03:15,980 --> 02:03:17,150 OK, all over the place. 2489 02:03:17,150 --> 02:03:20,540 But grade 7, according to one particular measure. 2490 02:03:20,540 --> 02:03:24,802 And we can debate exactly what age you were when you read this, 2491 02:03:24,802 --> 02:03:27,260 and maybe you're feeling ahead of your time, or behind, now. 2492 02:03:27,260 --> 02:03:31,470 But here, we have a snippet of text. 2493 02:03:31,470 --> 02:03:36,560 What makes this text assume an older audience, a more mature audience, 2494 02:03:36,560 --> 02:03:39,690 a higher grade level, would you think? 2495 02:03:39,690 --> 02:03:40,190 Yeah?
2496 02:03:40,190 --> 02:03:42,415 AUDIENCE: [INAUDIBLE] 2497 02:03:42,415 --> 02:03:45,110 DAVID MALAN: Yeah, it's longer, different types of words, 2498 02:03:45,110 --> 02:03:47,513 there are commas now in phrases, and so forth. 2499 02:03:47,513 --> 02:03:49,680 So there's just some kind of sophistication to this. 2500 02:03:49,680 --> 02:03:52,280 So it turns out for the upcoming problem set, 2501 02:03:52,280 --> 02:03:55,370 among the things you'll do is take, as input, texts like this 2502 02:03:55,370 --> 02:03:56,510 and analyze them, 2503 02:03:56,510 --> 02:03:59,072 considering, well, how many words are in the text? 2504 02:03:59,072 --> 02:04:00,530 How many sentences are in the text? 2505 02:04:00,530 --> 02:04:02,375 How many letters are in the text? 2506 02:04:02,375 --> 02:04:06,170 And use those according to a well-defined formula to prescribe what, 2507 02:04:06,170 --> 02:04:09,680 exactly, the grade level of some actual text-- there's the third-- 2508 02:04:09,680 --> 02:04:10,582 might actually be. 2509 02:04:10,582 --> 02:04:12,790 Well, what else are we going to do in the coming days? 2510 02:04:12,790 --> 02:04:15,410 Well, I've alluded to this notion of cryptography in the past, 2511 02:04:15,410 --> 02:04:18,350 this notion of scrambling information in such a way 2512 02:04:18,350 --> 02:04:21,422 that you can hide the contents of a message 2513 02:04:21,422 --> 02:04:23,630 from someone who might otherwise intercept it, right? 2514 02:04:23,630 --> 02:04:26,130 The earliest form of this might also be when you're younger, 2515 02:04:26,130 --> 02:04:29,390 and you're in class, and you're passing a note from one person to another, 2516 02:04:29,390 --> 02:04:30,650 from yourself to someone else. 2517 02:04:30,650 --> 02:04:32,960 You don't want to necessarily write a note in English, 2518 02:04:32,960 --> 02:04:35,120 or some other written language; you might want 2519 02:04:35,120 --> 02:04:37,430 to scramble it somehow, or encrypt it.
2520 02:04:37,430 --> 02:04:40,460 Maybe you change the As to Bs, and the Bs to Cs, 2521 02:04:40,460 --> 02:04:42,770 so that if the teacher snaps it up and intercepts it, 2522 02:04:42,770 --> 02:04:45,200 they can't actually understand what it is you've 2523 02:04:45,200 --> 02:04:47,160 written, because it's encrypted. 2524 02:04:47,160 --> 02:04:49,610 So long as your friend, the recipient of this note, 2525 02:04:49,610 --> 02:04:51,890 knows how you manipulated it, 2526 02:04:51,890 --> 02:04:55,640 how you added to or subtracted from each of the letters, 2527 02:04:55,640 --> 02:04:58,850 they can decrypt it, which is to reverse that process. 2528 02:04:58,850 --> 02:05:02,070 So formally, in the world of cryptography and computer science, 2529 02:05:02,070 --> 02:05:04,130 this is another problem to solve. 2530 02:05:04,130 --> 02:05:07,173 Your input, though, when you have a message you want to send securely, 2531 02:05:07,173 --> 02:05:08,840 is what's generally known as plaintext. 2532 02:05:08,840 --> 02:05:12,980 There's some algorithm that's going to then encipher, or encrypt, 2533 02:05:12,980 --> 02:05:16,100 that information into what's called ciphertext, which 2534 02:05:16,100 --> 02:05:18,650 is the scrambled version that theoretically can get safely 2535 02:05:18,650 --> 02:05:21,110 intercepted without your message being spoiled, 2536 02:05:21,110 --> 02:05:24,620 unless that interceptor actually knows what algorithm 2537 02:05:24,620 --> 02:05:27,150 you used inside of this process. 2538 02:05:27,150 --> 02:05:29,720 So that would be generally known as a cipher. 2539 02:05:29,720 --> 02:05:33,080 Ciphers typically take, though, not one input, but two. 2540 02:05:33,080 --> 02:05:37,685 If, for instance, your cipher is as simple as A becomes B, 2541 02:05:37,685 --> 02:05:41,420 B becomes C, C becomes D, dot dot dot, Z becomes A, 2542 02:05:41,420 --> 02:05:45,140 you're essentially adding one to every letter and encrypting it.
2543 02:05:45,140 --> 02:05:47,750 Now that 1 would be what we call the key. 2544 02:05:47,750 --> 02:05:51,470 You and the recipient both have to agree, presumably, before class, 2545 02:05:51,470 --> 02:05:55,280 in advance, what number you're going to use that day to rotate, 2546 02:05:55,280 --> 02:05:56,960 or change, all of these letters by. 2547 02:05:56,960 --> 02:06:00,410 Because when you add 1, they, upon receiving your ciphertext, 2548 02:06:00,410 --> 02:06:03,090 have to subtract 1 to get back the answer. 2549 02:06:03,090 --> 02:06:07,730 For instance, if the input, the plaintext, is hi!, as before, 2550 02:06:07,730 --> 02:06:13,010 and the key is 1, the ciphertext using this simple rotational algorithm, 2551 02:06:13,010 --> 02:06:17,720 otherwise known as the Caesar cipher, might be ij exclamation point. 2552 02:06:17,720 --> 02:06:21,408 So it's similar, but it's at least scrambled at first glance. 2553 02:06:21,408 --> 02:06:23,450 And unless the teacher really cares to figure out 2554 02:06:23,450 --> 02:06:26,420 what algorithm they are using today, or what key they are using today, 2555 02:06:26,420 --> 02:06:29,700 it's probably sufficiently secure for your purposes. 2556 02:06:29,700 --> 02:06:31,160 How do you reverse the process? 2557 02:06:31,160 --> 02:06:34,190 Well, your friend gets this and reverses it by negative 1. 2558 02:06:34,190 --> 02:06:38,630 So I becomes H, J becomes I, and things like punctuation 2559 02:06:38,630 --> 02:06:41,060 remain untouched, at least in this scheme. 2560 02:06:41,060 --> 02:06:43,580 So let's consider one final example here. 2561 02:06:43,580 --> 02:06:51,080 If the input to the algorithm is Uijtxbtdt50, and the key 2562 02:06:51,080 --> 02:06:53,090 this time is negative 1, 2563 02:06:53,090 --> 02:06:59,510 such that now B should become A, and C should become B, and A should become Z, 2564 02:06:59,510 --> 02:07:01,130 we're going in the other direction.
2565 02:07:01,130 --> 02:07:03,030 How might we analyze this? 2566 02:07:03,030 --> 02:07:06,000 Well, if we spread all the letters out, and we start from left to right, 2567 02:07:06,000 --> 02:07:11,780 and we start subtracting one from each letter, U becomes T, I becomes H, J becomes I, 2568 02:07:11,780 --> 02:07:17,220 T becomes S, X becomes W, B becomes A, T becomes S, was, D becomes C, T becomes S-- 2569 02:07:17,220 --> 02:07:18,270 this was CS50. 2570 02:07:18,270 --> 02:07:19,470 We'll see you next time. 2571 02:07:19,470 --> 02:07:21,320 [APPLAUSE] 2572 02:07:20,000 --> 02:07:56,000 [MUSIC PLAYING]
