0
00:00:00,000 --> 00:00:00,000
[MUSIC PLAYING]
1
00:01:18,000 --> 00:01:20,825
DAVID MALAN: This is CS50 and this is week 2.
2
00:01:20,825 --> 00:01:23,450
Now that you have some programming experience under your belts,
3
00:01:23,450 --> 00:01:25,910
in this more arcane language called C.
4
00:01:25,910 --> 00:01:28,790
Among our goals today is to help you understand exactly what you have
5
00:01:28,790 --> 00:01:30,650
been doing these past several days.
6
00:01:30,650 --> 00:01:33,955
Wrestling with your first programs in C, so that you have more of a bottom
7
00:01:33,955 --> 00:01:36,080
up understanding of what some of these commands do.
8
00:01:36,080 --> 00:01:38,580
And, ultimately, what more we can do with this language.
9
00:01:38,580 --> 00:01:41,750
So this, recall, was the very first program you wrote,
10
00:01:41,750 --> 00:01:44,870
I wrote in this language called C, much more textual,
11
00:01:44,870 --> 00:01:46,970
certainly, than the Scratch equivalent.
12
00:01:46,970 --> 00:01:51,200
But at the end of the day, computers, your Mac, your PC,
13
00:01:51,200 --> 00:01:54,555
VS Code doesn't understand this actual code.
14
00:01:54,555 --> 00:01:57,680
What's the format into which we need to get any program that we write, just
15
00:01:57,680 --> 00:01:58,180
to recap?
16
00:01:58,180 --> 00:01:59,202
AUDIENCE: [INAUDIBLE]
17
00:01:59,202 --> 00:02:01,790
DAVID MALAN: So binary, otherwise known as machine code.
18
00:02:01,790 --> 00:02:02,290
Right?
19
00:02:02,290 --> 00:02:05,870
The 0s and 1s that your computer actually does understand.
20
00:02:05,870 --> 00:02:08,030
So somehow we need to get to this format.
21
00:02:08,030 --> 00:02:10,730
And up until now, we've been using this command called make,
22
00:02:10,730 --> 00:02:13,670
which is aptly named, because it lets you make programs.
23
00:02:13,670 --> 00:02:16,430
And the invocation of that has been pretty simple.
24
00:02:16,430 --> 00:02:20,450
Make hello looks in your current directory or folder for a file called
25
00:02:20,450 --> 00:02:25,100
hello.c, implicitly, and then it compiles that into a file called hello,
26
00:02:25,100 --> 00:02:27,650
which itself is executable, which just means runnable,
27
00:02:27,650 --> 00:02:29,900
so that you can then do ./hello.
28
00:02:29,900 --> 00:02:34,190
But it turns out that make is actually not a compiler itself.
29
00:02:34,190 --> 00:02:35,840
It does help you make programs.
30
00:02:35,840 --> 00:02:40,520
But make is this utility that comes on a lot of systems that makes it easier
31
00:02:40,520 --> 00:02:44,060
to actually compile code by using an actual compiler,
32
00:02:44,060 --> 00:02:48,290
the program that converts source code to machine code, on your own Mac, or PC,
33
00:02:48,290 --> 00:02:50,660
or whatever cloud environment you might be using.
34
00:02:50,660 --> 00:02:53,330
In fact, what make is doing for us, is actually,
35
00:02:53,330 --> 00:02:57,230
running a command automatically known as clang, for C language.
36
00:02:57,230 --> 00:03:01,590
And, so here, for instance, in VS Code, is that very first program again,
37
00:03:01,590 --> 00:03:03,470
this time in the context of a text editor,
38
00:03:03,470 --> 00:03:06,680
and I could compile this with make hello.
39
00:03:06,680 --> 00:03:09,567
Let me go ahead and use the compiler itself manually.
40
00:03:09,567 --> 00:03:12,650
And we'll see in a moment why we've been automating the process with make.
41
00:03:12,650 --> 00:03:15,060
I'm going to run clang instead.
42
00:03:15,060 --> 00:03:17,340
And then I'm going to run hello.c.
43
00:03:17,340 --> 00:03:19,490
So it's a little different how the compiler's used.
44
00:03:19,490 --> 00:03:22,160
It needs to know, explicitly, what the file is called.
45
00:03:22,160 --> 00:03:25,280
I'll go ahead and run clang hello.c, Enter.
46
00:03:25,280 --> 00:03:28,415
Nothing seems to happen, which, generally speaking, is a good thing.
47
00:03:28,415 --> 00:03:29,790
Because no errors have popped up.
48
00:03:29,790 --> 00:03:36,140
And if I do ls for list, you'll see there is not a file called hello.
49
00:03:36,140 --> 00:03:39,230
But there is a curiously-named file called a.out.
50
00:03:39,230 --> 00:03:42,620
This is a historical convention, stands for assembler output.
51
00:03:42,620 --> 00:03:45,380
And this is, just, the default file name for a program
52
00:03:45,380 --> 00:03:49,400
that you might compile yourself, manually, using clang itself.
53
00:03:49,400 --> 00:03:51,830
Let me go ahead now and point out that that's
54
00:03:51,830 --> 00:03:53,340
kind of a stupid name for a program.
55
00:03:53,340 --> 00:03:56,435
Even though it works, ./a.out would work.
56
00:03:56,435 --> 00:03:59,060
But if you actually want to customize the name of your program,
57
00:03:59,060 --> 00:04:02,720
we could just resort to make, or we could do explicitly
58
00:04:02,720 --> 00:04:03,920
what make is doing for us.
59
00:04:03,920 --> 00:04:06,770
It turns out, some programs, among them make,
60
00:04:06,770 --> 00:04:08,990
support what are called command line arguments,
61
00:04:08,990 --> 00:04:10,310
and more on those later today.
62
00:04:10,310 --> 00:04:13,670
But these are literally words or numbers that you type at your prompt
63
00:04:13,670 --> 00:04:17,330
after the name of a program that just influence its behavior in some way.
64
00:04:17,330 --> 00:04:20,040
It modifies its behavior.
65
00:04:20,040 --> 00:04:22,940
And it turns out, if you read the documentation for clang,
66
00:04:22,940 --> 00:04:28,040
you can actually pass a -o, for output, command line argument, that
67
00:04:28,040 --> 00:04:30,260
lets you specify, explicitly what do you want
68
00:04:30,260 --> 00:04:31,795
your outputted program to be called?
69
00:04:31,795 --> 00:04:34,670
And then you go ahead and type the name of the file that you actually
70
00:04:34,670 --> 00:04:37,110
want to compile, from source code to machine code.
71
00:04:37,110 --> 00:04:38,720
Let me hit Enter now.
72
00:04:38,720 --> 00:04:41,990
Again, nothing seems to happen, and I type ls and voila.
73
00:04:41,990 --> 00:04:45,010
Now we still have the old a.out, because I didn't delete it yet.
74
00:04:45,010 --> 00:04:46,010
And I do have hello now.
75
00:04:46,010 --> 00:04:50,420
So ./hello, voila, runs hello, world again.
76
00:04:50,420 --> 00:04:52,160
And let me go ahead and remove this file.
77
00:04:52,160 --> 00:04:56,593
I could, of course, resort to using the Explorer, on the left hand side.
78
00:04:56,593 --> 00:04:59,510
Which, I am in the habit of closing, just to give us more room to see.
79
00:04:59,510 --> 00:05:02,240
But I could go ahead and right-click or control-click on a.out
80
00:05:02,240 --> 00:05:03,365
if I want to get rid of it.
81
00:05:03,365 --> 00:05:06,300
Or again, let me focus on the command line interface.
82
00:05:06,300 --> 00:05:07,250
And I can use--
83
00:05:07,250 --> 00:05:08,030
anyone recall?
84
00:05:08,030 --> 00:05:11,000
We didn't really use it much, but what command removes a file?
85
00:05:11,000 --> 00:05:12,665
AUDIENCE: rm.
86
00:05:12,665 --> 00:05:16,430
DAVID MALAN: So rm for remove. rm a.out, Enter.
87
00:05:16,430 --> 00:05:20,060
Remove regular file, a.out, y for yes, enter.
88
00:05:20,060 --> 00:05:22,640
And now, if I do ls again, voila, it's gone.
89
00:05:22,640 --> 00:05:24,650
All right, so, let's now enhance this program
90
00:05:24,650 --> 00:05:30,290
to do the second version we ever did, which was to also include cs50.h,
91
00:05:30,290 --> 00:05:33,149
so that we have access to functions like, get string, and the like.
92
00:05:33,149 --> 00:05:40,340
Let me do string, name, gets, get string, what's your name,
93
00:05:40,340 --> 00:05:41,550
question mark.
94
00:05:41,550 --> 00:05:46,010
And now, let me go ahead and say hello to that name with our %s placeholder,
95
00:05:46,010 --> 00:05:46,920
comma, name.
96
00:05:46,920 --> 00:05:49,160
So this was version 2 of our program last time,
97
00:05:49,160 --> 00:05:53,300
that very easily compiled with make hello, but notice the difference now.
98
00:05:53,300 --> 00:05:56,360
If I want to compile this thing myself with clang, using
99
00:05:56,360 --> 00:05:58,520
that same lesson learned, all right, let's do it.
100
00:05:58,520 --> 00:06:05,300
clang -o hello, just so I get a better name for the program, hello.c, Enter.
101
00:06:05,300 --> 00:06:09,750
And a new error pops up that some of you might have encountered on your own.
102
00:06:09,750 --> 00:06:13,580
So it's a bit arcane here, and there's this mention of a cryptic-looking path
103
00:06:13,580 --> 00:06:15,330
with temp for temporary there.
104
00:06:15,330 --> 00:06:18,560
But somehow, my issue's in main, as we can see here.
105
00:06:18,560 --> 00:06:20,257
It somehow relates to hello.c.
106
00:06:20,257 --> 00:06:23,090
Even though we might not have seen this language last time in class,
107
00:06:23,090 --> 00:06:25,970
but there's an undefined reference to get string.
108
00:06:25,970 --> 00:06:27,800
As though get string doesn't exist.
109
00:06:27,800 --> 00:06:31,340
Now, your first instinct might be, well maybe I forgot cs50.h, but of course,
110
00:06:31,340 --> 00:06:32,180
I didn't.
111
00:06:32,180 --> 00:06:34,310
That's the very first line of my program.
112
00:06:34,310 --> 00:06:37,910
But it turns out, make is doing something else for us, all this time.
113
00:06:37,910 --> 00:06:41,930
Just putting cs50.h, or any header file at the top of your code,
114
00:06:41,930 --> 00:06:46,730
for that matter, just teaches the compiler that a function will exist.
115
00:06:46,730 --> 00:06:49,310
It, sort of, asks the compiler to-- it asks the compiler
116
00:06:49,310 --> 00:06:52,610
to trust that I will, eventually, get around to implementing functions,
117
00:06:52,610 --> 00:06:58,130
like get string from cs50.h, and printf from stdio.h.
118
00:06:58,130 --> 00:07:03,830
But this error here, some kind of linker command, relates to the fact
119
00:07:03,830 --> 00:07:05,960
that there's a separate process for actually
120
00:07:05,960 --> 00:07:10,280
finding the 0s and 1s that cs50 compiled long ago for you.
121
00:07:10,280 --> 00:07:13,850
That authors of this operating system compiled for you, long ago,
122
00:07:13,850 --> 00:07:14,900
in the form of printf.
123
00:07:14,900 --> 00:07:17,840
We need to, somehow, tell the compiler that we
124
00:07:17,840 --> 00:07:20,450
need to link in code that someone else wrote,
125
00:07:20,450 --> 00:07:23,750
the actual machine code that someone else wrote and then compiled.
126
00:07:23,750 --> 00:07:27,497
So to do that, you'd have to type -lcs50, for instance,
127
00:07:27,497 --> 00:07:28,580
at the end of the command.
128
00:07:28,580 --> 00:07:31,548
So additionally, telling clang that, not only do you want to output
129
00:07:31,548 --> 00:07:34,340
a file called hello, and you want to compile a file called hello.c,
130
00:07:34,340 --> 00:07:39,200
you also want to quote-unquote link in a bunch of 0s and 1s
131
00:07:39,200 --> 00:07:43,010
that collectively implement get string and printf.
132
00:07:43,010 --> 00:07:47,220
So now, if I hit enter, this time it compiled OK.
133
00:07:47,220 --> 00:07:53,142
And now if I run ./hello, it works as it did last week, just like that.
134
00:07:53,142 --> 00:07:56,100
But honestly, this is just going to get really tedious, really quickly.
135
00:07:56,100 --> 00:07:57,930
Notice, already, just to compile my code,
136
00:07:57,930 --> 00:08:01,417
I have to run clang -o hello hello.c -lcs50,
137
00:08:01,417 --> 00:08:03,500
and you're going to have to type more things, too.
138
00:08:03,500 --> 00:08:06,890
If you wanted to use the math library, like, to use that round function,
139
00:08:06,890 --> 00:08:09,440
you would also have to do -lm, typically,
140
00:08:09,440 --> 00:08:12,890
to specify give me the math bits that someone else compiled.
141
00:08:12,890 --> 00:08:14,970
And the commands just get longer and longer.
142
00:08:14,970 --> 00:08:19,520
So moving forward, we won't have to resort to running clang itself,
143
00:08:19,520 --> 00:08:21,330
but clang is, indeed, the compiler.
144
00:08:21,330 --> 00:08:24,380
That is the program that converts from source code to machine code.
145
00:08:24,380 --> 00:08:28,438
But we'll continue to use make because it just automates that process.
146
00:08:28,438 --> 00:08:30,230
And the commands are only going to get more
147
00:08:30,230 --> 00:08:34,640
cryptic the more sophisticated and more feature-full your programs get.
148
00:08:34,640 --> 00:08:39,620
And make, again, is just a tool that makes all that happen.
149
00:08:39,620 --> 00:08:44,300
Let me pause there to see if there's any questions before then we
150
00:08:44,300 --> 00:08:45,890
take a look further under the hood.
151
00:08:45,890 --> 00:08:47,185
Yeah, in front.
152
00:08:47,185 --> 00:08:50,185
AUDIENCE: Can you explain again what the -lcs50-- just why you put that?
153
00:08:50,185 --> 00:08:52,518
DAVID MALAN: Sure, let me come back to that in a moment.
154
00:08:52,518 --> 00:08:53,750
What does the -lcs50 mean?
155
00:08:53,750 --> 00:08:55,917
We'll come back to that, visually, in just a moment.
156
00:08:55,917 --> 00:08:58,850
But it means to link in the 0s and 1s that collectively
157
00:08:58,850 --> 00:09:00,435
implement get string and printf.
158
00:09:00,435 --> 00:09:02,060
But we'll see that, visually, in a sec.
159
00:09:02,060 --> 00:09:03,341
Yeah, behind you.
160
00:09:03,341 --> 00:09:07,073
AUDIENCE: [INAUDIBLE].
161
00:09:07,073 --> 00:09:08,490
DAVID MALAN: Really good question.
162
00:09:08,490 --> 00:09:10,850
How come I didn't have to link in standard I/O?
163
00:09:10,850 --> 00:09:12,950
Because I used printf in version 1.
164
00:09:12,950 --> 00:09:16,280
Standard I/O is just, literally, so standard that it's built in,
165
00:09:16,280 --> 00:09:17,480
it just works for free.
166
00:09:17,480 --> 00:09:18,800
CS50, of course, is not.
167
00:09:18,800 --> 00:09:21,080
It did not come with the language C or the compiler.
168
00:09:21,080 --> 00:09:22,250
We ourselves wrote it.
169
00:09:22,250 --> 00:09:26,600
And other libraries, even though they might come with the language C,
170
00:09:26,600 --> 00:09:30,600
they might not be enabled by default, generally for efficiency purposes.
171
00:09:30,600 --> 00:09:33,470
So you're not loading more 0s and 1s into the computer's memory
172
00:09:33,470 --> 00:09:34,280
than you need to.
173
00:09:34,280 --> 00:09:37,250
So standard I/O is special, if you will.
174
00:09:37,250 --> 00:09:38,510
Other questions?
175
00:09:38,510 --> 00:09:39,500
Yeah?
176
00:09:39,500 --> 00:09:41,420
AUDIENCE: [INAUDIBLE]
177
00:09:41,420 --> 00:09:43,160
DAVID MALAN: Oh, what does the -o mean?
178
00:09:43,160 --> 00:09:46,190
So -o is shorthand for the English word output,
179
00:09:46,190 --> 00:09:51,260
and so -o is telling clang to please output a file called hello,
180
00:09:51,260 --> 00:09:53,850
because the next thing I wrote after the command line
181
00:09:53,850 --> 00:09:59,929
recall was clang -o hello, then the name of the file, then -lcs50.
182
00:09:59,929 --> 00:10:03,407
And this is where these commands do get and stay fairly arcane.
183
00:10:03,407 --> 00:10:05,240
It's just through muscle memory and practice
184
00:10:05,240 --> 00:10:07,610
that you'll start to remember, oh what are the other commands that you--
185
00:10:07,610 --> 00:10:10,277
what are the command line arguments you can provide to programs?
186
00:10:10,277 --> 00:10:11,570
But we've seen this before.
187
00:10:11,570 --> 00:10:14,780
Technically, when you run make hello, the program is called make,
188
00:10:14,780 --> 00:10:16,980
hello is the command line argument.
189
00:10:16,980 --> 00:10:19,040
It's an input to the make function, albeit,
190
00:10:19,040 --> 00:10:22,250
typed at the prompt, that tells make what you want to make.
191
00:10:22,250 --> 00:10:26,180
Even when I used rm a moment ago, and did rm of a.out,
192
00:10:26,180 --> 00:10:28,280
the command line argument there was called a.out
193
00:10:28,280 --> 00:10:30,740
and it's telling rm what to delete.
194
00:10:30,740 --> 00:10:35,270
It is entirely dependent on the programs to decide what their conventions are,
195
00:10:35,270 --> 00:10:38,090
whether you use dash this or dash that, but we'll
196
00:10:38,090 --> 00:10:40,805
see over time, which ones actually matter in practice.
197
00:10:40,805 --> 00:10:46,220
So to come back to the first question about what actually is happening there,
198
00:10:46,220 --> 00:10:48,562
let's consider the code more closely.
199
00:10:48,562 --> 00:10:50,270
So here is that first version of the code
200
00:10:50,270 --> 00:10:54,590
again, with stdio.h and only printf, so no cs50 stuff yet.
201
00:10:54,590 --> 00:10:56,840
Until we add it back in and had the second version,
202
00:10:56,840 --> 00:10:59,630
where we actually get the human's name.
203
00:10:59,630 --> 00:11:02,783
When you run this command, there's a few things
204
00:11:02,783 --> 00:11:04,700
that are happening underneath the hood, and we
205
00:11:04,700 --> 00:11:06,650
won't dwell on these kinds of details, indeed,
206
00:11:06,650 --> 00:11:08,870
we'll abstract it away by using make.
207
00:11:08,870 --> 00:11:10,940
But it's worth understanding from the get-go,
208
00:11:10,940 --> 00:11:13,880
how much automation is going on, so that when you run these commands,
209
00:11:13,880 --> 00:11:14,850
it's not magic.
210
00:11:14,850 --> 00:11:17,940
You have this bottom-up understanding of what's going on.
211
00:11:17,940 --> 00:11:21,530
So when we say you've been compiling your code with make,
212
00:11:21,530 --> 00:11:23,600
that's a bit of an oversimplification.
213
00:11:23,600 --> 00:11:26,780
Technically, every time you compile your code,
214
00:11:26,780 --> 00:11:29,570
you're having the computer do four distinct things for you.
215
00:11:29,570 --> 00:11:33,020
And this is not four distinct things that you need to memorize and remember
216
00:11:33,020 --> 00:11:35,180
every time you run your program, what's happening,
217
00:11:35,180 --> 00:11:37,820
but it helps to break it down into building blocks,
218
00:11:37,820 --> 00:11:42,110
as to how we're getting from source code, like C, into 0s and 1s.
219
00:11:42,110 --> 00:11:46,640
It turns out, that when you compile, quote-unquote, "your code," technically
220
00:11:46,640 --> 00:11:50,510
speaking, you're doing four things automatically, and all at once.
221
00:11:50,510 --> 00:11:53,960
Preprocessing it, compiling it, assembling it, and linking it.
222
00:11:53,960 --> 00:11:57,350
Just humans decided, let's just call the whole process compiling.
223
00:11:57,350 --> 00:12:00,230
But for a moment, let's consider what these steps are.
224
00:12:00,230 --> 00:12:02,690
So preprocessing refers to this.
225
00:12:02,690 --> 00:12:06,710
If we look at our source code, version 2 that uses the cs50 library
226
00:12:06,710 --> 00:12:10,442
and therefore get string, notice that we have these include lines at top.
227
00:12:10,442 --> 00:12:12,650
And they're kind of special versus all the other code
228
00:12:12,650 --> 00:12:15,710
we've written, because they start with hash symbols, specifically.
229
00:12:15,710 --> 00:12:17,660
And that's sort of a special syntax that means
230
00:12:17,660 --> 00:12:20,600
that these are, technically, called preprocessor directives.
231
00:12:20,600 --> 00:12:25,290
Fancy way of saying they're handled special versus the rest of your code.
232
00:12:25,290 --> 00:12:29,870
In fact, if we focus on cs50.h, recall from last week
233
00:12:29,870 --> 00:12:35,870
that I provided a hint as to what's actually in cs50.h, among other things.
234
00:12:35,870 --> 00:12:40,580
What was the one salient thing that I said was in cs50.h and therefore,
235
00:12:40,580 --> 00:12:43,475
why we were including it in the first place?
236
00:12:43,475 --> 00:12:44,350
AUDIENCE: Get string?
237
00:12:44,350 --> 00:12:46,850
DAVID MALAN: So get string, specifically,
238
00:12:46,850 --> 00:12:49,160
the prototype for get string.
239
00:12:49,160 --> 00:12:51,410
We haven't made many of our own functions yet,
240
00:12:51,410 --> 00:12:53,840
but recall that any time we've made our own functions,
241
00:12:53,840 --> 00:12:56,330
and we've written them below main in a file,
242
00:12:56,330 --> 00:12:58,790
we've also had to, somewhat stupidly, copy paste
243
00:12:58,790 --> 00:13:01,370
the prototype of the function at the top of the file,
244
00:13:01,370 --> 00:13:05,210
just to teach the compiler that this function doesn't exist, yet,
245
00:13:05,210 --> 00:13:07,430
it does down there, but it will exist.
246
00:13:07,430 --> 00:13:08,300
Just trust me.
247
00:13:08,300 --> 00:13:10,980
So again, that's what these prototypes are doing for us.
248
00:13:10,980 --> 00:13:13,340
So therefore, in my code, If I want to use
249
00:13:13,340 --> 00:13:16,760
a function like get string, or printf, for that matter,
250
00:13:16,760 --> 00:13:19,150
they're clearly not implemented in the same file,
251
00:13:19,150 --> 00:13:20,400
they're implemented elsewhere.
252
00:13:20,400 --> 00:13:22,692
So I need to tell the compiler to trust me that they're
253
00:13:22,692 --> 00:13:24,000
implemented somewhere else.
254
00:13:24,000 --> 00:13:26,810
And so technically, inside of cs50.h, which
255
00:13:26,810 --> 00:13:30,410
is installed somewhere in the cloud's hard drive, so to speak,
256
00:13:30,410 --> 00:13:34,820
that you all are accessing via VS Code, there's a line that looks like this.
257
00:13:34,820 --> 00:13:38,870
A prototype for the get string function that says the name of the function is
258
00:13:38,870 --> 00:13:42,830
get string, it takes one input, or argument, called prompt,
259
00:13:42,830 --> 00:13:45,710
and the type of that prompt is a string.
260
00:13:45,710 --> 00:13:51,150
Get string, not surprisingly, has a return value and it returns a string.
261
00:13:51,150 --> 00:13:54,800
So literally, that line and a bunch of others, are in cs50.h.
262
00:13:54,800 --> 00:13:58,280
So rather than you all having to copy paste the prototype,
263
00:13:58,280 --> 00:14:01,160
you can just trust that cs50 figured out what it is.
264
00:14:01,160 --> 00:14:04,970
You can include cs50.h and the compiler is going
265
00:14:04,970 --> 00:14:07,420
to go find that prototype for you.
266
00:14:07,420 --> 00:14:09,480
Same thing in standard I/O. Someone else-- what
267
00:14:09,480 --> 00:14:13,620
must clearly be in stdio.h, among other stuff, that
268
00:14:13,620 --> 00:14:17,590
motivates our including stdio.h, too?
269
00:14:17,590 --> 00:14:18,090
Yeah?
270
00:14:18,090 --> 00:14:18,798
AUDIENCE: Printf.
271
00:14:18,798 --> 00:14:21,030
DAVID MALAN: Printf, the prototype for printf,
272
00:14:21,030 --> 00:14:24,010
and I'll just change it here in yellow, to be the same.
273
00:14:24,010 --> 00:14:25,410
And it turns out, the format--
274
00:14:25,410 --> 00:14:28,590
the prototype for printf is, actually, pretty fancy,
275
00:14:28,590 --> 00:14:31,740
because, as you might have noticed, printf can take one argument, just
276
00:14:31,740 --> 00:14:35,910
something to print, 2, if you want to plug a value into it, 3 or more.
277
00:14:35,910 --> 00:14:38,620
So the dot dot dot just represents exactly that.
278
00:14:38,620 --> 00:14:42,330
It's not quite as simple a prototype as get string, but more on that
279
00:14:42,330 --> 00:14:43,115
another time.
280
00:14:43,115 --> 00:14:46,050
So what does it mean to preprocess your code?
281
00:14:46,050 --> 00:14:49,860
The very first thing the compiler, clang, in this case,
282
00:14:49,860 --> 00:14:54,270
is doing for you when it reads your code top-to-bottom, left-to-right, is it
283
00:14:54,270 --> 00:14:57,960
notices, oh, here is hash include, oh, here's another hash include.
284
00:14:57,960 --> 00:15:03,090
And it, essentially, finds those files on the hard drive, cs50.h, stdio.h,
285
00:15:03,090 --> 00:15:06,990
and does the equivalent of copying and pasting them automatically
286
00:15:06,990 --> 00:15:09,360
into your code at the very top.
287
00:15:09,360 --> 00:15:12,450
Thereby teaching the compiler that get string and printf
288
00:15:12,450 --> 00:15:14,430
will eventually exist somewhere.
289
00:15:14,430 --> 00:15:18,480
So that's the preprocessing step, whereby, again, it's
290
00:15:18,480 --> 00:15:22,080
just doing a find-and-replace of anything that starts with hash include.
291
00:15:22,080 --> 00:15:24,510
It's plugging in the files there so that you, essentially,
292
00:15:24,510 --> 00:15:27,780
get all the prototypes you need automatically.
293
00:15:27,780 --> 00:15:28,830
OK.
294
00:15:28,830 --> 00:15:31,230
What does it mean, then, to compile the results?
295
00:15:31,230 --> 00:15:33,450
Because at this point in the story, your code
296
00:15:33,450 --> 00:15:35,678
now looks like this in the computer's memory.
297
00:15:35,678 --> 00:15:37,470
It doesn't change your file, it's doing all
298
00:15:37,470 --> 00:15:39,990
of this in the computer's memory, or RAM, for you.
299
00:15:39,990 --> 00:15:42,070
But it, essentially, looks like this.
300
00:15:42,070 --> 00:15:45,600
Well the next step is what's, technically, really compiling.
301
00:15:45,600 --> 00:15:48,420
Even though again, we use compile as an umbrella term.
302
00:15:48,420 --> 00:15:51,510
Compiling code in C means to take code that
303
00:15:51,510 --> 00:15:53,740
now looks like this in the computer's memory
304
00:15:53,740 --> 00:15:56,890
and turn it into something that looks like this.
305
00:15:56,890 --> 00:15:58,350
Which is way more cryptic.
306
00:15:58,350 --> 00:16:00,990
But it was just a few decades ago that, if you
307
00:16:00,990 --> 00:16:03,930
were taking a class like CS50 in its earlier form,
308
00:16:03,930 --> 00:16:07,740
we wouldn't be using C, since it didn't exist yet; we would actually be using this,
309
00:16:07,740 --> 00:16:09,690
something called assembly language.
310
00:16:09,690 --> 00:16:13,230
And there's different types of, or flavors of, assembly language.
311
00:16:13,230 --> 00:16:17,010
But this is about as low level as you can get to what a computer really
312
00:16:17,010 --> 00:16:19,410
understands, be it a Mac, or PC, or a phone,
313
00:16:19,410 --> 00:16:22,650
before you start getting into actual 0s and 1s.
314
00:16:22,650 --> 00:16:24,013
And most of this is cryptic.
315
00:16:24,013 --> 00:16:27,180
I couldn't tell you what this is doing unless I thought it through carefully
316
00:16:27,180 --> 00:16:30,300
and rewound mentally, years ago, from having studied it,
317
00:16:30,300 --> 00:16:32,880
but let's highlight a few key words in yellow.
318
00:16:32,880 --> 00:16:37,380
Notice that this assembly language that the computer is outputting
319
00:16:37,380 --> 00:16:40,530
for you automatically, still has mention of main
320
00:16:40,530 --> 00:16:43,290
and it has mention of get string, and it has mention of printf.
321
00:16:43,290 --> 00:16:46,358
So there's some relationship to the C code we saw a moment ago.
322
00:16:46,358 --> 00:16:48,150
And then if I highlight these other things,
323
00:16:48,150 --> 00:16:50,430
these are what are called computer instructions.
324
00:16:50,430 --> 00:16:52,740
At the end of the day, your Mac, your PC,
325
00:16:52,740 --> 00:16:56,340
your phone actually only understands very basic instructions,
326
00:16:56,340 --> 00:17:01,020
like addition, subtraction, division, multiplication, move into memory,
327
00:17:01,020 --> 00:17:06,190
load from memory, print something to the screen, very basic operations.
328
00:17:06,190 --> 00:17:07,755
And that's what you're seeing here.
329
00:17:07,755 --> 00:17:12,750
These assembly instructions are what the computer actually
330
00:17:12,750 --> 00:17:16,870
feeds into the brains of the computer, the CPU, the central processing unit.
331
00:17:16,870 --> 00:17:19,770
And it's that Intel CPU, or whatever you have,
332
00:17:19,770 --> 00:17:23,220
that understands this instruction, and this one, and this one, and this one.
333
00:17:23,220 --> 00:17:25,860
And collectively, long story short, all they do
334
00:17:25,860 --> 00:17:28,620
is print hello, world on the screen, but in a way
335
00:17:28,620 --> 00:17:31,910
that the machine understands how to do.
336
00:17:31,910 --> 00:17:34,500
So let me pause here.
337
00:17:34,500 --> 00:17:37,010
Are there any questions on what we mean by preprocessing?
338
00:17:37,010 --> 00:17:40,850
Which finds and replaces the hash include lines, among others,
339
00:17:40,850 --> 00:17:44,450
and compiling, which technically takes your source code,
340
00:17:44,450 --> 00:17:48,170
once preprocessed, and converts it to that stuff called assembly language.
341
00:17:48,170 --> 00:17:50,342
AUDIENCE: [INAUDIBLE] each CPU has--
342
00:17:50,342 --> 00:17:51,290
DAVID MALAN: Correct.
343
00:17:51,290 --> 00:17:54,710
Each type of CPU has its own instruction set.
344
00:17:54,710 --> 00:17:55,280
Indeed.
345
00:17:55,280 --> 00:17:58,970
And as a teaser, this is why, at least back in the day, when
346
00:17:58,970 --> 00:18:02,900
we used to install software from CD-ROMs, or some other type of media,
347
00:18:02,900 --> 00:18:08,222
this is why you can't take a program that was sold for a Windows computer
348
00:18:08,222 --> 00:18:09,680
and run it on a Mac, or vice-versa.
349
00:18:09,680 --> 00:18:14,420
Because the commands, the instructions that those two products understand,
350
00:18:14,420 --> 00:18:15,500
are actually different.
351
00:18:15,500 --> 00:18:20,150
Now Microsoft, or any company, could generally write code in one language,
352
00:18:20,150 --> 00:18:24,109
like C or another, and they can compile it twice, saving a PC version
353
00:18:24,109 --> 00:18:25,790
and saving a Mac version.
354
00:18:25,790 --> 00:18:30,109
It's twice as much work and sometimes you get into some incompatibilities,
355
00:18:30,109 --> 00:18:33,140
but that's why these steps are somewhat distinct.
356
00:18:33,140 --> 00:18:36,710
You can now use the same code and support even different platforms,
357
00:18:36,710 --> 00:18:37,940
or systems, if you'd want.
358
00:18:37,940 --> 00:18:38,440
All right.
359
00:18:38,440 --> 00:18:39,650
Assembly, assembling.
360
00:18:39,650 --> 00:18:42,800
Thankfully, this part is fairly straightforward, at least, in concept.
361
00:18:42,800 --> 00:18:46,250
To assemble code, which is step three of four, that is just
362
00:18:46,250 --> 00:18:50,360
happening for you every time you run make or, in turn, clang,
363
00:18:50,360 --> 00:18:53,570
this assembly language, which the computer generated automatically
364
00:18:53,570 --> 00:18:57,080
for you from your source code, is turned into 0s and 1s.
365
00:18:57,080 --> 00:19:00,783
So that's the step that, last week, I simplified and said,
366
00:19:00,783 --> 00:19:03,950
when you compile your code, you convert it to source code-- from source code
367
00:19:03,950 --> 00:19:04,970
to machine code.
368
00:19:04,970 --> 00:19:07,685
Technically, that happens when you assemble your code.
369
00:19:07,685 --> 00:19:10,940
But no one in normal conversations says that, they just
370
00:19:10,940 --> 00:19:13,280
say compile for all of these terms.
371
00:19:13,280 --> 00:19:14,310
All right.
372
00:19:14,310 --> 00:19:17,450
So that's assembling.
373
00:19:17,450 --> 00:19:19,070
There's one final step.
374
00:19:19,070 --> 00:19:22,400
Even in this simple program of getting the user's name
375
00:19:22,400 --> 00:19:27,120
and then plugging it into printf, I'm using three different people's code,
376
00:19:27,120 --> 00:19:27,620
if you will.
377
00:19:27,620 --> 00:19:30,200
My own, which is in hello.c.
378
00:19:30,200 --> 00:19:35,600
Some of CS50's, which is in hello.c, sorry-- which
379
00:19:35,600 --> 00:19:39,080
is in cs50.c, which is not a file I've mentioned, yet,
380
00:19:39,080 --> 00:19:43,220
but it stands to reason, that if there's a cs50.h that has prototypes,
381
00:19:43,220 --> 00:19:45,380
turns out, the actual implementation of get_string
382
00:19:45,380 --> 00:19:47,600
and other things are in cs50.c.
383
00:19:47,600 --> 00:19:51,290
And there's a third file somewhere on the hard drive
384
00:19:51,290 --> 00:19:54,260
that's involved in compiling even this simple program.
385
00:19:54,260 --> 00:19:59,971
hello.c, cs50.c, and by that logic, what might the other be?
386
00:19:59,971 --> 00:20:00,471
Yeah?
387
00:20:00,471 --> 00:20:02,275
AUDIENCE: stdio?
388
00:20:02,275 --> 00:20:03,600
DAVID MALAN: stdio.c.
389
00:20:03,600 --> 00:20:06,690
And that's a bit of a white lie, because that's such a big, fancy library
390
00:20:06,690 --> 00:20:09,750
that there's actually multiple files that compose it, but the same idea,
391
00:20:09,750 --> 00:20:11,380
and we'll take the simplification.
392
00:20:11,380 --> 00:20:16,200
So when I have this code, and I compile my code,
393
00:20:16,200 --> 00:20:21,300
I get those 0s and 1s that end up taking hello.c and turning it, effectively,
394
00:20:21,300 --> 00:20:26,830
into 0s and 1s that are combined with cs50.c, followed by stdio.c as well.
395
00:20:26,830 --> 00:20:27,840
So let me rewind here.
396
00:20:27,840 --> 00:20:33,300
Here might be the 0s and 1s for my code, the two lines of code that I wrote.
397
00:20:33,300 --> 00:20:37,920
Here might be the 0s and 1s for what CS50 wrote some years ago in cs50.c.
398
00:20:37,920 --> 00:20:42,210
Here might be the 0s and 1s that someone wrote for standard I/O decades ago.
399
00:20:42,210 --> 00:20:45,720
The last and final step is that linking command
400
00:20:45,720 --> 00:20:48,330
that links all of these 0s and 1s together,
401
00:20:48,330 --> 00:20:53,820
essentially stitches them together into one single file called hello,
402
00:20:53,820 --> 00:20:56,385
or called a.out, whatever you name it.
403
00:20:56,385 --> 00:21:01,650
That last step is what combines all of these different programmers' 0s and 1s.
404
00:21:01,650 --> 00:21:04,050
And my God, now we're really in the weeds.
405
00:21:04,050 --> 00:21:07,020
Who wants to even think about running code at this level?
406
00:21:07,020 --> 00:21:08,160
You shouldn't need to.
407
00:21:08,160 --> 00:21:09,180
But it's not magic.
408
00:21:09,180 --> 00:21:11,748
When you're running make, there's some very concrete steps
409
00:21:11,748 --> 00:21:14,290
that are happening that humans have developed over the years,
410
00:21:14,290 --> 00:21:17,700
over the decades, that break down this big problem of source code going
411
00:21:17,700 --> 00:21:22,410
to 0s and 1s, or machine code, into these very specific steps.
412
00:21:22,410 --> 00:21:26,100
But henceforth, you can call all of this compiling.
413
00:21:26,100 --> 00:21:27,120
Questions?
414
00:21:27,120 --> 00:21:27,780
Or confusion?
415
00:21:27,780 --> 00:21:28,596
Yeah?
416
00:21:28,596 --> 00:21:30,804
AUDIENCE: Can you explain again what a.out signifies?
417
00:21:30,804 --> 00:21:31,770
DAVID MALAN: Sure.
418
00:21:31,770 --> 00:21:33,270
What does a.out signify?
419
00:21:33,270 --> 00:21:37,890
a.out is just the conventional, default file name for any program
420
00:21:37,890 --> 00:21:41,280
that you compile directly with a compiler, like clang.
421
00:21:41,280 --> 00:21:43,680
It's a meaningless name, though.
422
00:21:43,680 --> 00:21:47,250
It stands for assembler output, and assembler might now sound familiar
423
00:21:47,250 --> 00:21:48,690
from this assembling process.
424
00:21:48,690 --> 00:21:51,150
It's a lame name for a computer program, and we
425
00:21:51,150 --> 00:21:56,450
can override it by outputting something like hello, instead.
426
00:21:56,450 --> 00:21:57,317
Yeah?
427
00:21:57,317 --> 00:22:03,426
AUDIENCE: [INAUDIBLE]
428
00:22:03,426 --> 00:22:07,860
DAVID MALAN: To recap, there are other prototypes in those files,
429
00:22:07,860 --> 00:22:11,910
cs50.h, stdio.h, technically, they're all included on top of your file,
430
00:22:11,910 --> 00:22:14,460
even though you, strictly speaking, don't need most of them,
431
00:22:14,460 --> 00:22:18,190
but they are there, just in case you might want them.
432
00:22:18,190 --> 00:22:19,660
And finally, any other questions?
433
00:22:19,660 --> 00:22:20,160
Yeah?
434
00:22:20,160 --> 00:22:23,878
AUDIENCE: [INAUDIBLE]
435
00:22:23,878 --> 00:22:26,920
DAVID MALAN: Does it matter what order we're telling the computer to run?
436
00:22:26,920 --> 00:22:29,140
Sometimes with libraries, yes, it matters
437
00:22:29,140 --> 00:22:31,520
what order they are linked in together.
438
00:22:31,520 --> 00:22:34,330
But for our purposes, it's really not going to matter.
439
00:22:34,330 --> 00:22:38,750
It's going to-- make is going to take care of automating that process for us.
440
00:22:38,750 --> 00:22:39,250
All right.
441
00:22:39,250 --> 00:22:41,795
So with that said, henceforth, compiling, technically,
442
00:22:41,795 --> 00:22:42,670
is these four things.
443
00:22:42,670 --> 00:22:46,690
But we'll focus on it as a higher level concept, an abstraction,
444
00:22:46,690 --> 00:22:49,880
known as compiling itself.
445
00:22:49,880 --> 00:22:52,510
So another process that we'll now begin to focus on all the
446
00:22:52,510 --> 00:22:55,690
more this week because, invariably, this past week you ran against--
447
00:22:55,690 --> 00:22:57,160
ran up against some challenges.
448
00:22:57,160 --> 00:23:00,550
You probably created your very first bugs, or mistakes, in a program
449
00:23:00,550 --> 00:23:03,940
and so let's focus for a moment on actual techniques for debugging.
450
00:23:03,940 --> 00:23:07,060
As you spend more time this semester, in the years
451
00:23:07,060 --> 00:23:10,270
to come, if you continue to program, you're never, frankly, probably,
452
00:23:10,270 --> 00:23:13,577
going to write bug free code, ultimately.
453
00:23:13,577 --> 00:23:16,660
Though your programs are going to get more featureful, more sophisticated,
454
00:23:16,660 --> 00:23:20,230
and we're all going to start to make more sophisticated mistakes.
455
00:23:20,230 --> 00:23:22,570
And to this day, I write buggy code all the time.
456
00:23:22,570 --> 00:23:24,520
And I'm always horrified when I do it up here.
457
00:23:24,520 --> 00:23:26,620
But hopefully, that won't happen too often.
458
00:23:26,620 --> 00:23:30,100
But when it does, it's a process, now, of debugging, trying
459
00:23:30,100 --> 00:23:32,230
to find the mistakes in your program.
460
00:23:32,230 --> 00:23:35,600
You don't have to stare at your code, or shake your fist at your code.
461
00:23:35,600 --> 00:23:38,590
There are actual tools that real world programmers
462
00:23:38,590 --> 00:23:41,860
use to help debug their code and find these faults.
463
00:23:41,860 --> 00:23:44,455
So what are some of the techniques and tools that folks use?
464
00:23:44,455 --> 00:23:49,440
Well as an aside, if you've ever--
465
00:23:49,440 --> 00:23:52,840
a bug in a program is a mistake, that's been around for some time.
466
00:23:52,840 --> 00:23:58,010
If you've ever heard this tale, some 50 plus years ago, in 1947.
467
00:23:58,010 --> 00:24:02,770
This is an entry in a log book written by a famous computer scientist known
468
00:24:02,770 --> 00:24:05,230
as-- named Grace Hopper, who happened to be the one
469
00:24:05,230 --> 00:24:09,345
to record the very first discovery of a quote-unquote actual bug in a computer.
470
00:24:09,345 --> 00:24:11,860
This was like a moth that had flown into,
471
00:24:11,860 --> 00:24:17,080
at the time, a very sophisticated system known as the Harvard Mark II computer,
472
00:24:17,080 --> 00:24:20,050
very large, refrigerator-sized type systems,
473
00:24:20,050 --> 00:24:24,160
in which an actual bug caused an issue.
474
00:24:24,160 --> 00:24:27,190
The etymology of bug though, predates this particular instance,
475
00:24:27,190 --> 00:24:30,580
but here you have, as any computer scientist might know, the example
476
00:24:30,580 --> 00:24:32,845
of a first physical bug in a computer.
477
00:24:32,845 --> 00:24:35,322
How, though, do you go about removing such a thing?
478
00:24:35,322 --> 00:24:37,780
Well, let's consider a very simple scenario from last time,
479
00:24:37,780 --> 00:24:40,780
for instance, when we were trying to print out various aspects of Mario,
480
00:24:40,780 --> 00:24:42,970
like this column of 3 bricks.
481
00:24:42,970 --> 00:24:46,660
Let's consider how I might go about implementing a program like this.
482
00:24:46,660 --> 00:24:51,130
Let me switch back over to VS Code here, and I'm going to run--
483
00:24:51,130 --> 00:24:52,750
write a program.
484
00:24:52,750 --> 00:24:54,640
And I'm not going to trust myself, so I'm
485
00:24:54,640 --> 00:24:56,507
going to call it buggy.c from the get-go,
486
00:24:56,507 --> 00:24:58,340
knowing that I'm going to mess something up.
487
00:24:58,340 --> 00:25:01,150
But I'm going to go ahead and include stdio.h.
488
00:25:01,150 --> 00:25:03,940
And I'm going to define main, as usual.
489
00:25:03,940 --> 00:25:05,950
So hopefully, no mistakes just yet.
490
00:25:05,950 --> 00:25:08,710
And now, I want to print those 3 bricks on the screen using
491
00:25:08,710 --> 00:25:10,270
just hashes for bricks.
492
00:25:10,270 --> 00:25:16,420
So how about for int i gets 0, i less than or equal to 3, i plus plus.
493
00:25:16,420 --> 00:25:18,280
Now, inside of my curly braces, I'm going
494
00:25:18,280 --> 00:25:23,960
to go ahead and print out a hash followed by a backslash n, semicolon.
495
00:25:23,960 --> 00:25:27,975
All right, saving the file, doing make buggy, Enter, it compiles.
496
00:25:27,975 --> 00:25:33,340
So there's no syntactical errors, my code is syntactically correct.
497
00:25:33,340 --> 00:25:36,640
But some of you have probably seen the logical error already,
498
00:25:36,640 --> 00:25:39,370
because when I run this program I don't get
499
00:25:39,370 --> 00:25:45,430
this picture, which was 3 bricks high, I seem to have 4 bricks instead.
500
00:25:45,430 --> 00:25:47,930
Now, this might be jumping out at you, why it's happening,
501
00:25:47,930 --> 00:25:49,930
but I've kept the program simple just so that we
502
00:25:49,930 --> 00:25:54,010
don't have to find an actual bug, we can use a tool to find one that we already
503
00:25:54,010 --> 00:25:55,970
know about, in this case.
504
00:25:55,970 --> 00:25:59,050
What might be the first strategy for finding a bug like this,
505
00:25:59,050 --> 00:26:03,292
rather than staring at your code, asking a question, trying to think
506
00:26:03,292 --> 00:26:04,125
through the problem?
507
00:26:04,125 --> 00:26:07,690
Well, let's actually try to diagnose the problem more proactively.
508
00:26:07,690 --> 00:26:10,420
And the simplest way to do this now, and years from now,
509
00:26:10,420 --> 00:26:13,870
is, honestly, going to be to use a function like printf.
510
00:26:13,870 --> 00:26:15,790
Printf is a wonderfully useful function, not
511
00:26:15,790 --> 00:26:18,550
just for printing formatted strings and all that, but for
512
00:26:18,550 --> 00:26:21,430
just looking inside the values of variables
513
00:26:21,430 --> 00:26:24,352
that you might be curious about to see what's going on.
514
00:26:24,352 --> 00:26:25,060
So you know what?
515
00:26:25,060 --> 00:26:26,320
Let me do this.
516
00:26:26,320 --> 00:26:29,110
I see that there's 4 coming out, but I intended 3.
517
00:26:29,110 --> 00:26:31,740
So clearly, something's wrong with my i variable.
518
00:26:31,740 --> 00:26:34,090
So let me be a little more pedantic.
519
00:26:34,090 --> 00:26:37,300
Let me go inside of this loop and, temporarily,
520
00:26:37,300 --> 00:26:40,480
say something explicit, like, i is--
521
00:26:40,480 --> 00:26:45,200
%i \n, and then just plug in the value of i.
522
00:26:45,200 --> 00:26:45,700
Right?
523
00:26:45,700 --> 00:26:48,970
This is not the program I want to write, it's the program I'm temporarily
524
00:26:48,970 --> 00:26:54,400
writing, because now I'm going to say make buggy, ./buggy.
525
00:26:54,400 --> 00:26:56,500
And if I look, now, at the output, I have
526
00:26:56,500 --> 00:27:01,090
some helpful diagnostic information. i is 0, and I get a hash, i is 1,
527
00:27:01,090 --> 00:27:03,610
and I get a hash, 2 and I get a hash, 3 and I get hash.
528
00:27:03,610 --> 00:27:04,527
OK, wait a minute.
529
00:27:04,527 --> 00:27:06,610
I'm clearly going too many steps because, maybe, I
530
00:27:06,610 --> 00:27:09,250
forgot that computers are, essentially, counting from 0,
531
00:27:09,250 --> 00:27:11,450
and now, oh, it's less than or equal to.
532
00:27:11,450 --> 00:27:13,030
Now you see it, right?
533
00:27:13,030 --> 00:27:15,940
Again, trivial example, but just by using printf,
534
00:27:15,940 --> 00:27:18,910
you can see inside of the computer's memory
535
00:27:18,910 --> 00:27:21,130
by just printing stuff out like this.
536
00:27:21,130 --> 00:27:25,770
And now, once you've figured it out, oh, so this should probably be less than 3,
537
00:27:25,770 --> 00:27:28,140
or I should start counting from 1, there's
538
00:27:28,140 --> 00:27:29,640
any number of ways I could fix this.
539
00:27:29,640 --> 00:27:32,655
But the most conventional is probably just to say less than 3.
540
00:27:32,655 --> 00:27:39,180
Now, I can delete my temporary print statement, rerun make buggy, ./buggy.
541
00:27:39,180 --> 00:27:41,790
And, voila, problem solved.
542
00:27:41,790 --> 00:27:43,830
All right, and to this day, I do this.
543
00:27:43,830 --> 00:27:46,860
Whether it's making a command line application, or a web application,
544
00:27:46,860 --> 00:27:49,050
or mobile application, it's very common to use
545
00:27:49,050 --> 00:27:51,270
printf, or some equivalent in any language,
546
00:27:51,270 --> 00:27:55,350
just to poke around and see what's inside the computer's memory.
547
00:27:55,350 --> 00:27:58,570
Thankfully, there's more sophisticated tools than this.
548
00:27:58,570 --> 00:28:00,930
Let me go ahead and reintroduce the bug here.
549
00:28:00,930 --> 00:28:04,620
And let me reopen my sidebar at left here.
550
00:28:04,620 --> 00:28:08,550
Let me now recompile the code to make sure it's current.
551
00:28:08,550 --> 00:28:11,310
And I'm going to run a command called debug50.
552
00:28:11,310 --> 00:28:15,090
Which is a command that's representative of a type of program
553
00:28:15,090 --> 00:28:16,740
known as a debugger.
554
00:28:16,740 --> 00:28:19,680
And this debugger is actually built into VS Code.
555
00:28:19,680 --> 00:28:23,700
And all debug50 is doing for us is automating the process of starting
556
00:28:23,700 --> 00:28:25,650
VS Code's built-in debugger.
557
00:28:25,650 --> 00:28:28,260
So this isn't even a CS50-specific tool, we've
558
00:28:28,260 --> 00:28:31,170
just given you a debug50 command to make it easier
559
00:28:31,170 --> 00:28:32,855
to start it up from the get-go.
560
00:28:32,855 --> 00:28:37,560
And the way you run this debugger is you say debug50, space, and then
561
00:28:37,560 --> 00:28:40,120
the name of the program that you want to debug.
562
00:28:40,120 --> 00:28:42,210
So, in this case, ./buggy.
563
00:28:42,210 --> 00:28:44,010
So you don't mention your .c file.
564
00:28:44,010 --> 00:28:46,650
You mention your already-compiled code.
565
00:28:46,650 --> 00:28:52,230
And what this debugger is going to let me do is, most powerfully,
566
00:28:52,230 --> 00:28:54,930
walk through my code step-by-step.
567
00:28:54,930 --> 00:28:58,930
Because every program we've written thus far, runs from start to finish,
568
00:28:58,930 --> 00:29:02,325
even if I'm not done thinking through each step at a time.
569
00:29:02,325 --> 00:29:05,850
With a debugger, I can actually click on a line number
570
00:29:05,850 --> 00:29:09,180
and say pause execution here, and the debugger
571
00:29:09,180 --> 00:29:14,130
will let me walk through my code one step at a time, one second at a time,
572
00:29:14,130 --> 00:29:16,740
one minute at a time, at my own human pace.
573
00:29:16,740 --> 00:29:19,470
Which is super compelling when the programs get more complicated
574
00:29:19,470 --> 00:29:22,600
and they might, otherwise, fly by on the screen.
575
00:29:22,600 --> 00:29:25,860
So I'm going to click to the left of line 5.
576
00:29:25,860 --> 00:29:27,970
And notice that these little red dots appear.
577
00:29:27,970 --> 00:29:31,290
And if I click on one it stays, and gets even redder.
578
00:29:31,290 --> 00:29:34,230
And I'm going to run debug50 on ./buggy.
579
00:29:34,230 --> 00:29:39,090
And in just a moment, you'll see that a new panel opens on the left hand side.
580
00:29:39,090 --> 00:29:41,910
It's doing some configuration of the screen.
581
00:29:41,910 --> 00:29:46,690
Let me zoom out a little bit here so we can see more on the screen at once.
582
00:29:46,690 --> 00:29:50,440
And sometimes, you'll see in VS Code that the debug console opens up,
583
00:29:50,440 --> 00:29:54,480
which looks very cryptic, just go back to the terminal window if that happens.
584
00:29:54,480 --> 00:29:57,875
Because the terminal window is where you can still interact with your code.
585
00:29:57,875 --> 00:30:00,120
And let's now take a look at what's going on.
586
00:30:00,120 --> 00:30:04,650
If I zoom in on my buggy.c code here, you'll
587
00:30:04,650 --> 00:30:10,890
notice that we have the same program as before, but highlighted in yellow
588
00:30:10,890 --> 00:30:11,820
is line 5.
589
00:30:11,820 --> 00:30:15,660
Not a coincidence, that's the line I set a so-called breakpoint at.
590
00:30:15,660 --> 00:30:20,400
The little red dot means break here, pause execution here.
591
00:30:20,400 --> 00:30:23,716
And the yellow line has not yet been executed.
592
00:30:23,716 --> 00:30:27,600
But if I, now, at the top of my screen, notice these little arrows.
593
00:30:27,600 --> 00:30:28,750
There's one for Play.
594
00:30:28,750 --> 00:30:30,750
There's one for this, which, if I hover over it,
595
00:30:30,750 --> 00:30:34,140
says Step Over, there's another that's going to say Step Into,
596
00:30:34,140 --> 00:30:35,820
there's a third that says Step Out.
597
00:30:35,820 --> 00:30:38,520
I'm just going to use the first of these, Step Over.
598
00:30:38,520 --> 00:30:41,580
And I'm going to do this, and you'll see that the yellow highlight
599
00:30:41,580 --> 00:30:45,660
moved from line 5 to line 7 because now it's ready,
600
00:30:45,660 --> 00:30:47,955
but hasn't yet printed out that hash.
601
00:30:47,955 --> 00:30:51,817
But the most powerful thing here, notice, is that top left here.
602
00:30:51,817 --> 00:30:54,150
It's a little cryptic, because there's a bunch of things
603
00:30:54,150 --> 00:30:56,910
going on that will make more sense over time, but at the top
604
00:30:56,910 --> 00:30:58,470
there's a section called variables.
605
00:30:58,470 --> 00:31:00,750
Below that, something called locals, which means
606
00:31:00,750 --> 00:31:02,820
local to my current function, main.
607
00:31:02,820 --> 00:31:07,410
And notice, there's my variable called i, and its current value is 0.
608
00:31:07,410 --> 00:31:12,810
So now, once I click Step Over again, watch what happens.
609
00:31:12,810 --> 00:31:15,660
We go from line 7 back to line 5.
610
00:31:15,660 --> 00:31:19,455
But look in the terminal window, one of the hashes has printed.
611
00:31:19,455 --> 00:31:22,050
But now, it's printed at my own pace.
612
00:31:22,050 --> 00:31:24,030
I can think through this step-by-step.
613
00:31:24,030 --> 00:31:26,340
Notice that i has not changed, yet.
614
00:31:26,340 --> 00:31:29,700
It's still 0 because the yellow highlighted line hasn't yet executed.
615
00:31:29,700 --> 00:31:34,140
But the moment I click Step Over, it's going to execute line 5.
616
00:31:34,140 --> 00:31:41,010
Now, notice at top left, i has become 1, and nothing has printed, yet,
617
00:31:41,010 --> 00:31:43,290
because now, highlighted is line 7.
618
00:31:43,290 --> 00:31:48,000
So if I click Step Over again, we'll see the hash.
619
00:31:48,000 --> 00:31:51,930
If I repeat this process at my own human, comfortable pace,
620
00:31:51,930 --> 00:31:57,040
I can see my variables changing, I can see output changing on the screen,
621
00:31:57,040 --> 00:31:59,902
and I can just think about should that have just happened.
622
00:31:59,902 --> 00:32:01,860
I can pause and give thought to what's actually
623
00:32:01,860 --> 00:32:06,240
going on without trying to race the computer and figure it all out at once.
624
00:32:06,240 --> 00:32:08,490
I'm going to go ahead and stop here because we already
625
00:32:08,490 --> 00:32:11,430
know what this particular problem is, and that brings me back
626
00:32:11,430 --> 00:32:12,720
to my default terminal window.
627
00:32:12,720 --> 00:32:16,180
But this debugger, let me disable the breakpoint now
628
00:32:16,180 --> 00:32:18,570
so it doesn't keep breaking, this debugger
629
00:32:18,570 --> 00:32:20,760
will be your friend moving forward in order
630
00:32:20,760 --> 00:32:25,290
to step through your code step-by-step, at your own pace to figure out
631
00:32:25,290 --> 00:32:26,820
where something has gone wrong.
632
00:32:26,820 --> 00:32:30,397
Printf is great, but it gets annoying if you have to constantly add print this,
633
00:32:30,397 --> 00:32:33,480
print this, print this, print this, recompile, rerun it, oh wait a minute,
634
00:32:33,480 --> 00:32:34,980
print this, print this.
635
00:32:34,980 --> 00:32:39,780
The debugger lets you do the equivalent, but automatically.
636
00:32:39,780 --> 00:32:45,960
Questions on this debugger, which you'll see all the more hands-on over time?
637
00:32:45,960 --> 00:32:47,430
Questions on debugger?
638
00:32:47,430 --> 00:32:48,554
Yeah?
639
00:32:48,554 --> 00:32:50,560
AUDIENCE: You were using a Step Over feature.
640
00:32:50,560 --> 00:32:53,303
What do the other features in the debugger--
641
00:32:53,303 --> 00:32:54,720
DAVID MALAN: Really good question.
642
00:32:54,720 --> 00:32:57,720
We'll see this before long, but those other buttons that I glossed over,
643
00:32:57,720 --> 00:33:02,460
step into and step out of, actually let you step into specific functions
644
00:33:02,460 --> 00:33:04,200
if I had any more than main.
645
00:33:04,200 --> 00:33:06,960
So if main called a function called something,
646
00:33:06,960 --> 00:33:10,380
and something called a function called something else, instead of just
647
00:33:10,380 --> 00:33:14,730
stepping over the entire execution of that function, I could step into it
648
00:33:14,730 --> 00:33:17,105
and walk through its lines of code one by one.
649
00:33:17,105 --> 00:33:19,020
So any time you have a problem set you're
650
00:33:19,020 --> 00:33:22,140
working on that has multiple functions, you can set a breakpoint in main,
651
00:33:22,140 --> 00:33:26,250
if you want, or you can set it inside of one of your additional functions
652
00:33:26,250 --> 00:33:29,130
to focus your attention only on that.
653
00:33:29,130 --> 00:33:32,640
And we'll see examples of that over time.
654
00:33:32,640 --> 00:33:33,780
All right, so what else?
655
00:33:33,780 --> 00:33:38,100
And what's the sort of, elephant in the room, so to speak,
656
00:33:38,100 --> 00:33:39,750
is actually a duck in this case.
657
00:33:39,750 --> 00:33:42,160
Why is there this duck and all of these ducks here?
658
00:33:42,160 --> 00:33:46,440
Well, it turns out, a third, genuinely recommended, debugging technique
659
00:33:46,440 --> 00:33:50,055
is talking through problems, talking through code with someone else.
660
00:33:50,055 --> 00:33:52,620
Now, in the absence of having a family member, or a friend,
661
00:33:52,620 --> 00:33:56,520
or a roommate who actually wants to hear you talk about code, of all things,
662
00:33:56,520 --> 00:34:01,320
generally, programmers turn to a rubber duck, or other inanimate objects
663
00:34:01,320 --> 00:34:03,360
if something animate is not available.
664
00:34:03,360 --> 00:34:06,760
The idea behind rubber duck debugging, so to speak,
665
00:34:06,760 --> 00:34:12,750
is that simply by looking at your code and talking it through, OK, on line 3,
666
00:34:12,750 --> 00:34:17,040
I'm starting a for loop and I'm initializing i to 0.
667
00:34:17,040 --> 00:34:18,990
OK, then, I'm printing out a hash.
668
00:34:18,990 --> 00:34:24,112
Just by talking through your code, step-by-step, invariably,
669
00:34:24,112 --> 00:34:26,820
finds you having the proverbial light bulb go off over your head,
670
00:34:26,820 --> 00:34:29,040
because you realize, wait a minute I just said something stupid,
671
00:34:29,040 --> 00:34:30,510
or I just said something wrong.
672
00:34:30,510 --> 00:34:34,500
And this is really just a proxy for any other human, teaching fellow, teacher
673
00:34:34,500 --> 00:34:36,060
or friend, colleague.
674
00:34:36,060 --> 00:34:38,440
But in the absence of any of those people in the room,
675
00:34:38,440 --> 00:34:40,357
you're welcome to take, on your way out today,
676
00:34:40,357 --> 00:34:44,280
one of these little rubber ducks and consider using it, for real, any time
677
00:34:44,280 --> 00:34:47,820
you want to talk through one of your problems in CS50,
678
00:34:47,820 --> 00:34:49,140
or maybe life more generally.
679
00:34:49,140 --> 00:34:51,480
But having it there on your desk is just a way
680
00:34:51,480 --> 00:34:55,140
to help you hear illogic in what you think
681
00:34:55,140 --> 00:34:57,790
might, otherwise, be logical code.
682
00:34:57,790 --> 00:35:02,400
So printf, debugging, rubber-duck debugging are just three of the ways,
683
00:35:02,400 --> 00:35:05,207
you'll see over time, to get to the source of code
684
00:35:05,207 --> 00:35:06,790
that you will write that has mistakes.
685
00:35:06,790 --> 00:35:08,880
Which is going to happen, but it will empower you
686
00:35:08,880 --> 00:35:12,000
all the more to solve those mistakes.
687
00:35:12,000 --> 00:35:17,440
All right, any questions on debugging, in general, or these three techniques?
688
00:35:17,440 --> 00:35:17,940
Yeah?
689
00:35:17,940 --> 00:35:19,740
AUDIENCE: [INAUDIBLE]
690
00:35:19,740 --> 00:35:22,650
DAVID MALAN: What's the difference between Step Over and Step Into?
691
00:35:22,650 --> 00:35:25,980
At the moment, the only one that's applicable to the code I just wrote
692
00:35:25,980 --> 00:35:29,340
is Step Over, because it means step over each line of code.
693
00:35:29,340 --> 00:35:34,050
If, though, I had other functions that I had written in this program,
694
00:35:34,050 --> 00:35:39,300
maybe lower down in the file, I could step into those function calls
695
00:35:39,300 --> 00:35:41,469
and walk through them one at a time.
696
00:35:41,469 --> 00:35:43,650
So we'll come back to this with an actual example,
697
00:35:43,650 --> 00:35:46,230
but step into will allow me to do exactly that.
698
00:35:46,230 --> 00:35:49,210
In fact, this is a perfect segue to doing a little something like this.
699
00:35:49,210 --> 00:35:51,632
Let me go ahead and open up another file here.
700
00:35:51,632 --> 00:35:53,340
And, actually, we'll use the same, buggy.
701
00:35:53,340 --> 00:35:56,320
And we're going to write one other thing that's buggy, as well.
702
00:35:56,320 --> 00:36:00,000
Let me go up here and include, as before, cs50.h.
703
00:36:00,000 --> 00:36:03,780
Let me include stdio.h.
704
00:36:03,780 --> 00:36:05,520
Let me do int main(void).
705
00:36:05,520 --> 00:36:08,050
So all of this, I think, is correct, so far.
706
00:36:08,050 --> 00:36:11,280
And let's do this, let's give myself an int called i,
707
00:36:11,280 --> 00:36:14,530
and let's ask the user for a negative integer.
708
00:36:14,530 --> 00:36:17,300
This is not a function that exists, technically, yet.
709
00:36:17,300 --> 00:36:20,050
But I'm going to assume, for the sake of discussion, that it does.
710
00:36:20,050 --> 00:36:23,700
Then, I'm just going to print out, with %i and a new line,
711
00:36:23,700 --> 00:36:25,360
whatever the human typed in.
712
00:36:25,360 --> 00:36:28,320
So at this point in the story, my program, I think, is correct.
713
00:36:28,320 --> 00:36:30,930
Except for the fact that get_negative_int is not
714
00:36:30,930 --> 00:36:33,690
a function in the CS50 library or anywhere else.
715
00:36:33,690 --> 00:36:35,460
I'm going to need to invent it myself.
716
00:36:35,460 --> 00:36:41,310
So suppose, in this case, that I declare a function called get_negative_int.
717
00:36:41,310 --> 00:36:45,630
Its return type, so to speak, should be int, because, as its name suggests,
718
00:36:45,630 --> 00:36:48,360
I want to hand the user back an integer, and it's going
719
00:36:48,360 --> 00:36:50,310
to take no input to keep it simple.
720
00:36:50,310 --> 00:36:51,810
So I'm just going to say void there.
721
00:36:51,810 --> 00:36:54,810
No inputs, no special prompts, nothing like that.
722
00:36:54,810 --> 00:36:57,600
Let me, now, give myself some curly braces.
723
00:36:57,600 --> 00:37:00,510
And let me do something familiar, perhaps, from problem set 1.
724
00:37:00,510 --> 00:37:05,550
Let me give myself a variable, like n, and let me do the following
725
00:37:05,550 --> 00:37:07,320
within this block of code.
726
00:37:07,320 --> 00:37:13,590
Assign n the value of get_int, asking the user for a negative integer using
727
00:37:13,590 --> 00:37:14,850
get_int's own prompt.
728
00:37:14,850 --> 00:37:18,750
And I want to do this while n is less than 0, because I
729
00:37:18,750 --> 00:37:20,390
want to get a negative from the user.
730
00:37:20,390 --> 00:37:24,140
And recall, from having used this block in the past,
731
00:37:24,140 --> 00:37:27,770
I can now return n as the very last step to hand back
732
00:37:27,770 --> 00:37:31,790
whatever the user has typed in, so long as they cooperated and gave me
733
00:37:31,790 --> 00:37:33,750
an actual negative integer.
734
00:37:33,750 --> 00:37:36,710
Now, I've deliberately made a mistake here,
735
00:37:36,710 --> 00:37:39,080
and it's a subtle, silly, mathematical one,
736
00:37:39,080 --> 00:37:43,910
but let me compile this program after copying the prototype up to the top,
737
00:37:43,910 --> 00:37:45,380
so I don't make that mistake again.
738
00:37:45,380 --> 00:37:48,470
Let me do make buggy, Enter.
739
00:37:48,470 --> 00:37:50,720
And now, let me do ./buggy.
740
00:37:50,720 --> 00:37:54,020
I'll give it a negative integer, like negative 50.
741
00:37:54,020 --> 00:37:55,370
Uh-huh.
742
00:37:55,370 --> 00:37:59,330
That did not take.
743
00:37:59,330 --> 00:38:00,860
How about negative 5?
744
00:38:00,860 --> 00:38:02,060
No.
745
00:38:02,060 --> 00:38:04,500
How about 0?
746
00:38:04,500 --> 00:38:05,000
All right.
747
00:38:05,000 --> 00:38:09,080
So it's, clearly, working backwards, or incorrectly here, logically.
748
00:38:09,080 --> 00:38:10,800
So how could I go about debugging this?
749
00:38:10,800 --> 00:38:12,425
Well, I could do what I've done before.
750
00:38:12,425 --> 00:38:18,920
I could use my printf technique and say something explicit like n is %i,
751
00:38:18,920 --> 00:38:25,310
new line, comma n, just to print it out, let me recompile buggy,
752
00:38:25,310 --> 00:38:28,640
let me rerun buggy, let me type in negative 50.
753
00:38:28,640 --> 00:38:30,630
OK, n is negative 50.
754
00:38:30,630 --> 00:38:33,173
So that didn't really help me at this point,
755
00:38:33,173 --> 00:38:34,590
because that's the same as before.
756
00:38:34,590 --> 00:38:38,030
So let me do this, debug50, ./buggy.
757
00:38:38,030 --> 00:38:39,870
Oh, but I've made a mistake.
758
00:38:39,870 --> 00:38:41,700
So I didn't set my breakpoint, yet.
759
00:38:41,700 --> 00:38:44,930
So let me do this, and I'll set a breakpoint this time.
760
00:38:44,930 --> 00:38:47,330
I could set it here, on line 8.
761
00:38:47,330 --> 00:38:49,340
Let's do it in main, as before.
762
00:38:49,340 --> 00:38:51,530
Let me rerun debug50, now.
763
00:38:51,530 --> 00:38:52,970
On ./buggy.
764
00:38:52,970 --> 00:38:55,190
That fancy user interface is going to pop up.
765
00:38:55,190 --> 00:38:58,310
It's going to highlight the line that I set the breakpoint on.
766
00:38:58,310 --> 00:39:01,250
Notice that, on the left hand side of the screen,
767
00:39:01,250 --> 00:39:04,650
i is defaulting, at the moment to 0, because I haven't typed anything in,
768
00:39:04,650 --> 00:39:05,150
yet.
769
00:39:05,150 --> 00:39:10,815
But let me, now, Step Over this line that's highlighted in yellow,
770
00:39:10,815 --> 00:39:12,440
and you'll see that I'm being prompted.
771
00:39:12,440 --> 00:39:16,220
So let's type in my negative 50, Enter.
772
00:39:16,220 --> 00:39:21,470
Notice now that I'm stuck in that function.
773
00:39:21,470 --> 00:39:22,250
All right.
774
00:39:22,250 --> 00:39:26,520
So clearly, the issue seems to be in my get_negative_int function.
775
00:39:26,520 --> 00:39:30,120
So, OK, let me stop this execution.
776
00:39:30,120 --> 00:39:33,175
My problem doesn't seem to be in main, per se, maybe it's down here.
777
00:39:33,175 --> 00:39:33,800
So that's fine.
778
00:39:33,800 --> 00:39:35,990
Let me set my same breakpoint at line 8.
779
00:39:35,990 --> 00:39:38,510
Let me rerun debug50 one more time.
780
00:39:38,510 --> 00:39:43,110
But this time, instead of just stepping over that line, let's step into it.
781
00:39:43,110 --> 00:39:45,410
So notice line 8 is, again, highlighted in yellow.
782
00:39:45,410 --> 00:39:47,690
In the past I've been clicking Step Over.
783
00:39:47,690 --> 00:39:50,180
Let's click Step Into, now.
784
00:39:50,180 --> 00:39:53,480
When I click Step Into, boom, now, the debugger
785
00:39:53,480 --> 00:39:56,390
jumps into that specific function.
786
00:39:56,390 --> 00:39:59,330
Now, I can step through these lines of code, again and again.
787
00:39:59,330 --> 00:40:01,700
I can see what the value of n is as I'm typing it in.
788
00:40:01,700 --> 00:40:03,500
I can think through my logic, and voila.
789
00:40:03,500 --> 00:40:07,640
Hopefully, once I've solved the issue, I can exit the debugger, fix my code,
790
00:40:07,640 --> 00:40:09,180
and move on.
791
00:40:09,180 --> 00:40:12,050
So Step Over just goes over the line, but executes it,
792
00:40:12,050 --> 00:40:17,210
Step Into lets you go into other functions you've written.
793
00:40:17,210 --> 00:40:19,400
So let's go ahead and do this.
794
00:40:19,400 --> 00:40:23,550
We've got a bunch of possible approaches that we
795
00:40:23,550 --> 00:40:25,550
can take to solving some problems. Let's go ahead
796
00:40:25,550 --> 00:40:26,730
and pace ourselves today, though.
797
00:40:26,730 --> 00:40:27,900
Let's take a five-minute break, here.
798
00:40:27,900 --> 00:40:30,688
And when we come back, we'll take a look at that computer's memory
799
00:40:30,688 --> 00:40:31,730
we've been talking about.
800
00:40:31,730 --> 00:40:32,950
See you in five.
801
00:40:32,950 --> 00:40:36,380
All right.
802
00:40:36,380 --> 00:40:41,000
So let's dive back in.
803
00:40:41,000 --> 00:40:46,860
Up until now, both, by way of week 1 and problem set 1, for the most part,
804
00:40:46,860 --> 00:40:50,660
we've just translated from Scratch into C all of these basic building blocks,
805
00:40:50,660 --> 00:40:53,700
like loops and conditionals, Boolean expressions, variables.
806
00:40:53,700 --> 00:40:54,950
So sort of, more of the same.
807
00:40:54,950 --> 00:40:58,430
But there are features in C that we've stumbled across already,
808
00:40:58,430 --> 00:41:02,300
like data types, the types of variables that doesn't exist in Scratch,
809
00:41:02,300 --> 00:41:04,450
but that, in fact, does exist in other languages.
810
00:41:04,450 --> 00:41:06,200
In fact, a few that we'll see before long.
811
00:41:06,200 --> 00:41:10,670
So to summarize the types we saw last week, recall this little list here.
812
00:41:10,670 --> 00:41:15,050
We had ints, and floats, and longs, and doubles, and chars,
813
00:41:15,050 --> 00:41:18,510
there's also bool and also string, which we've seen a few times.
814
00:41:18,510 --> 00:41:21,830
But today, let's actually start to formalize what these things are,
815
00:41:21,830 --> 00:41:25,760
and actually what your Mac and PC are doing when you manipulate bits
816
00:41:25,760 --> 00:41:29,170
as an int versus a char, versus a string, versus something else.
817
00:41:29,170 --> 00:41:31,920
And see if we can't put more tools into your toolkit, so to speak,
818
00:41:31,920 --> 00:41:35,630
so we can start quickly writing more featureful, more sophisticated
819
00:41:35,630 --> 00:41:36,800
programs in C.
820
00:41:36,800 --> 00:41:40,640
So it turns out, that on most systems nowadays,
821
00:41:40,640 --> 00:41:43,010
though this can vary by actual computer, this
822
00:41:43,010 --> 00:41:46,040
is how large each of the data types, typically,
823
00:41:46,040 --> 00:41:51,590
is in C. When you store a Boolean value, a 0 or 1, a false or true,
824
00:41:51,590 --> 00:41:52,850
it actually uses 1 byte.
825
00:41:52,850 --> 00:41:55,100
That's a little excessive, because, strictly speaking,
826
00:41:55,100 --> 00:41:58,580
you only need 1 bit, which is 1/8 of this size.
827
00:41:58,580 --> 00:42:01,190
But for simplicity, computers use a whole byte
828
00:42:01,190 --> 00:42:03,740
to represent a bool, true or false.
829
00:42:03,740 --> 00:42:08,040
A char, we saw last week, is only 1 byte, or 8 bits.
830
00:42:08,040 --> 00:42:12,950
And this is why ASCII, which uses 1 byte, or technically, only 7 bits early
831
00:42:12,950 --> 00:42:17,600
on, was confined to at most 256 possible characters.
832
00:42:17,600 --> 00:42:21,940
Notice that an int is 4 bytes, or 32 bits.
833
00:42:21,940 --> 00:42:24,580
A float is also 4 bytes or 32 bits.
834
00:42:24,580 --> 00:42:27,850
But the things that we call long, it's, literally, twice as long,
835
00:42:27,850 --> 00:42:29,710
8 bytes or 64 bits.
836
00:42:29,710 --> 00:42:30,430
So is a double.
837
00:42:30,430 --> 00:42:33,900
A double is 64 bits of precision for floating point values.
838
00:42:33,900 --> 00:42:37,215
And a string, for today, we're going to leave as a question mark.
839
00:42:37,215 --> 00:42:39,340
We'll come back to that, later today and next week,
840
00:42:39,340 --> 00:42:42,520
as to how much space a string takes up, but, suffice it to say,
841
00:42:42,520 --> 00:42:45,488
it's going to take up a variable amount of space,
842
00:42:45,488 --> 00:42:47,530
depending on whether the string is short or long.
843
00:42:47,530 --> 00:42:50,470
But we'll see exactly what that means, before long.
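To check these sizes on a particular machine, a minimal sketch using C's sizeof operator might look like the following. The helper names bits_in and print_sizes are assumptions made for illustration. Only char is guaranteed to be exactly 1 byte; the other sizes are typical values and can vary by system.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: converts a size in bytes to a size in bits.
// CHAR_BIT is the number of bits per byte (8 on typical systems).
size_t bits_in(size_t bytes)
{
    return bytes * CHAR_BIT;
}

// Hypothetical helper: prints how many bytes each type uses on this machine.
void print_sizes(void)
{
    printf("bool:   %zu\n", sizeof(bool));
    printf("char:   %zu\n", sizeof(char));
    printf("int:    %zu\n", sizeof(int));
    printf("float:  %zu\n", sizeof(float));
    printf("long:   %zu\n", sizeof(long));
    printf("double: %zu\n", sizeof(double));
}
```

Calling print_sizes() from main shows the actual byte counts for the compiler and machine in use, which may or may not match the list above.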
844
00:42:50,470 --> 00:42:55,030
So here's a photograph of a typical piece of memory
845
00:42:55,030 --> 00:42:57,760
inside of your Mac, or PC, or phone.
846
00:42:57,760 --> 00:43:00,160
Odds are, it might be a little smaller in some devices.
847
00:43:00,160 --> 00:43:02,950
This is known as RAM, or random access memory.
848
00:43:02,950 --> 00:43:05,410
Each of these little black chips on this circuit
849
00:43:05,410 --> 00:43:07,720
board, the green thing, these little black chips
850
00:43:07,720 --> 00:43:10,630
are where 0s and 1s are actually stored.
851
00:43:10,630 --> 00:43:12,670
Each of those stores some number of bytes.
852
00:43:12,670 --> 00:43:15,130
Maybe megabytes, maybe even gigabytes, nowadays.
853
00:43:15,130 --> 00:43:21,430
So let's focus on one of those chips, to give us a zoomed in version, thereof.
854
00:43:21,430 --> 00:43:25,390
Let's consider the fact that, even though we don't have to care, exactly,
855
00:43:25,390 --> 00:43:29,470
how this kind of thing is made, if this is, like, 1 gigabyte of memory,
856
00:43:29,470 --> 00:43:31,930
for the sake of discussion, it stands to reason that,
857
00:43:31,930 --> 00:43:35,830
if this thing is storing 1 billion bytes, 1 gigabyte,
858
00:43:35,830 --> 00:43:38,110
then we can number them, arbitrarily.
859
00:43:38,110 --> 00:43:41,590
Maybe this will be byte 0, 1, 2, 3, 4, 5, 6, 7, 8.
860
00:43:41,590 --> 00:43:45,000
Then, maybe, way down here in the bottom right corner is byte number 1 billion.
861
00:43:45,000 --> 00:43:48,760
We can just number these things, as might be our convention.
862
00:43:48,760 --> 00:43:50,710
Let's draw that graphically.
863
00:43:50,710 --> 00:43:53,090
Not with a billion squares, but fewer than those.
864
00:43:53,090 --> 00:43:55,410
And let's zoom in further, and consider that.
865
00:43:55,410 --> 00:43:57,160
At this point in the story, let's abstract
866
00:43:57,160 --> 00:43:59,380
away all the hardware, and all the little wires,
867
00:43:59,380 --> 00:44:03,730
and just think of memory as taking up-- or, rather, just think of data
868
00:44:03,730 --> 00:44:06,170
as taking up some number of bytes.
869
00:44:06,170 --> 00:44:09,820
So, for instance, if you were to store a char in a computer's memory, which
870
00:44:09,820 --> 00:44:14,230
was 1 byte, it might be stored at this top left-hand location
871
00:44:14,230 --> 00:44:16,195
of this black chip of memory.
872
00:44:16,195 --> 00:44:20,290
If you were to store something like an integer that uses 4 bytes, well,
873
00:44:20,290 --> 00:44:23,560
it might use four of those bytes, but they're going to be contiguous
874
00:44:23,560 --> 00:44:25,220
back-to-back-to-back, in this case.
875
00:44:25,220 --> 00:44:29,270
If you were to store a long or a double, you might, actually, need 8 bytes.
876
00:44:29,270 --> 00:44:31,390
So I'm filling in these squares to represent
877
00:44:31,390 --> 00:44:36,160
how much memory a given variable of some data type would take up.
878
00:44:36,160 --> 00:44:39,230
1, or 4, or 8, in this case, here.
879
00:44:39,230 --> 00:44:42,160
Well, from here, let's abstract away from all of the hardware
880
00:44:42,160 --> 00:44:44,320
and really focus on memory as being a grid.
881
00:44:44,320 --> 00:44:47,650
Or, really, like a canvas that we can paint any types of data
882
00:44:47,650 --> 00:44:48,850
onto that we want.
883
00:44:48,850 --> 00:44:52,600
At the end of the day, all of this data is just going to be 0s and 1s.
884
00:44:52,600 --> 00:44:56,500
But it's up to you and I to build abstractions on top of that.
885
00:44:56,500 --> 00:45:00,130
Things like actual numbers, colors, images, movies, and beyond.
886
00:45:00,130 --> 00:45:02,440
But we'll start lower-level, here, first.
887
00:45:02,440 --> 00:45:05,950
Suppose I had a program that needs three integers.
888
00:45:05,950 --> 00:45:08,800
A simple program whose purpose in life is to average your three
889
00:45:08,800 --> 00:45:12,400
scores on an exam, or some such thing.
890
00:45:12,400 --> 00:45:17,020
Suppose that your three scores were these, 72, 73, not too bad, and 33,
891
00:45:17,020 --> 00:45:18,145
which is particularly low.
892
00:45:18,145 --> 00:45:23,030
Let's write a program that does this kind of averaging for us.
893
00:45:23,030 --> 00:45:24,860
Let me go back to VS Code, here.
894
00:45:24,860 --> 00:45:28,270
Let me open up a file called scores.c.
895
00:45:28,270 --> 00:45:30,830
Let me implement this as follows.
896
00:45:30,830 --> 00:45:35,860
Let me include stdio.h at the top, int main(void) as before.
897
00:45:35,860 --> 00:45:41,320
Then, inside of main, let me declare score 1, which is 72.
898
00:45:41,320 --> 00:45:43,990
Give me another score, 73.
899
00:45:43,990 --> 00:45:47,140
Then, a third score, called score 3, which is going to be 33.
900
00:45:47,140 --> 00:45:50,740
Now, I'm going to use printf to print out the average of those things,
901
00:45:50,740 --> 00:45:52,520
and I can do this in a few different ways.
902
00:45:52,520 --> 00:45:57,850
But I'm going to print out %f, and I'm going to do score 1, plus score 2,
903
00:45:57,850 --> 00:46:03,760
plus score 3, divided by 3, close parentheses semicolon.
904
00:46:03,760 --> 00:46:07,300
Some relatively simple arithmetic to compute the average of three scores,
905
00:46:07,300 --> 00:46:10,570
if I'm curious what my average grade is in the class with these three
906
00:46:10,570 --> 00:46:11,620
assessments.
907
00:46:11,620 --> 00:46:15,616
Let me, now, do make scores.
908
00:46:15,616 --> 00:46:19,240
All right, so I've somehow made an error already.
909
00:46:19,240 --> 00:46:25,150
But this one is, actually, germane to a problem we, hopefully,
910
00:46:25,150 --> 00:46:26,860
won't encounter too frequently.
911
00:46:26,860 --> 00:46:27,860
What's going on here?
912
00:46:27,860 --> 00:46:31,360
So underlined is score 1, plus score 2, plus score 3, divided by 3.
913
00:46:31,360 --> 00:46:36,250
Format specifies type double, but the argument has type int. Well,
914
00:46:36,250 --> 00:46:38,530
what's going on here?
915
00:46:38,530 --> 00:46:40,430
Because the arithmetic seems to check out.
916
00:46:40,430 --> 00:46:40,930
Yeah?
917
00:46:40,930 --> 00:46:44,560
AUDIENCE: So the computer is doing the math, but they basically [INAUDIBLE]
918
00:46:44,560 --> 00:46:49,260
just gives out a value at the end because, well [INAUDIBLE]
919
00:46:49,260 --> 00:46:50,210
DAVID MALAN: Correct.
920
00:46:50,210 --> 00:46:51,640
And we'll come back to this in more detail,
921
00:46:51,640 --> 00:46:54,522
but, indeed, what's happening here is I'm adding three ints together,
922
00:46:54,522 --> 00:46:56,480
obviously, because I define them right up here.
923
00:46:56,480 --> 00:46:59,470
And I'm dividing by another int, 3, but the catch
924
00:46:59,470 --> 00:47:03,890
is, recall, that C, when it performs math, treats all of these things as integers.
925
00:47:03,890 --> 00:47:05,810
But integers are not floating point values.
926
00:47:05,810 --> 00:47:08,890
So if you actually want to get a precise average for your score
927
00:47:08,890 --> 00:47:12,760
without throwing away the remainder, everything after the decimal point,
928
00:47:12,760 --> 00:47:15,430
it turns out, we're going to have to--
929
00:47:15,430 --> 00:47:17,410
we're going to-- aww--
930
00:47:17,410 --> 00:47:18,430
we're going to have to--
931
00:47:18,430 --> 00:47:22,720
[LAUGHTER] we're going to have to convert this whole expression, somehow,
932
00:47:22,720 --> 00:47:23,350
to a float.
933
00:47:23,350 --> 00:47:26,230
And there's a few ways to do this but the easiest way,
934
00:47:26,230 --> 00:47:28,540
for now, I'm going to go ahead and do this up here,
935
00:47:28,540 --> 00:47:31,360
I'm going to change the divide by 3 to divide by 3.0.
936
00:47:31,360 --> 00:47:35,440
Because it turns out, long story short, in C, so long as one of the values
937
00:47:35,440 --> 00:47:37,300
participating in an arithmetic expression
938
00:47:37,300 --> 00:47:39,730
like this is something like a float, the rest
939
00:47:39,730 --> 00:47:44,210
will be treated as promoted to a floating point value as well.
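As a minimal sketch of that promotion rule, the two functions below compute the same average with and without a floating point operand. The function names average_truncated and average_precise are assumptions made for illustration and do not appear in the lecture's scores.c.

```c
// All operands are ints, so C performs integer division and
// discards everything after the decimal point.
int average_truncated(int a, int b, int c)
{
    return (a + b + c) / 3;
}

// 3.0 is a double, so the int sum is promoted to a double
// before the division and the fractional part is kept.
double average_precise(int a, int b, int c)
{
    return (a + b + c) / 3.0;
}
```

With the scores 72, 73, and 33, the sum is 178, so the first function returns 59 while the second returns roughly 59.333333.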
940
00:47:44,210 --> 00:47:49,495
So let me, now, recompile this code with make scores, Enter.
941
00:47:49,495 --> 00:47:53,500
This time it worked OK, because I'm treating a float as a float.
942
00:47:53,500 --> 00:47:55,600
Let me do ./scores, Enter.
943
00:47:55,600 --> 00:48:00,150
All right, my average is 59.33333 and so forth.
944
00:48:00,150 --> 00:48:00,650
All right.
945
00:48:00,650 --> 00:48:03,340
So the math, presumably, checks out.
946
00:48:03,340 --> 00:48:06,220
Floating point imprecision per last week aside.
947
00:48:06,220 --> 00:48:09,280
But let's consider the design of this program.
948
00:48:09,280 --> 00:48:16,680
What is, kind of, bad about it, or if we maintain this program longer term,
949
00:48:16,680 --> 00:48:19,480
are we going to regret the design of this program?
950
00:48:19,480 --> 00:48:20,990
What might not be ideal here?
951
00:48:20,990 --> 00:48:21,490
Yeah?
952
00:48:21,490 --> 00:48:30,364
AUDIENCE: [INAUDIBLE]
953
00:48:30,364 --> 00:48:34,220
DAVID MALAN: Yeah, so in this case, I have hard coded my three scores.
954
00:48:34,220 --> 00:48:37,140
So, if I'm hearing you correctly, this program
955
00:48:37,140 --> 00:48:39,600
is only ever going to tell me this specific average.
956
00:48:39,600 --> 00:48:41,730
I'm not even using something like, get int
957
00:48:41,730 --> 00:48:44,790
or get float to get three different scores, so that's not good.
958
00:48:44,790 --> 00:48:46,942
And suppose that we wait later in the semester,
959
00:48:46,942 --> 00:48:48,400
I think other problems could arise.
960
00:48:48,400 --> 00:48:48,900
Yeah?
961
00:48:48,900 --> 00:48:51,020
AUDIENCE: Just thinking also somewhat of an issue
962
00:48:51,020 --> 00:48:52,900
that you can't reuse that number.
963
00:48:52,900 --> 00:48:55,450
DAVID MALAN: I can't reuse the number because I
964
00:48:55,450 --> 00:48:59,088
haven't stored the average in some variable, which in this program, not
965
00:48:59,088 --> 00:49:01,630
a big deal, but certainly, if I wanted to reuse it elsewhere,
966
00:49:01,630 --> 00:49:02,650
that's a problem.
967
00:49:02,650 --> 00:49:05,025
Let's fast-forward again, a little later in the semester,
968
00:49:05,025 --> 00:49:07,390
I don't just have three test scores or exam scores,
969
00:49:07,390 --> 00:49:09,430
maybe I have 4, or 5, or 6.
970
00:49:09,430 --> 00:49:10,690
Where might this take us?
971
00:49:10,690 --> 00:49:12,301
AUDIENCE: Yeah, if you ever want to have to take
972
00:49:12,301 --> 00:49:14,900
the average of any number of scores other than 3, [INAUDIBLE]
973
00:49:14,900 --> 00:49:18,110
DAVID MALAN: Yeah, I've sort of, capped this program at 3.
974
00:49:18,110 --> 00:49:20,942
And honestly, this is, kind of, bordering on copy paste.
975
00:49:20,942 --> 00:49:23,900
Even though the variables, yes, have different names; score 1, score 2,
976
00:49:23,900 --> 00:49:24,800
score 3.
977
00:49:24,800 --> 00:49:27,230
Imagine doing this for a whole grade book for a class.
978
00:49:27,230 --> 00:49:32,990
Having to score 4, 5, 6, 11, 10, 12, 20, 30, that's a lot of variables.
979
00:49:32,990 --> 00:49:35,420
You can imagine just how ugly the code starts
980
00:49:35,420 --> 00:49:38,635
to get if you're just defining variable after variable, after variable.
981
00:49:38,635 --> 00:49:42,740
So it turns out, there are better ways, in languages like C,
982
00:49:42,740 --> 00:49:47,240
if you want to have multiple values stored in memory that
983
00:49:47,240 --> 00:49:49,040
happen to be of the same data type.
984
00:49:49,040 --> 00:49:50,420
Let's take a look back at this memory, here,
985
00:49:50,420 --> 00:49:52,545
to see what these things might look like in memory.
986
00:49:52,545 --> 00:49:54,170
Here's that grid of memory.
987
00:49:54,170 --> 00:49:56,450
Each of these, recall, represents a byte.
988
00:49:56,450 --> 00:49:59,690
To be clear, if I store score 1 in memory first,
989
00:49:59,690 --> 00:50:01,130
how many bytes will it take up?
990
00:50:01,130 --> 00:50:02,520
AUDIENCE: [INAUDIBLE]
991
00:50:02,520 --> 00:50:03,650
DAVID MALAN: So 4, a.k.a.
992
00:50:03,650 --> 00:50:04,430
32 bits.
993
00:50:04,430 --> 00:50:08,578
So I might draw a score 1 as filling up this part of the memory.
994
00:50:08,578 --> 00:50:11,870
It's up to the computer as to whether it goes here, or down there, or wherever.
995
00:50:11,870 --> 00:50:15,290
I'm just keeping the pictures clean for today, from the top-left on down.
996
00:50:15,290 --> 00:50:18,080
If I, then, declare another variable, called score 2,
997
00:50:18,080 --> 00:50:20,730
it might end up over there, also taking up 4 bytes.
998
00:50:20,730 --> 00:50:23,330
And then score 3 might end up here.
999
00:50:23,330 --> 00:50:26,880
So that's just representing what's going on inside of the computer's memory.
1000
00:50:26,880 --> 00:50:30,680
But technically speaking, to be clear, per week 0, what's
1001
00:50:30,680 --> 00:50:34,580
really being stored in the computer's memory, are patterns of 0s and 1s.
1002
00:50:34,580 --> 00:50:39,350
32 total, in this case, because 32 bits is 4 bytes.
1003
00:50:39,350 --> 00:50:43,280
But again, it gets boring quickly to think in and look
1004
00:50:43,280 --> 00:50:44,760
at binary all the time.
1005
00:50:44,760 --> 00:50:47,120
So we'll, generally, abstract this away as just using
1006
00:50:47,120 --> 00:50:49,550
decimal numbers, in this case, instead.
1007
00:50:49,550 --> 00:50:54,170
But there might be a better way to store, not just three of these things,
1008
00:50:54,170 --> 00:50:57,500
but maybe four, maybe, five, maybe 10, maybe, more,
1009
00:50:57,500 --> 00:51:03,110
by declaring one variable to store all of them, instead of 3, or 4, or 5,
1010
00:51:03,110 --> 00:51:05,750
or more individual variables.
1011
00:51:05,750 --> 00:51:10,250
The way to do this is by way of something known as an array.
1012
00:51:10,250 --> 00:51:18,320
An array is another type of data that allows you to store multiple values
1013
00:51:18,320 --> 00:51:20,980
of the same type back-to-back-to-back.
1014
00:51:20,980 --> 00:51:22,230
That is, to say, contiguously.
1015
00:51:22,230 --> 00:51:29,840
So an array can let you create memory for one int, or two, or three,
1016
00:51:29,840 --> 00:51:32,600
or even more than that, but describe them
1017
00:51:32,600 --> 00:51:36,390
all using the same variable name, the same one name.
1018
00:51:36,390 --> 00:51:40,740
So for instance, if, for one program, I only need three integers,
1019
00:51:40,740 --> 00:51:45,800
but I don't want to messily declare them as score 1, score 2, score 3,
1020
00:51:45,800 --> 00:51:46,960
I can do this, instead.
1021
00:51:46,960 --> 00:51:49,130
This is today's first new piece of syntax,
1022
00:51:49,130 --> 00:51:51,290
the square brackets that we're now seeing.
1023
00:51:51,290 --> 00:51:57,140
This line of code, here, is similar to int score 1 semicolon,
1024
00:51:57,140 --> 00:52:00,360
or int score 1 equals 72 semicolon.
1025
00:52:00,360 --> 00:52:05,780
This line of code is declaring for me, so to speak, an array of size 3.
1026
00:52:05,780 --> 00:52:09,260
And that array is going to store three integers.
1027
00:52:09,260 --> 00:52:09,770
Why?
1028
00:52:09,770 --> 00:52:14,990
Because the type of that array is an int, here.
1029
00:52:14,990 --> 00:52:18,110
The square brackets tell the computer how many ints you want.
1030
00:52:18,110 --> 00:52:18,980
In this case, 3.
1031
00:52:18,980 --> 00:52:21,140
And the name is, of course, scores.
1032
00:52:21,140 --> 00:52:23,540
Which, in English, I've deliberately pluralized
1033
00:52:23,540 --> 00:52:28,100
so that I can describe this array as storing multiple scores, indeed.
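To make "back-to-back" concrete, here is a minimal sketch that measures the distance in bytes between neighboring elements of such an array. The function name neighbors_are_adjacent is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

// Returns true if each element of a 3-int array starts exactly
// sizeof(int) bytes after the previous one, and the whole array
// occupies exactly 3 * sizeof(int) bytes.
bool neighbors_are_adjacent(void)
{
    int scores[3];

    // Casting to char * lets us measure distances in bytes.
    ptrdiff_t gap01 = (char *) &scores[1] - (char *) &scores[0];
    ptrdiff_t gap12 = (char *) &scores[2] - (char *) &scores[1];

    return gap01 == (ptrdiff_t) sizeof(int)
        && gap12 == (ptrdiff_t) sizeof(int)
        && sizeof(scores) == 3 * sizeof(int);
}
```

On a system where an int is 4 bytes, the three elements together take up 12 contiguous bytes.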
1034
00:52:28,100 --> 00:52:32,970
So if I want to now assign values to this variable, called scores,
1035
00:52:32,970 --> 00:52:34,760
I can do code like this.
1036
00:52:34,760 --> 00:52:40,160
I can say, scores bracket 0 equals 72, scores bracket 1 equals 73,
1037
00:52:40,160 --> 00:52:42,190
and scores bracket 2 equals 33.
1038
00:52:42,190 --> 00:52:43,940
The only thing weird there is, admittedly,
1039
00:52:43,940 --> 00:52:45,830
the square brackets which are still new.
1040
00:52:45,830 --> 00:52:49,820
But we're also, notice, 0 indexing things.
1041
00:52:49,820 --> 00:52:52,345
To zero index means to start counting at 0.
1042
00:52:52,345 --> 00:52:54,470
When we've talked about that before, our for loops
1043
00:52:54,470 --> 00:52:56,000
have, generally, been zero indexed.
1044
00:52:56,000 --> 00:52:59,870
Arrays in C are zero indexed.
1045
00:52:59,870 --> 00:53:01,430
And you do not have choice over that.
1046
00:53:01,430 --> 00:53:04,550
You can't start counting at 1 in arrays because you prefer to,
1047
00:53:04,550 --> 00:53:06,830
you'd be sacrificing one of the elements.
1048
00:53:06,830 --> 00:53:09,620
You have to start in arrays counting from 0.
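Put together, the declaration and the three assignments might be sketched as follows. The function name average_of_three is an assumption made for illustration; the array name and values follow the ones described above.

```c
// Stores three scores in one array and returns their average.
double average_of_three(void)
{
    int scores[3];   // room for three ints; valid indices are 0, 1, and 2

    scores[0] = 72;  // first element
    scores[1] = 73;  // second element
    scores[2] = 33;  // third and last element

    // Dividing by 3.0 keeps the fractional part of the average.
    return (scores[0] + scores[1] + scores[2]) / 3.0;
}
```

Note that scores[3] would be one past the end of the array: with three elements, the last valid index is 2.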
1049
00:53:09,620 --> 00:53:13,130
So out of context, this doesn't solve a problem,
1050
00:53:13,130 --> 00:53:15,200
but it, definitely, is going to once we have more
1051
00:53:15,200 --> 00:53:16,910
than, even, three scores here.
1052
00:53:16,910 --> 00:53:19,750
In fact, let me change this program a little bit.
1053
00:53:19,750 --> 00:53:21,450
Let me go back to VS Code.
1054
00:53:21,450 --> 00:53:24,020
And delete these three lines, here.
1055
00:53:24,020 --> 00:53:27,080
And replace it with a scores variable that's
1056
00:53:27,080 --> 00:53:30,140
ready to store three total integers.
1057
00:53:30,140 --> 00:53:34,130
And then, initialize them as follows, scores bracket 0 is 72,
1058
00:53:34,130 --> 00:53:38,300
as before, scores bracket 1 is going to be 73, scores bracket 2
1059
00:53:38,300 --> 00:53:39,740
is going to be 33.
1060
00:53:39,740 --> 00:53:44,068
Notice, I do not need to say int before any of these lines,
1061
00:53:44,068 --> 00:53:45,860
because that's been taken care of, already,
1062
00:53:45,860 --> 00:53:50,570
for me on line 5, where I already specified that everything in this array
1063
00:53:50,570 --> 00:53:53,330
is going to be an int.
1064
00:53:53,330 --> 00:53:57,020
Now, down here, this code needs to change because I no longer have
1065
00:53:57,020 --> 00:53:59,300
three variables, score 1, 2, and 3.
1066
00:53:59,300 --> 00:54:03,950
I have 1 variable, but that I can index into.
1067
00:54:03,950 --> 00:54:08,750
I'm going to, here, then, do scores bracket 0, plus scores bracket 1,
1068
00:54:08,750 --> 00:54:13,370
plus scores bracket 2, which is equivalent to what I did earlier,
1069
00:54:13,370 --> 00:54:14,900
giving me back those three integers.
1070
00:54:14,900 --> 00:54:17,860
But notice, I'm using the same variable name, every time.
1071
00:54:17,860 --> 00:54:21,070
And again, I'm using this new square bracket notation to, quote-unquote,
1072
00:54:21,070 --> 00:54:26,590
index into the array to get at the first int, the second int, and the third,
1073
00:54:26,590 --> 00:54:28,840
and then, to do it again down here.
1074
00:54:28,840 --> 00:54:31,907
Now, this program, still not really solving all the problems we described,
1075
00:54:31,907 --> 00:54:34,240
I still can only store three scores, but we'll come back
1076
00:54:34,240 --> 00:54:35,930
to something like that before long.
1077
00:54:35,930 --> 00:54:38,950
But for now, we're just introducing a new syntax and a new feature,
1078
00:54:38,950 --> 00:54:44,980
whereby, I can now store multiple values in the same variable.
1079
00:54:44,980 --> 00:54:47,110
Well, let's enhance this a bit more.
1080
00:54:47,110 --> 00:54:50,660
Instead of hard coding these scores, as was identified as a problem,
1081
00:54:50,660 --> 00:54:54,790
let's use get int to ask the user for a score.
1082
00:54:54,790 --> 00:54:58,330
Let's, then, use get int to ask the user for another score.
1083
00:54:58,330 --> 00:55:01,540
Let's use get int to ask the user for a third score,
1084
00:55:01,540 --> 00:55:04,400
storing them in those respective locations.
1085
00:55:04,400 --> 00:55:09,820
And, now, if I go ahead and save this program, recompile scores, huh.
1086
00:55:09,820 --> 00:55:10,900
I've messed up, here.
1087
00:55:10,900 --> 00:55:13,990
Now these errors should be getting a little familiar.
1088
00:55:13,990 --> 00:55:16,750
What mistake did I make?
1089
00:55:16,750 --> 00:55:17,875
Let me give folks a moment.
1090
00:55:17,875 --> 00:55:18,970
AUDIENCE: cs50.h
1091
00:55:18,970 --> 00:55:21,100
DAVID MALAN: cs50.h.
1092
00:55:21,100 --> 00:55:24,220
That was not intentional, so still making mistakes all these years later.
1093
00:55:24,220 --> 00:55:26,320
I need to include cs50.h.
1094
00:55:26,320 --> 00:55:29,570
Now, I'm going to go back to the bottom in the terminal window, make scores.
1095
00:55:29,570 --> 00:55:30,070
OK.
1096
00:55:30,070 --> 00:55:31,670
We're back in business, ./scores.
1097
00:55:31,670 --> 00:55:33,920
Now, the program is getting a little more interesting.
1098
00:55:33,920 --> 00:55:38,020
So maybe, this year was better and I got a 100, and a 99, and a 98, and there,
1099
00:55:38,020 --> 00:55:40,900
my average is 99.0000.
1100
00:55:40,900 --> 00:55:42,370
So now, it's a little more dynamic.
1101
00:55:42,370 --> 00:55:43,270
It's a little more interesting.
1102
00:55:43,270 --> 00:55:45,978
But it's still capping the number of scores at three, admittedly.
1103
00:55:45,978 --> 00:55:50,740
But now, I've introduced another, sort of, symptom of bad programming.
1104
00:55:50,740 --> 00:55:54,108
There's this expression in programming, too, called code smell, where like--
1105
00:55:54,108 --> 00:55:55,900
[SNIFFS AIR] something smells a little off.
1106
00:55:55,900 --> 00:56:00,550
And there's something off here in that I could do better with this code.
1107
00:56:00,550 --> 00:56:05,080
Does anyone see an opportunity to improve the design of this code, here,
1108
00:56:05,080 --> 00:56:08,230
if my goal, still, is to get three scores from the user but [SNIFF SNIFF]
1109
00:56:08,230 --> 00:56:10,430
without it smelling [SNIFF] kind of bad?
1110
00:56:10,430 --> 00:56:10,930
Yeah?
1111
00:56:10,930 --> 00:56:12,940
AUDIENCE: [INAUDIBLE] use a for loop?
1112
00:56:12,940 --> 00:56:15,958
That way you don't have to copy and paste all of those scores.
1113
00:56:15,958 --> 00:56:17,160
DAVID MALAN: Yeah, exactly.
1114
00:56:17,160 --> 00:56:19,022
Those lines of code are almost identical.
1115
00:56:19,022 --> 00:56:21,480
And honestly, the only thing that's changing is the number,
1116
00:56:21,480 --> 00:56:23,100
and it's just incrementing by 1.
1117
00:56:23,100 --> 00:56:25,330
We have all of the building blocks to do this better.
1118
00:56:25,330 --> 00:56:27,130
So let me go ahead and improve this.
1119
00:56:27,130 --> 00:56:29,560
Let me delete that code.
1120
00:56:29,560 --> 00:56:31,720
Let me, now, have a for loop.
1121
00:56:31,720 --> 00:56:36,150
So for int i gets 0, i less than 3, i plus plus.
1122
00:56:36,150 --> 00:56:39,060
Then, inside of this for loop, I can distill all three
1123
00:56:39,060 --> 00:56:40,860
of those lines into something more generic,
1124
00:56:40,860 --> 00:56:46,530
like scores bracket i equals get int, and now, ask the user, just
1125
00:56:46,530 --> 00:56:48,905
once, via get int, for a score.
1126
00:56:48,905 --> 00:56:52,000
So this is where arrays start to get pretty powerful.
1127
00:56:52,000 --> 00:56:54,000
You don't have to hard code, that is, literally,
1128
00:56:54,000 --> 00:56:56,462
type in all of these magic numbers like 0, 1, and 2.
1129
00:56:56,462 --> 00:56:58,170
You can start to do it, programmatically,
1130
00:56:58,170 --> 00:56:59,770
as you propose with a loop.
1131
00:56:59,770 --> 00:57:01,350
So now, I've tightened things up.
1132
00:57:01,350 --> 00:57:04,230
I'm now, dynamically, getting three different scores,
1133
00:57:04,230 --> 00:57:06,766
but putting them in three different locations.
1134
00:57:06,766 --> 00:57:10,470
And so this program, ultimately, is going to work, pretty much, the same.
1135
00:57:10,470 --> 00:57:17,520
Make scores, ./scores, and 100, 99, 98, and we're back to the same answer.
1136
00:57:17,520 --> 00:57:19,440
But it's a little better designed, too.
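A minimal sketch of the loop-based version follows. To stay within standard C, the values come from a parameter named input instead of from get int; both that parameter and the function name average_three are assumptions made for illustration.

```c
// Fills a 3-element array with a loop, then averages it.
// `input` stands in for the values the user would type at each prompt.
double average_three(const int input[3])
{
    int scores[3];

    // One generic line replaces three nearly identical assignments:
    // i takes the values 0, 1, and 2 in turn.
    for (int i = 0; i < 3; i++)
    {
        scores[i] = input[i];
    }

    return (scores[0] + scores[1] + scores[2]) / 3.0;
}
```

Passing the scores 100, 99, and 98 gives an average of 99.0, matching the run above.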
1137
00:57:19,440 --> 00:57:21,360
If I really want to nitpick, there's something
1138
00:57:21,360 --> 00:57:23,100
that still smells, a little bit, here.
1139
00:57:23,100 --> 00:57:27,540
The fact that I have, indeed, this magic number 3 that really
1140
00:57:27,540 --> 00:57:29,890
has to be the same as this number here.
1141
00:57:29,890 --> 00:57:32,170
Otherwise, who knows what's going to go wrong.
1142
00:57:32,170 --> 00:57:34,380
So what might be a solution, per last week,
1143
00:57:34,380 --> 00:57:36,960
to cleaning that code up further, too?
1144
00:57:36,960 --> 00:57:39,750
AUDIENCE: [INAUDIBLE] the user's discretion
1145
00:57:39,750 --> 00:57:41,742
how many input scores [INAUDIBLE].
1146
00:57:41,742 --> 00:57:44,790
DAVID MALAN: OK, so we could leave it up to the user's discretion.
1147
00:57:44,790 --> 00:57:47,500
And so we could, actually, do something like this.
1148
00:57:47,500 --> 00:57:49,200
Let me take this a few steps ahead.
1149
00:57:49,200 --> 00:57:56,230
Let me say something like, int n gets get int, how many scores question mark,
1150
00:57:56,230 --> 00:58:00,600
then I could actually change this to an n, and then this to an n,
1151
00:58:00,600 --> 00:58:02,970
and, indeed, make the whole program dynamic.
1152
00:58:02,970 --> 00:58:05,670
Ask the human how many tests have there been this semester?
1153
00:58:05,670 --> 00:58:07,500
Then, you can type in each of those scores
1154
00:58:07,500 --> 00:58:09,708
because the loop is going to iterate that many times.
1155
00:58:09,708 --> 00:58:13,020
And then you'll get the average of one test, two test, three--
1156
00:58:13,020 --> 00:58:17,520
well, lost another-- or however many scores that were actually
1157
00:58:17,520 --> 00:58:20,760
specified by the user. Yeah, question?
1158
00:58:20,760 --> 00:58:25,765
AUDIENCE: How many bits or bytes get used in an array?
1159
00:58:25,765 --> 00:58:28,060
DAVID MALAN: How many bytes are used in an array?
1160
00:58:28,060 --> 00:58:32,524
AUDIENCE: [INAUDIBLE] point of doing this is to save [INAUDIBLE]
1161
00:58:32,524 --> 00:58:35,500
DAVID MALAN: So the purpose of an array is not to save space.
1162
00:58:35,500 --> 00:58:39,010
It's to eliminate having multiple variable names
1163
00:58:39,010 --> 00:58:40,900
because that gets very messy quickly.
1164
00:58:40,900 --> 00:58:44,980
If you have score 1, score 2, score 3, dot, dot, dot, score 99,
1165
00:58:44,980 --> 00:58:48,100
that's, like, 99 different variables, potentially,
1166
00:58:48,100 --> 00:58:54,160
that you could collapse into one variable that has 99 locations.
1167
00:58:54,160 --> 00:58:56,230
At different indices, or indexes,
1168
00:58:56,230 --> 00:58:58,570
as some would say. The index for an array
1169
00:58:58,570 --> 00:59:00,756
is whatever is in the square brackets.
1170
00:59:00,756 --> 00:59:11,560
AUDIENCE: [INAUDIBLE]
1171
00:59:11,560 --> 00:59:13,280
DAVID MALAN: So it's a good question.
1172
00:59:13,280 --> 00:59:15,370
So if you-- I'm using ints for everything--
1173
00:59:15,370 --> 00:59:17,560
and honestly, we don't really need ints for scores
1174
00:59:17,560 --> 00:59:21,770
because I'm not likely to get a 2 billion on a test anytime soon.
1175
00:59:21,770 --> 00:59:23,620
And so you could use different data types.
1176
00:59:23,620 --> 00:59:26,287
And that list we had on the screen, earlier, is not all of them.
1177
00:59:26,287 --> 00:59:29,770
There's a data type called short, which is shorter than an int,
1178
00:59:29,770 --> 00:59:34,850
you could, technically, use char, in some form or other data types as well.
1179
00:59:34,850 --> 00:59:36,940
Generally speaking, in the year 2021, these
1180
00:59:36,940 --> 00:59:40,990
tend to be over optima-- overly optimized decisions.
1181
00:59:40,990 --> 00:59:42,940
Everyone just uses ints, even though no one
1182
00:59:42,940 --> 00:59:46,300
is going to get a test score that's 2 billion, or more, because int is just,
1183
00:59:46,300 --> 00:59:47,260
kind of, the go-to.
1184
00:59:47,260 --> 00:59:50,252
Years ago, memory was expensive.
1185
00:59:50,252 --> 00:59:52,210
And every one of your instincts would have been
1186
00:59:52,210 --> 00:59:54,700
spot on because memory was so tight.
1187
00:59:54,700 --> 00:59:56,930
But, nowadays, we don't worry as much about it.
1188
00:59:56,930 --> 00:59:57,430
Yeah?
1189
00:59:57,430 --> 01:00:02,556
AUDIENCE: I have a question about the error [INAUDIBLE].
1190
01:00:02,556 --> 01:00:06,605
Could it-- when you're doing a cash problem on the problem set--
1191
01:00:06,605 --> 01:00:10,010
DAVID MALAN: So what is the difference between dividing two ints
1192
01:00:10,010 --> 01:00:12,380
and not getting an error, as you might have encountered
1193
01:00:12,380 --> 01:00:15,920
in a program like cash, versus dividing two ints
1194
01:00:15,920 --> 01:00:18,150
and getting an error like I did a moment ago?
1195
01:00:18,150 --> 01:00:22,280
The problem with the scenario I created a moment ago was printf was involved.
1196
01:00:22,280 --> 01:00:27,980
And I was telling printf to use a %f, but I was giving printf the result
1197
01:00:27,980 --> 01:00:30,580
of dividing integers by another integer.
1198
01:00:30,580 --> 01:00:32,930
So it was printf that was yelling at me.
1199
01:00:32,930 --> 01:00:35,930
I'm guessing in the scenario you're describing, for something like cash,
1200
01:00:35,930 --> 01:00:39,180
printf was not involved in that particular line of code.
1201
01:00:39,180 --> 01:00:40,865
So that's the difference, there.
1202
01:00:40,865 --> 01:00:41,660
All right.
1203
01:00:41,660 --> 01:00:45,110
So we, now, have this ability to create an array.
1204
01:00:45,110 --> 01:00:47,510
And an array can store multiple values.
1205
01:00:47,510 --> 01:00:51,450
What, then, might we do that's more interesting than just storing numbers
1206
01:00:51,450 --> 01:00:51,950
in memory?
1207
01:00:51,950 --> 01:00:54,230
Well, let's take this one step further.
1208
01:00:54,230 --> 01:01:01,130
As opposed to just storing 72, 73, 33 or 100, 99, 98, at these given locations,
1209
01:01:01,130 --> 01:01:05,930
because again, an array gives you one variable name, but multiple locations,
1210
01:01:05,930 --> 01:01:08,360
or indices therein, bracket 0, bracket 1,
1211
01:01:08,360 --> 01:01:11,330
bracket 2 on up, if it were even bigger than that.
1212
01:01:11,330 --> 01:01:16,100
Let's, now, start to consider something more modest, like simple chars.
1213
01:01:16,100 --> 01:01:18,830
Chars, being 1 byte each, so they're even smaller,
1214
01:01:18,830 --> 01:01:20,090
they take up much less space.
1215
01:01:20,090 --> 01:01:22,048
And, indeed, if I wanted to say a message like,
1216
01:01:22,048 --> 01:01:24,200
hi, I could use three variables.
1217
01:01:24,200 --> 01:01:28,520
If I wanted a program to print, hi, H-I exclamation point,
1218
01:01:28,520 --> 01:01:33,230
I could, of course, store those in three variables, like c1, c2, c3.
1219
01:01:33,230 --> 01:01:36,710
And let's, for the sake of discussion, let's whip this up real quickly.
1220
01:01:36,710 --> 01:01:39,680
Let me create a new program, now, in VS Code.
1221
01:01:39,680 --> 01:01:42,920
This time, I'm going to call it hi.c.
1222
01:01:42,920 --> 01:01:45,650
And I'm not going to bother with the CS50 library.
1223
01:01:45,650 --> 01:01:47,660
I just need the standard I/O one, for now.
1224
01:01:47,660 --> 01:01:49,220
int main(void).
1225
01:01:49,220 --> 01:01:52,400
And then, inside of main, I'm going to, simply, create three variables.
1226
01:01:52,400 --> 01:01:55,760
And this is already, hopefully, striking you as a bad idea.
1227
01:01:55,760 --> 01:01:58,310
But we'll go down this road, temporarily,
1228
01:01:58,310 --> 01:02:02,300
with c1, and c2, and, finally, c3.
1229
01:02:02,300 --> 01:02:05,660
Storing each character in the phrase I want to print,
1230
01:02:05,660 --> 01:02:09,450
and I'm going to print this in a different way than usual.
1231
01:02:09,450 --> 01:02:10,880
Now I'm dealing with chars.
1232
01:02:10,880 --> 01:02:14,480
And we've, generally, dealt with strings, which was easier last week.
1233
01:02:14,480 --> 01:02:21,600
But %c, %c, %c, will let me print out three chars, like c1, c2, and c3.
1234
01:02:21,600 --> 01:02:24,420
So, kind of, a stupid way of printing out a string.
1235
01:02:24,420 --> 01:02:26,940
So we already have a solution to this problem last week.
1236
01:02:26,940 --> 01:02:30,540
But let's poke around at what's going on underneath the hood, here.
1237
01:02:30,540 --> 01:02:33,350
So let's make hi, ./hi.
1238
01:02:33,350 --> 01:02:34,475
And, voila, no surprise.
1239
01:02:34,475 --> 01:02:36,350
But we, again, could have done this last week
1240
01:02:36,350 --> 01:02:39,530
with a string and just one variable, or even, 0, at that.
1241
01:02:39,530 --> 01:02:43,220
But let's start converting these characters
1242
01:02:43,220 --> 01:02:47,750
to their apparent numeric equivalents like we talked about in week 0 too.
1243
01:02:47,750 --> 01:02:52,310
Let me modify these %c's, just to be fun, to be %i's.
1244
01:02:52,310 --> 01:02:56,180
And let me add some spaces so there are gaps between each of them.
1245
01:02:56,180 --> 01:03:00,350
Let me, now, recompile hi, and let me rerun it.
1246
01:03:00,350 --> 01:03:02,900
Just to guess, what should I see on the screen now?
1247
01:03:05,690 --> 01:03:06,200
Any guesses?
1248
01:03:06,200 --> 01:03:06,700
Yeah?
1249
01:03:06,700 --> 01:03:08,036
AUDIENCE: The ASCII values?
1250
01:03:08,036 --> 01:03:09,760
DAVID MALAN: The ASCII values.
1251
01:03:09,760 --> 01:03:12,220
And it's intentional that I keep using the same word,
1252
01:03:12,220 --> 01:03:18,250
hi, because it should be, hopefully, the old friends, 72, 73, and 33.
1253
01:03:18,250 --> 01:03:22,120
Which, is to say, that C knows about ASCII, or equivalently, Unicode,
1254
01:03:22,120 --> 01:03:24,320
and can do this conversion for us automatically.
1255
01:03:24,320 --> 01:03:27,670
And it seems to be doing it implicitly for us, so to speak.
1256
01:03:27,670 --> 01:03:31,000
Notice that c1, c2 and c3 are, obviously, chars,
1257
01:03:31,000 --> 01:03:34,420
but printf is able to tolerate printing them as integers.
1258
01:03:34,420 --> 01:03:38,870
If I really wanted to be pedantic, I could use this technique, again,
1259
01:03:38,870 --> 01:03:41,320
known as typecasting, where I can actually
1260
01:03:41,320 --> 01:03:46,610
convert one data type to another, if it makes logical sense to do so.
1261
01:03:46,610 --> 01:03:49,900
And we saw in week 0, chars, or characters,
1262
01:03:49,900 --> 01:03:53,500
are just numbers, like 72, 73, and 33.
1263
01:03:53,500 --> 01:03:57,680
So I can use this parenthetical expression to convert, incorrectly,
1264
01:03:57,680 --> 01:04:02,623
[LAUGHTER] three chars to three integers, instead.
1265
01:04:02,623 --> 01:04:04,540
So that's what I meant to type the first time.
1266
01:04:04,540 --> 01:04:05,040
There we go.
1267
01:04:05,040 --> 01:04:05,800
Strike two, today.
1268
01:04:05,800 --> 01:04:09,280
So parenthesis, int, close parenthesis says
1269
01:04:09,280 --> 01:04:14,840
take whatever variable comes after this, c1, c2, or c3 and convert it to an int.
1270
01:04:14,840 --> 01:04:18,640
The effect is going to be no different, make hi, and then rerunning whoops--
1271
01:04:18,640 --> 01:04:24,910
then running ./hi still works the same, but now I'm explicitly converting chars
1272
01:04:24,910 --> 01:04:25,660
to ints.
1273
01:04:25,660 --> 01:04:29,260
And we can do this all day long, chars to ints, floats to ints,
1274
01:04:29,260 --> 01:04:30,250
ints to floats.
1275
01:04:30,250 --> 01:04:31,888
Sometimes, it's equivalent.
1276
01:04:31,888 --> 01:04:33,805
Other times, you're going to lose information.
1277
01:04:33,805 --> 01:04:37,270
Taking a float to an int, just intuitively,
1278
01:04:37,270 --> 01:04:39,790
is going to throw away everything after the decimal point,
1279
01:04:39,790 --> 01:04:42,680
because an int has no decimal point.
1280
01:04:42,680 --> 01:04:45,100
But, for now, I'm going to rewind to the version of this
1281
01:04:45,100 --> 01:04:49,150
that just did implicit-type conversion, or implicit casting,
1282
01:04:49,150 --> 01:04:53,350
just to demonstrate that we can, indeed, see the values underneath the hood.
1283
01:04:53,350 --> 01:04:53,950
All right.
1284
01:04:53,950 --> 01:04:56,370
Let me go ahead and do this, now, the week 1 way.
1285
01:04:56,370 --> 01:04:57,370
This was kind of stupid.
1286
01:04:57,370 --> 01:05:00,205
Let's just do printf, quote-unquote--
1287
01:05:00,205 --> 01:05:04,630
Actually, let's do this, string s equals quote-unquote hi,
1288
01:05:04,630 --> 01:05:09,680
and then let's do a simple printf with %s, printing out s's there.
1289
01:05:09,680 --> 01:05:12,520
So now I've rewound to last week, where we began this story,
1290
01:05:12,520 --> 01:05:16,660
but you'll notice that, if we keep playing around with this--
1291
01:05:16,660 --> 01:05:18,860
whoops, what did I do here?
1292
01:05:18,860 --> 01:05:23,470
Oh, and let me introduce the CS50 library here, more on that before long.
1293
01:05:23,470 --> 01:05:26,260
Let me go ahead and recompile, rerun this,
1294
01:05:26,260 --> 01:05:28,268
we seem to be coding in circles, here.
1295
01:05:28,268 --> 01:05:30,810
Like, I've just done the same thing multiple, different ways.
1296
01:05:30,810 --> 01:05:33,400
But there's clearly an equivalence, then,
1297
01:05:33,400 --> 01:05:36,978
between sequences of chars and strings.
1298
01:05:36,978 --> 01:05:38,770
And if you do it the real pedantic way, you
1299
01:05:38,770 --> 01:05:43,390
have three different variables, c1, c2, c3, representing H-I exclamation point,
1300
01:05:43,390 --> 01:05:47,870
or you can just treat them all together like this h, i, exclamation point.
1301
01:05:47,870 --> 01:05:52,030
But it turns out that strings are actually
1302
01:05:52,030 --> 01:05:58,060
implemented by the computer in a pretty now familiar way.
1303
01:05:58,060 --> 01:06:04,382
What might a string actually be as of this point in the story?
1304
01:06:04,382 --> 01:06:05,590
Where are we going with this?
1305
01:06:05,590 --> 01:06:06,923
Let me try to look further back.
1306
01:06:06,923 --> 01:06:07,850
Yeah, in the way back?
1307
01:06:07,850 --> 01:06:08,350
Yeah?
1308
01:06:08,350 --> 01:06:10,600
AUDIENCE: Can a string like this be an array of chars?
1309
01:06:10,600 --> 01:06:13,410
DAVID MALAN: Yeah, a string might be, and indeed is, just
1310
01:06:13,410 --> 01:06:14,800
an array of characters.
1311
01:06:14,800 --> 01:06:17,190
So last week we took for granted that strings exist.
1312
01:06:17,190 --> 01:06:19,530
Technically, strings exist, but they're implemented
1313
01:06:19,530 --> 01:06:23,070
as arrays of characters, which actually opens up
1314
01:06:23,070 --> 01:06:25,770
some interesting possibilities for us.
1315
01:06:25,770 --> 01:06:28,300
Because, let me see, let me see if I can do this.
1316
01:06:28,300 --> 01:06:31,560
Let me try to print out, now, three integers again.
1317
01:06:31,560 --> 01:06:37,530
But if string s is but an array, as you propose, maybe I can do s bracket 0,
1318
01:06:37,530 --> 01:06:39,760
s bracket 1, and s bracket 2.
1319
01:06:39,760 --> 01:06:43,650
So maybe I can start poking around inside of strings,
1320
01:06:43,650 --> 01:06:45,630
even though we didn't do this last week, so I
1321
01:06:45,630 --> 01:06:47,260
can get at those individual values.
1322
01:06:47,260 --> 01:06:51,270
So make hi, ./hi and, voila, there we go again.
1323
01:06:51,270 --> 01:06:56,208
It's the same 72, 73, 33, but now, I'm sort of, hopefully,
1324
01:06:56,208 --> 01:06:58,500
like, wrapping my mind around the fact that, all right,
1325
01:06:58,500 --> 01:07:01,650
a string is just an array of characters, and arrays, you
1326
01:07:01,650 --> 01:07:04,960
can index into them using this new square bracket notation.
1327
01:07:04,960 --> 01:07:08,040
So I can get at any one of these individual characters,
1328
01:07:08,040 --> 01:07:14,055
and, heck, convert it to an integer like we did in week 0.
1329
01:07:14,055 --> 01:07:17,010
Let me get a little curious now.
1330
01:07:17,010 --> 01:07:20,020
What else might be in the computer's memory?
1331
01:07:20,020 --> 01:07:23,550
Well, let's-- I'll go back to the depiction of these same things.
1332
01:07:23,550 --> 01:07:25,860
Here might be how we originally implemented hi
1333
01:07:25,860 --> 01:07:28,800
with three variables, c1, c2, c3.
1334
01:07:28,800 --> 01:07:31,500
Of course, that maps to these decimal digits or, equivalently,
1335
01:07:31,500 --> 01:07:32,880
these binary values.
1336
01:07:32,880 --> 01:07:35,310
But what was this looking like in memory?
1337
01:07:35,310 --> 01:07:38,250
Literally, when you create a string in memory, like this,
1338
01:07:38,250 --> 01:07:41,240
string s equals quote-unquote hi, let's consider what's going on
1339
01:07:41,240 --> 01:07:42,615
underneath the hood, so to speak.
1340
01:07:42,615 --> 01:07:47,490
Well, as an abstraction, a string, it's H-I exclamation point taking up,
1341
01:07:47,490 --> 01:07:48,917
it would seem, 3 bytes, right?
1342
01:07:48,917 --> 01:07:51,000
I've gotten rid of the bars, there, because if you
1343
01:07:51,000 --> 01:07:55,650
think of a string as a type, I'm just going to use one big box of size 3.
1344
01:07:55,650 --> 01:08:00,210
But technically, a string, we've just revealed, is an array,
1345
01:08:00,210 --> 01:08:01,830
and the array is of size 3.
1346
01:08:01,830 --> 01:08:03,750
So technically, if the string is called s,
1347
01:08:03,750 --> 01:08:05,970
s bracket 0 will give you the first character,
1348
01:08:05,970 --> 01:08:09,810
s bracket 1, the second, and s bracket 2, the third.
1349
01:08:09,810 --> 01:08:13,290
But let me ask this question now, if this, at the end of the day,
1350
01:08:13,290 --> 01:08:16,560
is the only thing in your computer memory
1351
01:08:16,560 --> 01:08:20,790
and the ability, like a canvas to draw 0s and 1s, or numbers, or characters,
1352
01:08:20,790 --> 01:08:22,620
or whatever on it, but that's it, like this
1353
01:08:22,620 --> 01:08:25,770
is what your Mac, and PC, and phone ultimately reduce to.
1354
01:08:25,770 --> 01:08:29,730
Suppose that I'm running a piece of software, like a text messenger,
1355
01:08:29,730 --> 01:08:33,000
and now I write down bye exclamation point.
1356
01:08:33,000 --> 01:08:34,860
Well, where might that go in memory?
1357
01:08:34,860 --> 01:08:35,845
Well, it might go here.
1358
01:08:35,845 --> 01:08:39,333
B-Y-E. And then the next thing I type might go here, here, here and so forth.
1359
01:08:39,333 --> 01:08:41,250
My memory just might get filled up, over time,
1360
01:08:41,250 --> 01:08:44,310
with things that you or someone else are typing.
1361
01:08:44,310 --> 01:08:50,580
But then how does the computer know if, potentially, B-Y-E exclamation point
1362
01:08:50,580 --> 01:08:56,150
is right after H-I exclamation point where one string ends and the next one
1363
01:08:56,150 --> 01:08:56,650
begins?
1364
01:08:58,930 --> 01:08:59,430
Right?
1365
01:08:59,430 --> 01:09:03,070
All we have are bytes, or 0s and 1s.
1366
01:09:03,070 --> 01:09:05,730
So if you were designing this, how would you
1367
01:09:05,730 --> 01:09:08,280
implement some kind of delimiter between the two?
1368
01:09:08,280 --> 01:09:10,260
Or figure out what the length of a string is?
1369
01:09:10,260 --> 01:09:11,010
What do you think?
1370
01:09:11,010 --> 01:09:12,148
AUDIENCE: A nul character.
1371
01:09:12,148 --> 01:09:15,107
DAVID MALAN: OK, so the right answer is use a nul character,
1372
01:09:15,107 --> 01:09:17,190
and for those who don't know, what does that mean?
1373
01:09:17,190 --> 01:09:19,492
AUDIENCE: It's special.
1374
01:09:19,492 --> 01:09:21,450
DAVID MALAN: Yeah, so it's a special character.
1375
01:09:21,450 --> 01:09:23,520
Let me describe it as a sentinel character.
1376
01:09:23,520 --> 01:09:25,575
Humans decided some time ago that you know
1377
01:09:25,575 --> 01:09:28,560
what, if we want to delineate where one string ends
1378
01:09:28,560 --> 01:09:32,010
and where the next one begins, we just need some special symbol.
1379
01:09:32,010 --> 01:09:35,189
And the symbol they'll use is generally written as backslash 0.
1380
01:09:35,189 --> 01:09:39,555
This is just shorthand notation for literally eight 0 bits.
1381
01:09:39,555 --> 01:09:42,540
0, 0, 0, 0, 0, 0, 0, 0.
1382
01:09:42,540 --> 01:09:46,140
And the nickname for eight 0 bits, in this context,
1383
01:09:46,140 --> 01:09:48,930
is nul, N-U-L, so to speak.
1384
01:09:48,930 --> 01:09:51,910
And we can actually see this as follows.
1385
01:09:51,910 --> 01:09:53,913
If you look at the corresponding decimal digits,
1386
01:09:53,913 --> 01:09:56,580
like you could do by doing out the math or doing the conversion,
1387
01:09:56,580 --> 01:10:01,560
like we've done in code, you would see for storing hi, 72, 73, 33,
1388
01:10:01,560 --> 01:10:06,600
but then 1 extra byte that's sort of invisibly there, but that is all 0s.
1389
01:10:06,600 --> 01:10:09,120
And now I've just written it as the decimal number 0.
1390
01:10:09,120 --> 01:10:12,120
The implication of this is that the computer is apparently
1391
01:10:12,120 --> 01:10:16,695
using, not 3 bytes to store a word like hi, but 4 bytes.
1392
01:10:16,695 --> 01:10:22,050
Whatever the length of the string is, plus 1 for this special sentinel value
1393
01:10:22,050 --> 01:10:24,640
that demarcates the end of the string.
1394
01:10:24,640 --> 01:10:26,680
So we might draw it like this instead.
1395
01:10:26,680 --> 01:10:31,350
And this character is, again, pronounced nul, or written N-U-L.
1396
01:10:31,350 --> 01:10:32,319
So that's all, right?
1397
01:10:32,319 --> 01:10:35,069
If humans, at the end of the day, just have this canvas of memory,
1398
01:10:35,069 --> 01:10:36,902
they just needed to decide, all right, well,
1399
01:10:36,902 --> 01:10:39,990
how do we distinguish one string from another?
1400
01:10:39,990 --> 01:10:42,660
It's a lot easier with chars, individually, it's
1401
01:10:42,660 --> 01:10:45,450
a lot easier with ints, it's even easier with floats, why?
1402
01:10:45,450 --> 01:10:49,620
Because, per that chart earlier, every character is always 1 byte.
1403
01:10:49,620 --> 01:10:51,810
Every int is always 4 bytes.
1404
01:10:51,810 --> 01:10:54,750
Every long is always 8 bytes.
1405
01:10:54,750 --> 01:10:56,279
How long is a string?
1406
01:10:56,279 --> 01:10:59,760
Well, hi is 1, 2, 3 with an exclamation point.
1407
01:10:59,760 --> 01:11:03,029
Bye is 1, 2, 3, 4 with an exclamation point.
1408
01:11:03,029 --> 01:11:06,450
David is D-A-V-I-D, five without an exclamation point.
1409
01:11:06,450 --> 01:11:10,210
And so a string can be any number of bytes long,
1410
01:11:10,210 --> 01:11:12,700
so you somehow need to draw a line in the sand
1411
01:11:12,700 --> 01:11:16,706
to separate in memory one string from another.
1412
01:11:16,706 --> 01:11:19,412
So what's the implication of this?
1413
01:11:19,412 --> 01:11:20,870
Well, let me go back to code, here.
1414
01:11:20,870 --> 01:11:22,210
Let's actually poke around.
1415
01:11:22,210 --> 01:11:27,130
This is a bit dangerous, but I'm going to start looking at memory locations
1416
01:11:27,130 --> 01:11:29,210
past my string here.
1417
01:11:29,210 --> 01:11:33,250
So let me go ahead and recompile, make hi.
1418
01:11:33,250 --> 01:11:35,110
Whoops, what did I do here?
1419
01:11:35,110 --> 01:11:36,680
I forgot a format code.
1420
01:11:36,680 --> 01:11:38,620
Let me add one more %i.
1421
01:11:38,620 --> 01:11:42,550
Now let me go ahead and rerun make hi, ./hi, Enter.
1422
01:11:42,550 --> 01:11:43,580
There it is.
1423
01:11:43,580 --> 01:11:46,660
So you can actually see in the computer, unbeknownst to you
1424
01:11:46,660 --> 01:11:49,830
previously, that there's indeed something else going on there.
1425
01:11:49,830 --> 01:11:52,880
And if I were to make one other variant of this program--
1426
01:11:52,880 --> 01:11:55,630
let's get rid of just this one word and let's have two.
1427
01:11:55,630 --> 01:11:57,550
So let me give myself another string called t,
1428
01:11:57,550 --> 01:12:01,810
for instance, just this common convention with bye exclamation point.
1429
01:12:01,810 --> 01:12:04,900
Let me, then print out with %s.
1430
01:12:04,900 --> 01:12:10,785
And let me also print out with %s, whoops, printf, print out t, as well.
1431
01:12:10,785 --> 01:12:14,320
Let me recompile this program, and obviously the out--
1432
01:12:14,320 --> 01:12:17,470
ugh-- this is what happens when I go too fast.
1433
01:12:17,470 --> 01:12:20,740
All right, third mistake today, close quote.
1434
01:12:20,740 --> 01:12:22,030
As I was missing.
1435
01:12:22,030 --> 01:12:23,590
Make hi.
1436
01:12:23,590 --> 01:12:25,000
Fourth mistake today.
1437
01:12:25,000 --> 01:12:26,200
Make hi.
1438
01:12:26,200 --> 01:12:27,490
Dot slash hi.
1439
01:12:27,490 --> 01:12:28,210
OK, voila.
1440
01:12:28,210 --> 01:12:30,610
Now we have a program that's printing both hi and bye,
1441
01:12:30,610 --> 01:12:34,720
only so that we can consider what's going on in the computer's memory.
1442
01:12:34,720 --> 01:12:40,210
If s is storing hi and apparently one bonus byte that
1443
01:12:40,210 --> 01:12:43,240
demarcates the end of that string, bye is apparently
1444
01:12:43,240 --> 01:12:46,413
going to fit into the location directly after.
1445
01:12:46,413 --> 01:12:49,330
And it's wrapping around, but that's just an artist's rendition, here.
1446
01:12:49,330 --> 01:12:52,000
But bye, B-Y-E exclamation point is taking up
1447
01:12:52,000 --> 01:12:58,948
1, 2, 3, 4, plus a fifth byte, as well.
1448
01:12:58,948 --> 01:13:03,580
All right, any questions on this underlying representation of strings?
1449
01:13:03,580 --> 01:13:05,560
And we'll contextualize this, before long,
1450
01:13:05,560 --> 01:13:07,840
so that this isn't just like, OK, who really cares?
1451
01:13:07,840 --> 01:13:10,730
This is going to be the source of actually implementing things.
1452
01:13:10,730 --> 01:13:13,510
In fact for problem set 2, like cryptography, and encryption,
1453
01:13:13,510 --> 01:13:15,468
and scrambling actual human messages.
1454
01:13:15,468 --> 01:13:16,510
But some questions first.
1455
01:13:16,510 --> 01:13:20,650
AUDIENCE: So normally if you were to not use string,
1456
01:13:20,650 --> 01:13:23,480
you would just make a character array that would declare,
1457
01:13:23,480 --> 01:13:26,580
how many characters there are so you know how many characters are
1458
01:13:26,580 --> 01:13:27,330
going to be there.
1459
01:13:27,330 --> 01:13:29,480
DAVID MALAN: A good question, too, and let
1460
01:13:29,480 --> 01:13:32,115
me summarize as, if we were instead to use chars all the time,
1461
01:13:32,115 --> 01:13:35,240
we would indeed have to know in advance how many chars you want for a given
1462
01:13:35,240 --> 01:13:38,750
string that you're storing. How, then, does something like get string work,
1463
01:13:38,750 --> 01:13:41,000
because when we at CS50 wrote the get string function,
1464
01:13:41,000 --> 01:13:43,190
we obviously don't know how long the words are
1465
01:13:43,190 --> 01:13:45,020
going to be that you all are typing in.
1466
01:13:45,020 --> 01:13:48,560
It turns out, two weeks from now we'll see that get string
1467
01:13:48,560 --> 01:13:51,320
uses a technique known as dynamic memory allocation.
1468
01:13:51,320 --> 01:13:55,770
And it's going to grow or shrink the array automatically for you.
1469
01:13:55,770 --> 01:13:57,050
But more on that soon.
1470
01:13:57,050 --> 01:13:57,920
Other questions?
1471
01:13:57,920 --> 01:14:01,450
AUDIENCE: Why are we using a nul value?
1472
01:14:01,450 --> 01:14:02,725
Isn't that wasting a byte?
1473
01:14:02,725 --> 01:14:03,850
DAVID MALAN: Good question.
1474
01:14:03,850 --> 01:14:06,880
Why are we using a nul value, isn't it wasting a byte?
1475
01:14:06,880 --> 01:14:07,630
Yes.
1476
01:14:07,630 --> 01:14:13,210
But I claim there's really no other way to distinguish the end of one string
1477
01:14:13,210 --> 01:14:19,748
from the start of another, unless we make some sort of notation in memory.
1478
01:14:19,748 --> 01:14:22,540
All we have, at the end of the day, inside of a computer, are bits.
1479
01:14:22,540 --> 01:14:25,900
Therefore, all we can do is spend those bits in some creative way
1480
01:14:25,900 --> 01:14:27,520
to solve this problem.
1481
01:14:27,520 --> 01:14:30,710
So we're minimally going to spend 1 byte to solve this problem.
1482
01:14:30,710 --> 01:14:31,210
Yeah?
1483
01:14:31,210 --> 01:14:35,897
AUDIENCE: How does our memory device know to enter a line when you type
1484
01:14:35,897 --> 01:14:39,270
the \n if we don't have it stored as a char?
1485
01:14:39,270 --> 01:14:40,910
DAVID MALAN: If you don't--
1486
01:14:40,910 --> 01:14:44,690
how does the computer know to move to a next line when you have a \n?
1487
01:14:44,690 --> 01:14:47,990
So \n, even though it looks like two characters,
1488
01:14:47,990 --> 01:14:51,890
it's actually stored as just 1 byte in the computer's memory.
1489
01:14:51,890 --> 01:14:54,357
There's a mapping between it and an actual number.
1490
01:14:54,357 --> 01:14:57,440
And you can see that, for instance, on the ASCII chart from the other day.
1491
01:14:57,440 --> 01:15:01,224
AUDIENCE: So with that being stored would be the [INAUDIBLE]..
1492
01:15:01,224 --> 01:15:02,420
DAVID MALAN: It would be.
1493
01:15:02,420 --> 01:15:08,210
If I had put a \n in my code here, right after the exclamation point here
1494
01:15:08,210 --> 01:15:11,840
and here, that would actually shift everything in memory because we would
1495
01:15:11,840 --> 01:15:16,740
need to make room for a \n here and another one over here.
1496
01:15:16,740 --> 01:15:18,913
So it would take two more bytes, exactly.
1497
01:15:18,913 --> 01:15:19,580
Other questions?
1498
01:15:19,580 --> 01:15:26,050
AUDIENCE: So if hi exclamation point is written in binary and ASCII
1499
01:15:26,050 --> 01:15:32,630
too as 72, 73, 33, if we are to write those numbers in the string,
1500
01:15:32,630 --> 01:15:39,090
and convert them into binary how would the computer know what's 72
1501
01:15:39,090 --> 01:15:40,390
and what's 8?
1502
01:15:40,390 --> 01:15:42,390
DAVID MALAN: And what's the last thing you said?
1503
01:15:42,390 --> 01:15:43,806
AUDIENCE: 8, for example.
1504
01:15:43,806 --> 01:15:45,700
DAVID MALAN: It's context sensitive.
1505
01:15:45,700 --> 01:15:48,450
So if, at the end of the day, all we're storing is these numbers,
1506
01:15:48,450 --> 01:15:52,380
like 72, 73, 33, recall that it's up to the program
1507
01:15:52,380 --> 01:15:55,470
to decide, based on context, how to interpret them.
1508
01:15:55,470 --> 01:15:59,310
And I simplified this story in week 0 saying that Photoshop interprets them
1509
01:15:59,310 --> 01:16:02,910
as RGB colors, and iMessage or a text messaging program
1510
01:16:02,910 --> 01:16:07,440
interprets them as letters, and Excel interprets them as numbers.
1511
01:16:07,440 --> 01:16:12,540
How those programs do it is by way of variables like string, and int,
1512
01:16:12,540 --> 01:16:13,080
and float.
1513
01:16:13,080 --> 01:16:14,872
And in fact, later this semester, we'll see
1514
01:16:14,872 --> 01:16:19,500
a data type via which you can represent a color as a triple of numbers,
1515
01:16:19,500 --> 01:16:22,240
a red value, a green value, and a blue value.
1516
01:16:22,240 --> 01:16:24,600
So we'll see other data types as well.
1517
01:16:24,600 --> 01:16:25,100
Yeah?
1518
01:16:25,100 --> 01:16:29,320
AUDIENCE: It seems easy enough to just add a nul thing at the end of the word,
1519
01:16:29,320 --> 01:16:32,190
so why do we have integers and long integers?
1520
01:16:32,190 --> 01:16:35,192
Why can't we make everything variable in its data size?
1521
01:16:35,192 --> 01:16:36,900
DAVID MALAN: Really interesting question.
1522
01:16:36,900 --> 01:16:40,110
Why could we not just make all data types variable in size?
1523
01:16:40,110 --> 01:16:43,560
And some languages, some libraries do exactly this.
1524
01:16:43,560 --> 01:16:47,100
C is an older language, and back then memory was expensive,
1525
01:16:47,100 --> 01:16:48,300
memory was limited.
1526
01:16:48,300 --> 01:16:50,640
The reality was you gain benefits from just
1527
01:16:50,640 --> 01:16:53,010
standardizing the size of these things.
1528
01:16:53,010 --> 01:16:55,410
You also get performance increases in the sense
1529
01:16:55,410 --> 01:16:59,620
that if you know every int is 4 bytes, you can very quickly,
1530
01:16:59,620 --> 01:17:02,220
and we'll see this next week, jump from one integer to another,
1531
01:17:02,220 --> 01:17:06,600
to another in memory just by adding 4 inside of those square brackets.
1532
01:17:06,600 --> 01:17:08,430
You can very quickly poke around.
1533
01:17:08,430 --> 01:17:11,522
Whereas, if you had variable length numbers, you would have to,
1534
01:17:11,522 --> 01:17:13,980
kind of, follow, follow, follow, looking for the end of it.
1535
01:17:13,980 --> 01:17:16,780
Follow, follow-- you would have to look at more locations in memory.
1536
01:17:16,780 --> 01:17:18,322
So that's a topic we'll come back to.
1537
01:17:18,322 --> 01:17:20,700
But it was generally for efficiency.
1538
01:17:20,700 --> 01:17:22,170
Another question, yeah?
1539
01:17:22,170 --> 01:17:27,942
AUDIENCE: Why not store the nul character [INAUDIBLE]
1540
01:17:27,942 --> 01:17:31,520
DAVID MALAN: Good question. Why not store the--
1541
01:17:31,520 --> 01:17:35,540
why not store the nul character at the beginning?
1542
01:17:35,540 --> 01:17:41,890
You could-- let's see, why not store it at the beginning?
1543
01:17:41,890 --> 01:17:45,080
You could do that.
1544
01:17:45,080 --> 01:17:48,325
You could absolutely-- well, could you do this?
1545
01:17:51,580 --> 01:17:56,380
If you were to do that at the beginning--
1546
01:17:56,380 --> 01:17:57,400
short answer, no.
1547
01:17:57,400 --> 01:17:58,420
OK, now I retract that.
1548
01:17:58,420 --> 01:18:00,628
No, because I finally thought of a problem with this.
1549
01:18:00,628 --> 01:18:02,483
If you store it at the beginning instead,
1550
01:18:02,483 --> 01:18:04,900
we'll see in just a moment how you can actually write code
1551
01:18:04,900 --> 01:18:07,150
to figure out where the end of a string is,
1552
01:18:07,150 --> 01:18:09,550
and the problem there is you wouldn't necessarily
1553
01:18:09,550 --> 01:18:13,000
know if you eventually hit a 0 at the end of the string,
1554
01:18:13,000 --> 01:18:16,810
because it's the number 0 in the context of Excel using some memory,
1555
01:18:16,810 --> 01:18:20,180
or if it's the context of some other data type, altogether.
1556
01:18:20,180 --> 01:18:22,600
So the fact that we've standardized--
1557
01:18:22,600 --> 01:18:26,560
the fact that we've standardized strings as ending with nul
1558
01:18:26,560 --> 01:18:30,655
means that we can reliably distinguish one variable from another in memory.
1559
01:18:30,655 --> 01:18:32,560
And that's actually a perfect segue, now,
1560
01:18:32,560 --> 01:18:35,693
to actually using this primitive to build up
1561
01:18:35,693 --> 01:18:38,360
our own code that manipulates these things that are lower level.
1562
01:18:38,360 --> 01:18:39,560
So let me do this.
1563
01:18:39,560 --> 01:18:41,650
Let me create a new file called length.c.
1564
01:18:41,650 --> 01:18:46,000
And let's use this basic idea to figure out what the length of a string
1565
01:18:46,000 --> 01:18:50,720
is after it's been stored in a variable.
1566
01:18:50,720 --> 01:18:51,860
So let's do this.
1567
01:18:51,860 --> 01:18:56,530
Let me include both the CS50 header and the standard I/O header,
1568
01:18:56,530 --> 01:19:01,250
give myself int main(void) again here, and inside of main, do this.
1569
01:19:01,250 --> 01:19:04,060
Let me prompt the user for a string s and I'll ask them
1570
01:19:04,060 --> 01:19:08,170
for a string like their name, here.
1571
01:19:08,170 --> 01:19:13,420
And then let me name it, more verbosely, name this time.
1572
01:19:13,420 --> 01:19:15,170
Now let me go ahead and do this.
1573
01:19:15,170 --> 01:19:20,260
Let me iterate over every character in this string
1574
01:19:20,260 --> 01:19:22,180
in order to figure out what its length is.
1575
01:19:22,180 --> 01:19:25,060
So initially, I'm going to go ahead and say this,
1576
01:19:25,060 --> 01:19:28,040
int length equals 0, because I don't know what it is yet.
1577
01:19:28,040 --> 01:19:29,290
So we're going to start at 0.
1578
01:19:29,290 --> 01:19:32,410
And then while the following is true--
1579
01:19:32,410 --> 01:19:37,370
while-- let me-- do I want to do this?
1580
01:19:37,370 --> 01:19:40,060
Let me change this to i, just for clarity, let me do
1581
01:19:40,060 --> 01:19:45,790
this, while name bracket i does not equal that special nul character.
1582
01:19:45,790 --> 01:19:49,180
So I typed it on the slide as N-U-L, but you don't write N-U-L in code,
1583
01:19:49,180 --> 01:19:53,665
you actually use its numeric equivalent, which is \0 in single quotes.
1584
01:19:53,665 --> 01:19:58,930
While name bracket i does not equal the nul character, I'm going to go ahead
1585
01:19:58,930 --> 01:20:02,470
and increment i to i plus plus.
1586
01:20:02,470 --> 01:20:05,470
And then down here I'm going to print out the value of i
1587
01:20:05,470 --> 01:20:09,270
to see what we actually get, printing out the value of i.
1588
01:20:09,270 --> 01:20:11,020
All right, so what's going to happen here?
1589
01:20:11,020 --> 01:20:13,420
Let me run make length.
1590
01:20:13,420 --> 01:20:14,740
Fortunately no errors.
1591
01:20:14,740 --> 01:20:19,570
./length and let me type in something like H-I, exclamation point, Enter.
1592
01:20:19,570 --> 01:20:20,740
And I get 3.
1593
01:20:20,740 --> 01:20:23,950
Let me try bye, exclamation point, Enter.
1594
01:20:23,950 --> 01:20:25,870
And I get 4.
1595
01:20:25,870 --> 01:20:28,510
Let me try my own name, David, Enter.
1596
01:20:28,510 --> 01:20:29,970
5, and so forth.
1597
01:20:29,970 --> 01:20:31,880
So what's actually going on here?
1598
01:20:31,880 --> 01:20:34,490
Well, it seems that by way of this while loop,
1599
01:20:34,490 --> 01:20:36,622
we are specifying a local variable called
1600
01:20:36,622 --> 01:20:39,580
i initialized to 0, because we're figuring out the length of the string
1601
01:20:39,580 --> 01:20:40,580
as we go.
1602
01:20:40,580 --> 01:20:44,050
I'm then asking the question, does location 0,
1603
01:20:44,050 --> 01:20:49,300
that is i in the name string, which we now know is an array,
1604
01:20:49,300 --> 01:20:51,700
does it not equal \0?
1605
01:20:51,700 --> 01:20:55,645
Because if it doesn't, that means it's an actual character like H, or B, or D.
1606
01:20:55,645 --> 01:20:57,640
So let's increment i.
1607
01:20:57,640 --> 01:21:00,910
Then, let's come back around to line 9 and let's ask the question again.
1608
01:21:00,910 --> 01:21:02,590
Now i equals 1.
1609
01:21:02,590 --> 01:21:06,420
So does name bracket 1 not equal \0?
1610
01:21:06,420 --> 01:21:12,070
Well, if it doesn't, and it won't if it's an i, or a y, or an a,
1611
01:21:12,070 --> 01:21:15,490
based on what I typed in, we're going to increment i once more.
1612
01:21:15,490 --> 01:21:18,940
Fast-forward to the end of the story, once I get to the end of the string,
1613
01:21:18,940 --> 01:21:22,420
technically, one space past the end of the string,
1614
01:21:22,420 --> 01:21:25,510
name bracket i will equal \0.
1615
01:21:25,510 --> 01:21:29,960
So I don't increment i anymore, I end up just printing the result.
1616
01:21:29,960 --> 01:21:34,510
So what we seem to have here with some low level C code, just this while loop,
1617
01:21:34,510 --> 01:21:39,070
is a program that figures out the length of a given string that's been typed in.
1618
01:21:39,070 --> 01:21:41,860
Let's practice our abstraction and decompose this into,
1619
01:21:41,860 --> 01:21:43,270
maybe, a helper function here.
1620
01:21:43,270 --> 01:21:47,110
Let me grab all of this code here, and assume,
1621
01:21:47,110 --> 01:21:51,580
for the sake of discussion for a moment, that I can call a function now called
1622
01:21:51,580 --> 01:21:53,740
string length.
1623
01:21:53,740 --> 01:21:56,830
And it's the length of the string name that I want to get,
1624
01:21:56,830 --> 01:22:01,000
and then I'll go ahead and print out, just as before with %i,
1625
01:22:01,000 --> 01:22:02,398
the length of that string.
1626
01:22:02,398 --> 01:22:04,690
So now I'm abstracting away this notion of figuring out
1627
01:22:04,690 --> 01:22:05,732
the length of the string.
1628
01:22:05,732 --> 01:22:08,470
That's an opportunity for me to create my own function.
1629
01:22:08,470 --> 01:22:11,515
If I want to create a function called string length,
1630
01:22:11,515 --> 01:22:15,610
I'll claim that I want to take a string as input,
1631
01:22:15,610 --> 01:22:20,860
and what should I have this function return as its return type?
1632
01:22:20,860 --> 01:22:26,090
What should string length presumably return?
1633
01:22:26,090 --> 01:22:26,590
Yeah?
1634
01:22:26,590 --> 01:22:27,430
AUDIENCE: Int.
1635
01:22:27,430 --> 01:22:28,270
DAVID MALAN: An int, right?
1636
01:22:28,270 --> 01:22:29,020
An int makes sense.
1637
01:22:29,020 --> 01:22:30,937
Float really wouldn't make sense because we're
1638
01:22:30,937 --> 01:22:33,377
measuring things that are integers.
1639
01:22:33,377 --> 01:22:34,960
In this case, the length of something.
1640
01:22:34,960 --> 01:22:36,640
So indeed, let's have it return an int.
1641
01:22:36,640 --> 01:22:39,380
I can use the same code as before, so I'm
1642
01:22:39,380 --> 01:22:42,175
going to paste what I cut earlier in the file.
1643
01:22:42,175 --> 01:22:46,660
The only thing I have to change is the name of the variable.
1644
01:22:46,660 --> 01:22:50,240
Because now this function, I decided arbitrarily
1645
01:22:50,240 --> 01:22:53,130
that I'm going to call it s, just to be more generic.
1646
01:22:53,130 --> 01:22:55,915
So I'm going to look at s bracket i at each location.
1647
01:22:55,915 --> 01:22:58,790
And I don't want to print it at the end, this would be a side effect.
1648
01:22:58,790 --> 01:23:01,250
What's the line of code I should include here if I actually
1649
01:23:01,250 --> 01:23:04,005
want to hand back the total length?
1650
01:23:04,005 --> 01:23:04,505
Yeah?
1651
01:23:04,505 --> 01:23:05,362
AUDIENCE: Return i.
1652
01:23:05,362 --> 01:23:06,320
DAVID MALAN: Say again?
1653
01:23:06,320 --> 01:23:07,112
AUDIENCE: Return i.
1654
01:23:07,112 --> 01:23:09,270
DAVID MALAN: Return i, in this case.
1655
01:23:09,270 --> 01:23:11,540
So I'm going to return i, not print it.
1656
01:23:11,540 --> 01:23:16,490
Because now, my main function can use the return value stored in length
1657
01:23:16,490 --> 01:23:18,530
and print it on the next line itself.
1658
01:23:18,530 --> 01:23:22,520
I just need a prototype, so that's my one forgivable copy paste here.
1659
01:23:22,520 --> 01:23:24,170
I'm going to rerun make length.
1660
01:23:24,170 --> 01:23:25,640
Hopefully I didn't screw up.
1661
01:23:25,640 --> 01:23:29,330
I didn't. ./length, I'll type in hi-- oops--
1662
01:23:29,330 --> 01:23:31,340
I'll type in hi, again.
1663
01:23:31,340 --> 01:23:31,880
That works.
1664
01:23:31,880 --> 01:23:34,970
I'll type in bye again, and so forth.
1665
01:23:34,970 --> 01:23:38,703
So now we have a function that determines the length of a string.
1666
01:23:38,703 --> 01:23:41,120
Well, it turns out we didn't actually need this all along.
1667
01:23:41,120 --> 01:23:46,042
It turns out that we can get rid of my own custom string length function here.
1668
01:23:46,042 --> 01:23:48,500
I can definitely delete the whole implementation down here.
1669
01:23:48,500 --> 01:23:52,160
Because it turns out, in a file called string.h,
1670
01:23:52,160 --> 01:23:55,520
which is a new header file today, we actually have access to a function
1671
01:23:55,520 --> 01:23:59,690
called, more succinctly, strlen, S-T-R-L-E-N. Which,
1672
01:23:59,690 --> 01:24:01,130
literally does that.
1673
01:24:01,130 --> 01:24:05,240
This is a function that comes with C, albeit in the string.h header file,
1674
01:24:05,240 --> 01:24:09,450
and it does what we just implemented manually.
1675
01:24:09,450 --> 01:24:13,340
So here's an example of, admittedly, a wheel we just reinvented, but no more.
1676
01:24:13,340 --> 01:24:14,480
We don't have to do that.
1677
01:24:14,480 --> 01:24:16,850
And how do you know what kinds of functions exist?
1678
01:24:16,850 --> 01:24:21,260
Well, let me pop out to my browser here, to a website that
1679
01:24:21,260 --> 01:24:24,455
is CS50's incarnation of what are called manual pages.
1680
01:24:24,455 --> 01:24:28,070
It turns out that in a lot of systems, Macs, and Unix,
1681
01:24:28,070 --> 01:24:31,100
and Linux systems, including the Visual Studio Code
1682
01:24:31,100 --> 01:24:33,020
instance that we have in the cloud, there
1683
01:24:33,020 --> 01:24:36,290
are publicly accessible manual pages for functions.
1684
01:24:36,290 --> 01:24:39,770
They tend to be written very expertly, in a way that's
1685
01:24:39,770 --> 01:24:41,160
not very beginner-friendly.
1686
01:24:41,160 --> 01:24:45,650
So what we have here at manual.cs50.io is CS50's version
1687
01:24:45,650 --> 01:24:48,740
of manual pages that have this less-comfortable mode that
1688
01:24:48,740 --> 01:24:51,290
gives you a, sort of, cheat sheet of very frequently used,
1689
01:24:51,290 --> 01:24:55,010
helpful functions in C. And we've translated the expert
1690
01:24:55,010 --> 01:24:58,075
notation to things that a beginner can understand.
1691
01:24:58,075 --> 01:25:02,190
So, for instance, let me go ahead and search for a string up at the top here.
1692
01:25:02,190 --> 01:25:06,200
You'll see that there's documentation for our own get string function,
1693
01:25:06,200 --> 01:25:08,510
but more interestingly down here, there's
1694
01:25:08,510 --> 01:25:10,850
a whole bunch of string-related functions
1695
01:25:10,850 --> 01:25:12,620
that we haven't even seen most of, yet.
1696
01:25:12,620 --> 01:25:14,660
But there's indeed one here called strlen,
1697
01:25:14,660 --> 01:25:16,620
calculate the length of a string.
1698
01:25:16,620 --> 01:25:22,160
And so if I go to strlen here, I'll see some less-comfortable documentation
1699
01:25:22,160 --> 01:25:22,970
for this function.
1700
01:25:22,970 --> 01:25:25,400
And the way a manual page typically works,
1701
01:25:25,400 --> 01:25:28,310
whether in CS50's format or any other system,
1702
01:25:28,310 --> 01:25:30,950
is you see, typically, a synopsis of what header
1703
01:25:30,950 --> 01:25:33,330
files you need to use the function.
1704
01:25:33,330 --> 01:25:35,960
So you would copy paste these couple of lines here.
1705
01:25:35,960 --> 01:25:39,530
You see what the prototype is of the function so
1706
01:25:39,530 --> 01:25:42,533
that you know what its inputs are, if any, and its outputs are, if any.
1707
01:25:42,533 --> 01:25:45,200
Then down below you might see a description, which in this case,
1708
01:25:45,200 --> 01:25:46,320
is pretty straightforward.
1709
01:25:46,320 --> 01:25:48,170
This function calculates the length of s.
1710
01:25:48,170 --> 01:25:51,110
Then you see what the return value is, if any,
1711
01:25:51,110 --> 01:25:54,310
and you might even see an example, like this one that we've whipped up here.
1712
01:25:54,310 --> 01:25:57,012
So these manual pages which are again, accessible
1713
01:25:57,012 --> 01:25:59,720
here, and we'll link to these in the problem sets moving forward,
1714
01:25:59,720 --> 01:26:02,510
are pretty much the place to start when you want to figure out
1715
01:26:02,510 --> 01:26:05,210
has a wheel been invented already?
1716
01:26:05,210 --> 01:26:08,490
Is there a function that might help me solve some problem set problem
1717
01:26:08,490 --> 01:26:11,900
so that I don't have to really get into the weeds of doing all
1718
01:26:11,900 --> 01:26:13,712
of those lower-level steps as I've had.
1719
01:26:13,712 --> 01:26:16,670
Sometimes the answer is going to be yes, sometimes it's going to be no.
1720
01:26:16,670 --> 01:26:19,160
But again the point of our having just done this together
1721
01:26:19,160 --> 01:26:21,950
is to reveal that even the functions you start taking for
1722
01:26:21,950 --> 01:26:26,135
granted, they all reduce to some of these basic building blocks.
1723
01:26:26,135 --> 01:26:29,600
At the end of the day, this is all that's inside of your computer
1724
01:26:29,600 --> 01:26:30,950
is 0s and 1s.
1725
01:26:30,950 --> 01:26:33,060
We're just learning, now, how to harness those
1726
01:26:33,060 --> 01:26:37,220
and how to manipulate them ourselves.
1727
01:26:37,220 --> 01:26:41,510
Any questions here on this?
1728
01:26:41,510 --> 01:26:43,305
Any questions at all?
1729
01:26:43,305 --> 01:26:43,805
Yeah.
1730
01:26:43,805 --> 01:26:51,779
AUDIENCE: We did just see [INAUDIBLE] Is that so common
1731
01:26:51,779 --> 01:26:54,035
that we would have to specify it, or is it not?
1732
01:26:54,035 --> 01:26:55,160
DAVID MALAN: Good question.
1733
01:26:55,160 --> 01:26:57,920
Is it so common that you would have to specify it or not?
1734
01:26:57,920 --> 01:27:00,170
You do need to include its header files because that's
1735
01:27:00,170 --> 01:27:01,670
where all of those prototypes are.
1736
01:27:01,670 --> 01:27:05,190
You don't need to worry about linking it in with -l anything.
1737
01:27:05,190 --> 01:27:07,340
And in fact, moving forward, you do not ever
1738
01:27:07,340 --> 01:27:10,910
need to worry about linking in libraries when compiling your code.
1739
01:27:10,910 --> 01:27:14,940
We, the staff, have configured make to do all of that for you automatically.
1740
01:27:14,940 --> 01:27:17,030
We want you to understand that it is doing it,
1741
01:27:17,030 --> 01:27:19,340
but we'll take care of all of the -l's for you.
1742
01:27:19,340 --> 01:27:23,360
But the onus is on you for the prototypes and the header files.
1743
01:27:23,360 --> 01:27:27,150
Other questions on these representations or techniques?
1744
01:27:27,150 --> 01:27:27,650
Yeah?
1745
01:27:27,650 --> 01:27:35,920
AUDIENCE: [INAUDIBLE] exclamation mark.
1746
01:27:35,920 --> 01:27:40,524
How does it actually define the spaces [INAUDIBLE]??
1747
01:27:40,524 --> 01:27:41,920
DAVID MALAN: A good question.
1748
01:27:41,920 --> 01:27:45,700
If you were to have a string with actual spaces in it that is multiple words,
1749
01:27:45,700 --> 01:27:47,530
what would the computer actually do?
1750
01:27:47,530 --> 01:27:49,960
Well, for this, let me go to asciichart.com.
1751
01:27:49,960 --> 01:27:54,880
Which is just a random website that's my go-to for the first 127 characters
1752
01:27:54,880 --> 01:27:55,930
of ASCII.
1753
01:27:55,930 --> 01:27:58,520
This is, in fact, what we had a screenshot of the other day.
1754
01:27:58,520 --> 01:28:02,088
And if you look here, it's a little non-obvious, but S-P is space.
1755
01:28:02,088 --> 01:28:05,380
If a computer were to store a space, it would actually store the decimal number
1756
01:28:05,380 --> 01:28:10,430
32, or technically, the pattern of 0s and 1s that represent the number 32.
1757
01:28:10,430 --> 01:28:13,240
All of the US English keys that you might type on a keyboard
1758
01:28:13,240 --> 01:28:16,390
can be represented with a number, and using Unicode you
1759
01:28:16,390 --> 01:28:18,920
can express even things like emojis and other languages.
1760
01:28:18,920 --> 01:28:19,420
Yeah?
1761
01:28:19,420 --> 01:28:23,130
AUDIENCE: Are only strings followed by nul number,
1762
01:28:23,130 --> 01:28:26,516
or let's say we had a series of numbers, would each one of them
1763
01:28:26,516 --> 01:28:27,845
be accompanied by nuls?
1764
01:28:27,845 --> 01:28:28,970
DAVID MALAN: Good question.
1765
01:28:28,970 --> 01:28:31,790
Only strings are accompanied by nuls at the end
1766
01:28:31,790 --> 01:28:34,760
because every other data type we've talked about thus far
1767
01:28:34,760 --> 01:28:37,130
is of well defined finite length.
1768
01:28:37,130 --> 01:28:40,190
1 byte for char, 4 bytes for ints and so forth.
1769
01:28:40,190 --> 01:28:44,240
If we think back to last week, we did end the week with a couple of problems.
1770
01:28:44,240 --> 01:28:48,080
Integer overflow, because 4 bytes, heck, even 8 bytes is sometimes not enough.
1771
01:28:48,080 --> 01:28:50,270
We also talked about floating point imprecision.
1772
01:28:50,270 --> 01:28:53,480
Thankfully in the world of scientific computing and financial computing,
1773
01:28:53,480 --> 01:28:56,930
there are libraries you can use that draw inspiration
1774
01:28:56,930 --> 01:28:58,820
from this idea of a string, and they might
1775
01:28:58,820 --> 01:29:02,640
use 9 bytes for an integer value or maybe 20 bytes
1776
01:29:02,640 --> 01:29:04,170
so that you can count really high.
1777
01:29:04,170 --> 01:29:06,680
But they will then start to manage that memory for you
1778
01:29:06,680 --> 01:29:09,960
and what they're really probably doing is just grabbing a whole bunch of bytes
1779
01:29:09,960 --> 01:29:13,070
and somehow remembering how long the sequence of bytes is.
1780
01:29:13,070 --> 01:29:16,190
That's how these higher-level libraries work, too.
1781
01:29:16,190 --> 01:29:17,700
All right, this has been a lot.
1782
01:29:17,700 --> 01:29:19,080
Let's take one more break here.
1783
01:29:19,080 --> 01:29:20,670
We'll do a seven-minute break here.
1784
01:29:20,670 --> 01:29:23,465
And when we come back, we'll flesh out a few more details.
1785
01:29:23,465 --> 01:29:26,390
All right.
1786
01:29:26,390 --> 01:29:31,400
So we just saw strlen as an example of a function that
1787
01:29:31,400 --> 01:29:32,898
comes in the string library.
1788
01:29:32,898 --> 01:29:35,690
Let's start to take more of these library functions out for a spin.
1789
01:29:35,690 --> 01:29:39,530
So we're not relying only on the built-ins that we saw last week.
1790
01:29:39,530 --> 01:29:41,660
Let me switch over to VS Code.
1791
01:29:41,660 --> 01:29:46,040
And create a file called, say, string.c,
1792
01:29:46,040 --> 01:29:48,115
to apply this lesson learned, as follows.
1793
01:29:48,115 --> 01:29:54,770
Let me include cs50.h, stdio.h, and this new thing,
1794
01:29:54,770 --> 01:29:57,260
string.h as well, at the top.
1795
01:29:57,260 --> 01:29:59,698
I'm going to do the usual int main(void) here.
1796
01:29:59,698 --> 01:30:02,240
And then in this program suppose, for the sake of discussion,
1797
01:30:02,240 --> 01:30:05,540
that I didn't know about %s for printf or, heck,
1798
01:30:05,540 --> 01:30:09,300
maybe early on there was no %s format code.
1799
01:30:09,300 --> 01:30:12,420
And so there was no easy way to print strings.
1800
01:30:12,420 --> 01:30:15,830
Well, at least if we know that strings are just arrays of characters,
1801
01:30:15,830 --> 01:30:19,820
we could use %c as a workaround, a solution to that,
1802
01:30:19,820 --> 01:30:21,420
sort of, contrived problem.
1803
01:30:21,420 --> 01:30:24,920
So let me ask myself for a string s by using get string here
1804
01:30:24,920 --> 01:30:27,500
and I'll ask the user for some input.
1805
01:30:27,500 --> 01:30:33,260
And then, let me print out, say, output, and all I want to do is print back out
1806
01:30:33,260 --> 01:30:34,460
what the user typed.
1807
01:30:34,460 --> 01:30:38,000
Now, the simplest way to do this, of course, is going to be like last week,
1808
01:30:38,000 --> 01:30:40,960
printf %s, and plug in the s, and we're done.
1809
01:30:40,960 --> 01:30:43,730
But again, for the sake of discussion, I forgot about,
1810
01:30:43,730 --> 01:30:47,820
or someone didn't implement %s, so how else could we do this?
1811
01:30:47,820 --> 01:30:51,800
Well, in pseudo code, or in English what's the gist of how we could solve
1812
01:30:51,800 --> 01:30:58,910
this problem, printing out the string s on the screen without using %s?
1813
01:30:58,910 --> 01:31:02,420
How might we go about solving this?
1814
01:31:02,420 --> 01:31:04,147
Just in English, high-level?
1815
01:31:04,147 --> 01:31:05,730
What would your pseudo code look like?
1816
01:31:05,730 --> 01:31:06,230
Yeah?
1817
01:31:06,230 --> 01:31:09,568
AUDIENCE: You could just print each letter.
1818
01:31:09,568 --> 01:31:11,360
DAVID MALAN: OK, so just print each letter.
1819
01:31:11,360 --> 01:31:13,490
And maybe, more precisely, some kind of loop.
1820
01:31:13,490 --> 01:31:17,030
Like, let's iterate over all of the characters in s
1821
01:31:17,030 --> 01:31:18,150
and print one at a time.
1822
01:31:18,150 --> 01:31:19,290
So how can I do that?
1823
01:31:19,290 --> 01:31:24,050
Well, for int i gets 0, kind of the go-to starting point for most loops,
1824
01:31:24,050 --> 01:31:25,580
i is less than--
1825
01:31:25,580 --> 01:31:27,365
OK, how long do I want to iterate?
1826
01:31:27,365 --> 01:31:29,240
Well, it's going to depend on what I type in,
1827
01:31:29,240 --> 01:31:31,300
but that's why we have strlen now.
1828
01:31:31,300 --> 01:31:36,080
So iterate up to the length of s, and then increment i with plus
1829
01:31:36,080 --> 01:31:37,075
plus on each iteration.
1830
01:31:37,075 --> 01:31:40,670
And then let's just print out %c with no new line,
1831
01:31:40,670 --> 01:31:43,010
because I want everything on the same line,
1832
01:31:43,010 --> 01:31:47,780
whatever the character is at s bracket i.
1833
01:31:47,780 --> 01:31:49,790
And then at the very end, I'll give myself
1834
01:31:49,790 --> 01:31:52,350
that new line, just to move the cursor down to the next line
1835
01:31:52,350 --> 01:31:54,350
so the dollar sign is not in a weird place.
1836
01:31:54,350 --> 01:31:57,230
All right, so let's see if I didn't screw up any of the code,
1837
01:31:57,230 --> 01:32:02,690
make string, Enter, so far so good, ./string, and let me type in something
1838
01:32:02,690 --> 01:32:04,520
like, hi, Enter.
1839
01:32:04,520 --> 01:32:06,020
And I see output of hi, too.
1840
01:32:06,020 --> 01:32:09,680
Let me do it once more with bye, Enter, and that works, too.
1841
01:32:09,680 --> 01:32:12,410
Notice I very deliberately and quickly gave myself
1842
01:32:12,410 --> 01:32:15,260
two spaces here and one space here just because I, literally,
1843
01:32:15,260 --> 01:32:18,620
wanted these things to line up properly, and input is shorter than output.
1844
01:32:18,620 --> 01:32:21,830
But that was just a deliberate formatting detail.
1845
01:32:21,830 --> 01:32:23,520
So this code is correct.
1846
01:32:23,520 --> 01:32:29,240
Which is a claim I've made before, but it's not well-designed.
1847
01:32:29,240 --> 01:32:33,170
It is well-designed in that I'm using someone else's library function,
1848
01:32:33,170 --> 01:32:35,660
like, I've not reinvented a wheel, there's no line 15
1849
01:32:35,660 --> 01:32:38,270
or below, I didn't implement string length myself.
1850
01:32:38,270 --> 01:32:43,640
So I'm at least practicing what I've preached.
1851
01:32:43,640 --> 01:32:48,360
But there's still an imperfection, a suboptimality.
1852
01:32:48,360 --> 01:32:50,910
This one's really subtle though.
1853
01:32:50,910 --> 01:32:54,330
And you have to think about how loops work.
1854
01:32:54,330 --> 01:32:58,640
What am I doing that's not super efficient?
1855
01:32:58,640 --> 01:32:59,870
Yeah, in back?
1856
01:32:59,870 --> 01:33:03,178
AUDIENCE: [INAUDIBLE] over and over again.
1857
01:33:03,178 --> 01:33:04,970
DAVID MALAN: Yeah, this is a little subtle.
1858
01:33:04,970 --> 01:33:07,460
But if you think back to the basic definition of a for loop
1859
01:33:07,460 --> 01:33:10,070
and recall when I highlighted things last week, what happens?
1860
01:33:10,070 --> 01:33:12,830
Well, the first thing is that i gets set to 0.
1861
01:33:12,830 --> 01:33:14,310
Then we check the condition.
1862
01:33:14,310 --> 01:33:15,560
How do we check the condition?
1863
01:33:15,560 --> 01:33:18,380
We call strlen on s, we get back an answer
1864
01:33:18,380 --> 01:33:24,810
like 3 if it's H-I exclamation point, and 0 is less than 3, so that's fine,
1865
01:33:24,810 --> 01:33:26,570
and then we print out the character.
1866
01:33:26,570 --> 01:33:29,060
Then we increment i from 0 to 1.
1867
01:33:29,060 --> 01:33:30,468
We recheck the condition.
1868
01:33:30,468 --> 01:33:31,760
How do I recheck the condition?
1869
01:33:31,760 --> 01:33:34,100
I call strlen of s.
1870
01:33:34,100 --> 01:33:36,890
Get back the same answer, 3.
1871
01:33:36,890 --> 01:33:38,720
Compare 3 against 1.
1872
01:33:38,720 --> 01:33:39,800
We're still good.
1873
01:33:39,800 --> 01:33:44,690
So we print out another character. i gets incremented again, i is now 2.
1874
01:33:44,690 --> 01:33:46,035
We check the condition.
1875
01:33:46,035 --> 01:33:46,910
What's the condition?
1876
01:33:46,910 --> 01:33:47,960
Well, what's the string like the best?
1877
01:33:47,960 --> 01:33:48,980
It's still 3.
1878
01:33:48,980 --> 01:33:51,860
2 is still less than 3.
1879
01:33:51,860 --> 01:33:55,430
So I keep asking the same question sort of stupidly
1880
01:33:55,430 --> 01:33:58,220
because the string is, presumably, never changing in length.
1881
01:33:58,220 --> 01:34:00,158
And indeed, every time I check that condition,
1882
01:34:00,158 --> 01:34:01,700
that function is going to get called.
1883
01:34:01,700 --> 01:34:04,380
And every time, the answer for hi is going to be 3.
1884
01:34:04,380 --> 01:34:04,880
3.
1885
01:34:04,880 --> 01:34:06,095
3.
1886
01:34:06,095 --> 01:34:10,850
So it's a marginal suboptimality, but I could do better, right?
1887
01:34:10,850 --> 01:34:15,560
Don't ask multiple times questions that you can remember the answer to.
1888
01:34:15,560 --> 01:34:20,960
So how could I remember the answer to this question and ask it just once?
1889
01:34:20,960 --> 01:34:24,750
How could I remember the answer to this question?
1890
01:34:24,750 --> 01:34:25,250
Let me see.
1891
01:34:25,250 --> 01:34:26,030
Yeah, back there?
1892
01:34:26,030 --> 01:34:27,446
AUDIENCE: Store it in a variable.
1893
01:34:27,446 --> 01:34:29,180
DAVID MALAN: So store it in a variable, right?
1894
01:34:29,180 --> 01:34:32,097
That's been our answer most any time we want to keep something around.
1895
01:34:32,097 --> 01:34:33,120
So how could I do this?
1896
01:34:33,120 --> 01:34:37,880
Well, I could do something like this, int, maybe, length equals strlen of s.
1897
01:34:37,880 --> 01:34:41,200
Then I can just change this function call.
1898
01:34:41,200 --> 01:34:43,160
Let me fix my spelling here.
1899
01:34:43,160 --> 01:34:47,360
Let me fix this to be comparing against length, and this is now OK.
1900
01:34:47,360 --> 01:34:50,240
Because now strlen is only called once on line 9.
1901
01:34:50,240 --> 01:34:52,740
And I'm reusing the value of that variable, a.k.a.
1902
01:34:52,740 --> 01:34:54,240
length, again, and again, and again.
1903
01:34:54,240 --> 01:34:55,282
So that's more efficient.
1904
01:34:55,282 --> 01:34:59,760
Turns out that for loops let you declare multiple variables at once,
1905
01:34:59,760 --> 01:35:04,020
so we can do this a little more elegantly all in one line.
1906
01:35:04,020 --> 01:35:06,770
And this is just some syntactic improvement.
1907
01:35:06,770 --> 01:35:11,930
I could actually do something like this, n equals strlen of s,
1908
01:35:11,930 --> 01:35:14,750
and then I could just say n here or I could call it length.
1909
01:35:14,750 --> 01:35:17,667
But heck, while I'm being succinct I'm just going to use n for number.
1910
01:35:17,667 --> 01:35:22,100
So now it's just a marginal change but I've now
1911
01:35:22,100 --> 01:35:26,030
declared two variables inside of my loop, i and n.
1912
01:35:26,030 --> 01:35:29,300
i is set to 0. n is set to the string length of s.
1913
01:35:29,300 --> 01:35:33,380
But now, hereafter, all of my condition checks are just, i less than n,
1914
01:35:33,380 --> 01:35:36,170
i less than n, and n is never changing.
1915
01:35:36,170 --> 01:35:38,008
All right, so a marginal improvement there.
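[Editor's sketch, not the lecture's exact file: the loop form described above, with strlen called once in the initializer and n reused on every check. The helper name count_char and its parameters are hypothetical, and plain const char * stands in for CS50's string type.]

```c
#include <string.h>

// Counts how many times target appears in s.
// strlen runs once, in the loop's initializer; every later condition
// check compares i against the stored value n instead of calling it again.
int count_char(const char *s, char target)
{
    int count = 0;
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        if (s[i] == target)
        {
            count++;
        }
    }
    return count;
}
```

[Both i and n are declared as int in the same initializer, which is why they can share one declaration.]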
1916
01:35:38,008 --> 01:35:39,800
Now that I've used this new function, let's
1917
01:35:39,800 --> 01:35:41,925
use some other functions that might be of interest.
1918
01:35:41,925 --> 01:35:48,680
Let me write a quick program here that capitalizes the beginning of--
1919
01:35:48,680 --> 01:35:51,810
changes to uppercase some string that the user types in.
1920
01:35:51,810 --> 01:35:55,490
So let me code a file called uppercase.c.
1921
01:35:55,490 --> 01:36:01,520
Up here I'll use my new friends, cs50.h, and standard I/O, and string.h.
1922
01:36:01,520 --> 01:36:07,070
So standard I/O, and string.h. So just as before, int main(void).
1923
01:36:07,070 --> 01:36:09,620
And then inside of main, what I'm going to do this time,
1924
01:36:09,620 --> 01:36:14,390
is let's ask the user for a string s using get string asking them
1925
01:36:14,390 --> 01:36:15,680
for the before value.
1926
01:36:15,680 --> 01:36:20,130
And then let me print out something like after.
1927
01:36:20,130 --> 01:36:24,410
So that it-- just so I can see what the uppercase version thereof is.
1928
01:36:24,410 --> 01:36:28,610
And then after this, let me do the following, for int, i
1929
01:36:28,610 --> 01:36:32,030
equals 0, oh, let's practice that same lesson,
1930
01:36:32,030 --> 01:36:37,790
so n equals the string length of s, i is less than n, i plus plus.
1931
01:36:37,790 --> 01:36:41,600
So really, nothing new, fundamentally yet.
1932
01:36:41,600 --> 01:36:47,270
How do I now convert characters from lowercase, if they are, to uppercase?
1933
01:36:47,270 --> 01:36:50,000
In other words, if I type in hi, H-I in lowercase,
1934
01:36:50,000 --> 01:36:55,490
I want my program, now, to uppercase everything to capital H, capital I.
1935
01:36:55,490 --> 01:36:58,770
Well how can I go about doing this?
1936
01:36:58,770 --> 01:37:01,010
Well you might recall that there is this--
1937
01:37:01,010 --> 01:37:03,900
you might recall that there is this ASCII chart.
1938
01:37:03,900 --> 01:37:06,855
So let's just consult this real quick on asciichart.com.
1939
01:37:06,855 --> 01:37:11,510
We looked at this last week. Notice that capital A is 65,
1940
01:37:11,510 --> 01:37:15,440
capital B is 66, capital C is 67, and heck, here's
1941
01:37:15,440 --> 01:37:19,640
lowercase a, lowercase b, lowercase c, and that's 97, 98, 99.
1942
01:37:19,640 --> 01:37:22,980
And if I actually do some math, there's a distance of 32.
1943
01:37:22,980 --> 01:37:23,480
Right?
1944
01:37:23,480 --> 01:37:25,640
So if I want to go from uppercase to lowercase,
1945
01:37:25,640 --> 01:37:30,788
I can do 65 plus 32 will give me 97 and that actually works out
1946
01:37:30,788 --> 01:37:32,330
across the board for everything else.
1947
01:37:32,330 --> 01:37:36,020
66 plus 32 gets me to 98 or lowercase b.
1948
01:37:36,020 --> 01:37:40,640
Or conversely, if you have a lowercase a, and its value is 97,
1949
01:37:40,640 --> 01:37:46,850
subtract 32 and boom, you have capital A. So there's some arithmetic involved.
1950
01:37:46,850 --> 01:37:49,460
But now that we know that strings are just arrays,
1951
01:37:49,460 --> 01:37:53,330
and we know that characters, which are in those arrays,
1952
01:37:53,330 --> 01:37:56,450
are just binary representations of numbers,
1953
01:37:56,450 --> 01:37:59,297
I think we can manipulate a few of these things as follows.
1954
01:37:59,297 --> 01:38:01,130
Let me go back to my program here, and first
1955
01:38:01,130 --> 01:38:05,360
ask the question, if the current character in the array during this loop
1956
01:38:05,360 --> 01:38:08,930
is lowercase, let's force it to uppercase.
1957
01:38:08,930 --> 01:38:10,250
So how am I going to do that?
1958
01:38:10,250 --> 01:38:16,460
If the character at s bracket i, the current location in the array,
1959
01:38:16,460 --> 01:38:21,320
is greater than or equal to lowercase a, and s bracket
1960
01:38:21,320 --> 01:38:26,660
i is less than or equal to lowercase z, kind of a weird Boolean
1961
01:38:26,660 --> 01:38:31,460
expression but it's completely legitimate, because in this array
1962
01:38:31,460 --> 01:38:34,230
s is a whole bunch of characters that the humans typed in,
1963
01:38:34,230 --> 01:38:37,520
because that's what a string is, greater than or equal to a might
1964
01:38:37,520 --> 01:38:39,680
be a little nonsensical because when have you ever
1965
01:38:39,680 --> 01:38:41,330
compared numbers to letters?
1966
01:38:41,330 --> 01:38:47,568
But we know from week 0 lowercase a is 97, lowercase z is, what is it, 1?
1967
01:38:47,568 --> 01:38:48,485
I don't even remember.
1968
01:38:48,485 --> 01:38:49,065
AUDIENCE: 132.
1969
01:38:49,065 --> 01:38:49,850
DAVID MALAN: What's that?
1970
01:38:49,850 --> 01:38:50,590
AUDIENCE: 132?
1971
01:38:50,590 --> 01:38:52,590
DAVID MALAN: 132, We know.
1972
01:38:52,590 --> 01:38:56,390
And so that would allow us to answer the question is the current letter
1973
01:38:56,390 --> 01:38:57,410
lowercase?
1974
01:38:57,410 --> 01:39:00,530
All right, so let me answer that question.
1975
01:39:00,530 --> 01:39:03,140
If it is, what do I want to print out?
1976
01:39:03,140 --> 01:39:05,870
I don't want to print out the letter itself,
1977
01:39:05,870 --> 01:39:09,290
I want to print out the letter minus 32, right?
1978
01:39:09,290 --> 01:39:13,160
Because if it happens to be a lowercase a, 97, 97 minus 32
1979
01:39:13,160 --> 01:39:15,530
gives me 65, which is uppercase A, and I know that
1980
01:39:15,530 --> 01:39:18,860
just from having stared at that chart in the past.
1981
01:39:18,860 --> 01:39:24,172
Else, if the character is not between little a and little z,
1982
01:39:24,172 --> 01:39:25,880
I'm just going to print out the character
1983
01:39:25,880 --> 01:39:28,550
itself by printing s bracket i.
1984
01:39:28,550 --> 01:39:31,580
And at the very end of this, I'm going to print out a new line just
1985
01:39:31,580 --> 01:39:33,480
to move the cursor to the next line.
1986
01:39:33,480 --> 01:39:34,930
So again, it's a little wordy.
1987
01:39:34,930 --> 01:39:39,020
But this loop here, which I borrowed from our code previously,
1988
01:39:39,020 --> 01:39:41,510
just iterates over the string, a.k.a.
1989
01:39:41,510 --> 01:39:44,630
array, character-by-character, through its length.
1990
01:39:44,630 --> 01:39:47,360
This line 11 here is just asking the question
1991
01:39:47,360 --> 01:39:50,870
if that current character, the i-th character of s,
1992
01:39:50,870 --> 01:39:53,900
is greater than or equal to little a and less
1993
01:39:53,900 --> 01:39:59,240
than or equal to little z, that is between 97 and 132, then
1994
01:39:59,240 --> 01:40:04,940
we're going to go ahead and force it to uppercase instead.
1995
01:40:04,940 --> 01:40:09,290
All right, and let me zoom out here for just a second.
1996
01:40:09,290 --> 01:40:14,270
And sorry, I misspoke: 122, which is what you might have said.
1997
01:40:14,270 --> 01:40:15,630
There's only 26 letters.
1998
01:40:15,630 --> 01:40:17,270
So 122 is little z.
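[Editor's sketch of the range check and subtraction just described, assuming an ASCII-compatible character set. The helper name force_upper is hypothetical; the lecture prints each character inside main rather than returning it.]

```c
// Returns the uppercase form of c if c is a lowercase ASCII letter,
// and returns c unchanged otherwise.
// In ASCII, 'a' is 97 and 'z' is 122, and each lowercase letter sits
// exactly 32 above its uppercase counterpart ('a' is 97, 'A' is 65).
char force_upper(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        return c - 32;
    }
    return c;
}
```

[Writing 'a' and 'z' instead of 97 and 122 lets the compiler supply the numbers, so nobody has to remember the chart.]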
1999
01:40:17,270 --> 01:40:20,280
Let me go ahead now and compile and run this program.
2000
01:40:20,280 --> 01:40:26,210
So make uppercase, ./uppercase, and let me type in hi in lowercase, Enter.
2001
01:40:26,210 --> 01:40:28,520
And there's the capitalized version, thereof.
2002
01:40:28,520 --> 01:40:30,920
Let me do it again, with my own name in lowercase,
2003
01:40:30,920 --> 01:40:33,100
and now it's capitalized as well.
2004
01:40:33,100 --> 01:40:34,860
Well, what could we do to improve this?
2005
01:40:34,860 --> 01:40:35,360
Well.
2006
01:40:35,360 --> 01:40:35,960
You know what?
2007
01:40:35,960 --> 01:40:37,640
Let's stop reinventing wheels.
2008
01:40:37,640 --> 01:40:39,840
Let's go to the manual pages.
2009
01:40:39,840 --> 01:40:43,490
So let me go here and search for something like, I don't know,
2010
01:40:43,490 --> 01:40:44,540
lowercase.
2011
01:40:44,540 --> 01:40:45,620
And there I go.
2012
01:40:45,620 --> 01:40:48,470
I did some auto complete here, our little search box
2013
01:40:48,470 --> 01:40:50,720
is saying that, OK there's an is-lower function,
2014
01:40:50,720 --> 01:40:52,550
check whether a character is lowercase.
2015
01:40:52,550 --> 01:40:53,640
Well how do I use this?
2016
01:40:53,640 --> 01:40:59,150
Well let me check, is lower, now I see the actual man page for this function.
2017
01:40:59,150 --> 01:41:01,850
Now we see, include ctype.h.
2018
01:41:01,850 --> 01:41:02,902
So that's the protot--
2019
01:41:02,902 --> 01:41:04,610
that's the header file I need to include.
2020
01:41:04,610 --> 01:41:08,570
This is the prototype for is-lower, it apparently takes a char as input
2021
01:41:08,570 --> 01:41:10,330
and returns an int.
2022
01:41:10,330 --> 01:41:11,330
Which is a little weird.
2023
01:41:11,330 --> 01:41:14,400
I feel like is-lower should return true or false.
2024
01:41:14,400 --> 01:41:18,680
So let's scroll down to the description and return value.
2025
01:41:18,680 --> 01:41:20,810
It returns, oh this is interesting.
2026
01:41:20,810 --> 01:41:25,370
And this is a convention in C. This function returns a non-zero int
2027
01:41:25,370 --> 01:41:30,820
if c is a lowercase letter and 0 if c is not a lowercase letter.
2028
01:41:30,820 --> 01:41:33,230
So it returns non-zero.
2029
01:41:33,230 --> 01:41:38,330
So like 1, negative 1, something that's not 0 if c is a lowercase letter,
2030
01:41:38,330 --> 01:41:41,400
and 0 if it is not a lowercase letter.
2031
01:41:41,400 --> 01:41:43,160
So how can we use this building block?
2032
01:41:43,160 --> 01:41:45,230
Let me go back to my code here.
2033
01:41:45,230 --> 01:41:49,610
Let me add this file, include ctype.h.
2034
01:41:49,610 --> 01:41:53,120
And down here, let me get rid of this cryptic expression, which
2035
01:41:53,120 --> 01:41:59,060
was kind of painful to come up with, and just ask this, is-lower s bracket i?
2036
01:42:01,970 --> 01:42:05,390
That should actually work but why?
2037
01:42:05,390 --> 01:42:10,520
Well is-lower, again, returns a non-zero value if the letter is lowercase.
2038
01:42:10,520 --> 01:42:12,150
Well, what does that mean?
2039
01:42:12,150 --> 01:42:13,415
That means it could return 1.
2040
01:42:13,415 --> 01:42:14,540
It could return negative 1.
2041
01:42:14,540 --> 01:42:16,370
It could return 50 or negative 50.
2042
01:42:16,370 --> 01:42:18,650
It's actually not precisely defined, why?
2043
01:42:18,650 --> 01:42:19,700
Just, because.
2044
01:42:19,700 --> 01:42:23,750
This was a common convention to use 0 to represent false and use
2045
01:42:23,750 --> 01:42:26,120
any other value to represent true.
2046
01:42:26,120 --> 01:42:30,140
And so it turns out, that inside of Boolean expressions,
2047
01:42:30,140 --> 01:42:34,755
if you put a value like a function call like this, that returns 0,
2048
01:42:34,755 --> 01:42:36,380
that's going to be equivalent to false.
2049
01:42:36,380 --> 01:42:38,975
It's like the answer being no, it is not lower.
2050
01:42:38,975 --> 01:42:41,990
But you can also, in parentheses, put the name
2051
01:42:41,990 --> 01:42:45,920
of the function and its arguments, and not compare it against anything.
2052
01:42:45,920 --> 01:42:51,230
Because we could do something like this, well if it's not equal to 0, then
2053
01:42:51,230 --> 01:42:52,247
it must be lowercase.
2054
01:42:52,247 --> 01:42:54,830
Because that's the definition, if it returns a non-zero value,
2055
01:42:54,830 --> 01:42:55,760
it's lowercase.
2056
01:42:55,760 --> 01:42:59,210
But a more succinct way to do that is just a bit more like English.
2057
01:42:59,210 --> 01:43:04,110
If it's is lower, then print out the character minus 32.
2058
01:43:04,110 --> 01:43:06,590
So this would be the common way of using one of these
2059
01:43:06,590 --> 01:43:10,025
is- functions to check if the answer is true or false.
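[Editor's sketch of the convention above: islower's result is used directly as the condition, never compared against 1. The helper name count_lowercase is hypothetical, and the cast to unsigned char is a common precaution that the lecture does not mention.]

```c
#include <ctype.h>
#include <string.h>

// Counts the lowercase letters in s.
int count_lowercase(const char *s)
{
    int count = 0;
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        // islower returns some non-zero int for a lowercase letter, not
        // necessarily 1, so the bare call serves as the Boolean expression.
        // Writing islower(...) != 0 would be equivalent.
        if (islower((unsigned char) s[i]))
        {
            count++;
        }
    }
    return count;
}
```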
2060
01:43:10,025 --> 01:43:12,810
AUDIENCE: [INAUDIBLE]
2061
01:43:12,810 --> 01:43:14,670
DAVID MALAN: OK, well we might be done.
2062
01:43:14,670 --> 01:43:15,170
OK.
2063
01:43:15,170 --> 01:43:16,922
AUDIENCE: [INAUDIBLE]
2064
01:43:16,922 --> 01:43:17,900
DAVID MALAN: No.
2065
01:43:17,900 --> 01:43:19,520
So it's not necessarily 1.
2066
01:43:19,520 --> 01:43:23,180
It would be incorrect to check for 1, or negative 1, or anything else.
2067
01:43:23,180 --> 01:43:25,550
You want to check for the opposite of 0.
2068
01:43:25,550 --> 01:43:26,870
So not equal 0.
2069
01:43:26,870 --> 01:43:31,820
Or more succinctly, like I did by just putting it into parentheses.
2070
01:43:31,820 --> 01:43:34,560
Let me see what happens here.
2071
01:43:34,560 --> 01:43:38,690
So this is great, but some of you might have spotted a better solution
2072
01:43:38,690 --> 01:43:39,680
to this problem.
2073
01:43:39,680 --> 01:43:42,230
A moment ago when we were on the manual pages searching
2074
01:43:42,230 --> 01:43:45,380
for things related to lowercase, what might be another building
2075
01:43:45,380 --> 01:43:46,475
block we can employ here?
2076
01:43:49,160 --> 01:43:50,700
Based on what's on the screen here?
2077
01:43:50,700 --> 01:43:51,200
Yeah?
2078
01:43:51,200 --> 01:43:52,888
AUDIENCE: To-upper.
2079
01:43:52,888 --> 01:43:54,140
DAVID MALAN: So to-upper.
2080
01:43:54,140 --> 01:43:57,098
There's a function that would literally do the uppercasing thing for me
2081
01:43:57,098 --> 01:44:00,032
so I don't have to get into the weeds of negative 32, plus 32.
2082
01:44:00,032 --> 01:44:01,490
I don't have to consult that chart.
2083
01:44:01,490 --> 01:44:05,120
Someone has solved this problem for me in the past.
2084
01:44:05,120 --> 01:44:09,680
And let's see if I can actually get back to it.
2085
01:44:09,680 --> 01:44:10,520
There we go.
2086
01:44:10,520 --> 01:44:12,540
Let me go ahead, now, and use this.
2087
01:44:12,540 --> 01:44:15,230
So instead of doing s bracket i minus 32,
2088
01:44:15,230 --> 01:44:19,880
let's use a function that someone else wrote, and just say to-upper, s bracket
2089
01:44:19,880 --> 01:44:20,420
i.
2090
01:44:20,420 --> 01:44:23,250
And now it's going to do the solution for me.
2091
01:44:23,250 --> 01:44:30,530
So if I rerun make uppercase, and then do, slowly, ./uppercase, type in hi,
2092
01:44:30,530 --> 01:44:32,120
now it's working as expected.
2093
01:44:32,120 --> 01:44:35,870
And honestly, if I read the documentation for to-upper
2094
01:44:35,870 --> 01:44:39,170
by going back to its man page, or manual page, what you'll see
2095
01:44:39,170 --> 01:44:44,420
is that it says if it's lowercase, it will return the uppercase version
2096
01:44:44,420 --> 01:44:45,050
thereof.
2097
01:44:45,050 --> 01:44:48,913
If it's not lowercase, it's already uppercase, it's punctuation,
2098
01:44:48,913 --> 01:44:50,705
it will just return the original character.
2099
01:44:50,705 --> 01:44:53,900
Which means, thanks to this function, I can actually
2100
01:44:53,900 --> 01:44:57,650
tighten this up significantly, get rid of all of my conditional
2101
01:44:57,650 --> 01:45:02,030
there, and just print out the to-upper return value,
2102
01:45:02,030 --> 01:45:05,060
and leave it to whoever wrote that function to figure out
2103
01:45:05,060 --> 01:45:09,470
if something's uppercase or lowercase.
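[Editor's sketch of the tightened loop, with the conditional gone and toupper applied to every character. One deliberate difference from the lecture: this hypothetical helper, uppercase_in_place, overwrites the characters in a caller-supplied array instead of printing them, so the result can be checked. The unsigned char cast is an added precaution.]

```c
#include <ctype.h>
#include <string.h>

// Uppercases every character of s in place.
// toupper returns the uppercase version of a lowercase letter and hands
// back anything else (uppercase letters, digits, punctuation) unchanged,
// so no if/else is needed around it.
void uppercase_in_place(char *s)
{
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        s[i] = toupper((unsigned char) s[i]);
    }
}
```

[The argument must be a writable array, for example char buf[] = "hi!"; passing a string literal directly would not be safe to modify.]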
2104
01:45:09,470 --> 01:45:13,820
All right, questions on these kinds of tricks?
2105
01:45:13,820 --> 01:45:17,090
Again, it all reduces to week 0 basics, but we're just
2106
01:45:17,090 --> 01:45:18,750
building these abstractions on top.
2107
01:45:18,750 --> 01:45:19,250
Yeah?
2108
01:45:19,250 --> 01:45:21,208
AUDIENCE: I'm wondering if there's any way just
2109
01:45:21,208 --> 01:45:25,110
to import all packages under a certain subdomain instead
2110
01:45:25,110 --> 01:45:27,120
of having to do multiple [INAUDIBLE] statements,
2111
01:45:27,120 --> 01:45:28,412
kind of like a star [INAUDIBLE]
2112
01:45:28,412 --> 01:45:29,340
DAVID MALAN: Yes.
2113
01:45:29,340 --> 01:45:30,180
Unfortunately, no.
2114
01:45:30,180 --> 01:45:33,120
There is no easy way in C to say, give me everything.
2115
01:45:33,120 --> 01:45:35,670
That was for, historically, performance reasons.
2116
01:45:35,670 --> 01:45:38,940
They want you to be explicit as to what you want to include.
2117
01:45:38,940 --> 01:45:41,730
In other languages like Python, Java, one of which
2118
01:45:41,730 --> 01:45:44,513
we'll see later this term, you can say, give me everything.
2119
01:45:44,513 --> 01:45:47,430
But that, actually, tends not to be best practice because it can slow down
2120
01:45:47,430 --> 01:45:50,000
execution or compilation of your code.
2121
01:45:50,000 --> 01:45:50,500
Yeah?
2122
01:45:50,500 --> 01:45:52,845
AUDIENCE: Does to-upper accommodate for special characters?
2123
01:45:52,845 --> 01:45:53,340
DAVID MALAN: Ah.
2124
01:45:53,340 --> 01:45:55,980
Does to-upper accommodate special characters like punctuation?
2125
01:45:55,980 --> 01:45:56,480
Yes.
2126
01:45:56,480 --> 01:45:58,440
If I read the documentation more pedantically,
2127
01:45:58,440 --> 01:45:59,710
we would see exactly that.
2128
01:45:59,710 --> 01:46:02,940
It will properly hand me back an exclamation point,
2129
01:46:02,940 --> 01:46:04,600
even if I passed it in.
2130
01:46:04,600 --> 01:46:08,970
So if I do make uppercase here, and let me do ./upper, sorry--
2131
01:46:08,970 --> 01:46:13,620
./uppercase, hi with an exclamation point, it's going to handle that, too,
2132
01:46:13,620 --> 01:46:15,810
pass it through unchanged. Yeah?
2133
01:46:15,810 --> 01:46:19,200
AUDIENCE: Do we have access to a function that would do all of that
2134
01:46:19,200 --> 01:46:21,590
but just to the screen rather than to [INAUDIBLE]
2135
01:46:21,590 --> 01:46:23,550
DAVID MALAN: Really good question, too.
2136
01:46:23,550 --> 01:46:28,110
No, we do not have access to a function that at least comes with C or comes
2137
01:46:28,110 --> 01:46:31,740
with CS50's library that will just force the whole thing to uppercase.
2138
01:46:31,740 --> 01:46:34,170
In C, that's actually easier said than done.
2139
01:46:34,170 --> 01:46:35,550
In Python, it's trivial.
2140
01:46:35,550 --> 01:46:39,810
So stay tuned for another language that will let us do exactly that.
2141
01:46:39,810 --> 01:46:42,510
All right, so what does this leave us with?
2142
01:46:42,510 --> 01:46:44,520
There's just a-- let's come full circle now,
2143
01:46:44,520 --> 01:46:47,490
to where we began today where we were talking about those command line
2144
01:46:47,490 --> 01:46:48,090
arguments.
2145
01:46:48,090 --> 01:46:51,810
Recall that we talked about rm taking a command line argument,
2146
01:46:51,810 --> 01:46:54,470
the file you want to delete. We talked about clang
2147
01:46:54,470 --> 01:46:56,220
taking command line arguments, that again,
2148
01:46:56,220 --> 01:46:58,140
modify the behavior of the program.
2149
01:46:58,140 --> 01:47:01,680
How is it that maybe you and I can start to write programs that
2150
01:47:01,680 --> 01:47:03,840
actually take command line arguments?
2151
01:47:03,840 --> 01:47:07,620
Well here is where I can finally explain why
2152
01:47:07,620 --> 01:47:10,740
we've been typing int main(void) for the past week
2153
01:47:10,740 --> 01:47:14,490
and just asking that you take on faith that it's just the way you do things.
2154
01:47:14,490 --> 01:47:20,820
Well, by default in C, at least the most recent versions thereof,
2155
01:47:20,820 --> 01:47:24,010
there's only two official ways to write main functions.
2156
01:47:24,010 --> 01:47:26,460
You might see other formats online, but they're generally
2157
01:47:26,460 --> 01:47:28,870
not consistent with the current specification.
2158
01:47:28,870 --> 01:47:32,160
This, again, was sort of a boilerplate for the simplest
2159
01:47:32,160 --> 01:47:34,770
function we might write last week, and recall that we've
2160
01:47:34,770 --> 01:47:36,210
been doing this the whole time.
2161
01:47:36,210 --> 01:47:40,990
(void). What that (void) means, for all of the programs I have written thus far
2162
01:47:40,990 --> 01:47:43,890
and you have written thus far, is that none of our programs
2163
01:47:43,890 --> 01:47:47,040
that we've written take command line arguments.
2164
01:47:47,040 --> 01:47:49,110
That's what the void there means.
2165
01:47:49,110 --> 01:47:53,950
It turns out that main is the way you can specify that your program does,
2166
01:47:53,950 --> 01:47:55,740
in fact, take command line arguments, that
2167
01:47:55,740 --> 01:47:59,760
is words after the command in your terminal window.
2168
01:47:59,760 --> 01:48:02,220
If you want to actually not use get int or get string,
2169
01:48:02,220 --> 01:48:05,970
you want the human to be able to say something, like hello, David
2170
01:48:05,970 --> 01:48:06,840
and hit Enter.
2171
01:48:06,840 --> 01:48:09,940
And just run-- print hello, David on the screen.
2172
01:48:09,940 --> 01:48:14,460
You can use command line arguments, words after the program name
2173
01:48:14,460 --> 01:48:16,750
on your command line.
2174
01:48:16,750 --> 01:48:20,460
So we're going to change this in a moment to be something more verbose,
2175
01:48:20,460 --> 01:48:23,930
but something that's now a bit more familiar syntactically.
2176
01:48:23,930 --> 01:48:28,440
If you change that (void) in main to be this incantation instead,
2177
01:48:28,440 --> 01:48:33,480
int, argc, comma, string, argv, open bracket, close bracket,
2178
01:48:33,480 --> 01:48:36,630
you are now giving yourself access to writing programs
2179
01:48:36,630 --> 01:48:38,910
that take command line arguments.
2180
01:48:38,910 --> 01:48:42,120
Argc, which stands for argument count is going
2181
01:48:42,120 --> 01:48:46,410
to be an integer that stores how many words the human typed at the prompt.
2182
01:48:46,410 --> 01:48:49,050
C automatically gives that to you.
2183
01:48:49,050 --> 01:48:52,710
String argv stands for argument vector, that's
2184
01:48:52,710 --> 01:48:57,100
going to be an array of all of the words that the human typed at the prompt.
2185
01:48:57,100 --> 01:48:59,130
So with today's building block of an array,
2186
01:48:59,130 --> 01:49:01,980
we have the ability now to let the humans type as many words,
2187
01:49:01,980 --> 01:49:03,900
or as few words, as they want at the prompt.
2188
01:49:03,900 --> 01:49:06,900
C is going to automatically put them in an array called argv,
2189
01:49:06,900 --> 01:49:12,360
and it's going to tell us how many words there are in an int called argc.
2190
01:49:12,360 --> 01:49:16,060
The int, as the return type here, we'll come back to in just a moment.
2191
01:49:16,060 --> 01:49:19,350
Let's use this definition to make, maybe,
2192
01:49:19,350 --> 01:49:20,970
just a couple of simple programs.
2193
01:49:20,970 --> 01:49:23,070
But in problem set 2, we will actually use
2194
01:49:23,070 --> 01:49:26,470
this to control the behavior of your own code.
2195
01:49:26,470 --> 01:49:33,120
Let me code up a file called argv.0 just to keep it aptly named.
2196
01:49:33,120 --> 01:49:35,700
Let me include cs50.h.
2197
01:49:35,700 --> 01:49:37,240
Let me go ahead and include--
2198
01:49:37,240 --> 01:49:37,740
oops.
2199
01:49:37,740 --> 01:49:40,950
That is not the right name of a program, let's start that over.
2200
01:49:40,950 --> 01:49:45,450
Let's go ahead and code up argv.c.
2201
01:49:45,450 --> 01:49:46,800
And here we have--
2202
01:49:46,800 --> 01:49:52,890
include cs50.h, include stdio.h, int, main, not void,
2203
01:49:52,890 --> 01:50:00,025
let's actually say int, argc, string, argv, open bracket, close bracket.
2204
01:50:00,025 --> 01:50:02,400
No numbers in between because you don't know, in advance,
2205
01:50:02,400 --> 01:50:05,310
how many words the human's going to type at their prompt.
2206
01:50:05,310 --> 01:50:06,760
Now let's go ahead and do this.
2207
01:50:06,760 --> 01:50:10,800
Let's write a very simple program that just says, hello, David, hello, Carter,
2208
01:50:10,800 --> 01:50:12,660
whoever the name is that gets typed.
2209
01:50:12,660 --> 01:50:16,260
But not using get string, let's instead have the human just
2210
01:50:16,260 --> 01:50:19,890
type their name at the prompt, just like rm, just like clang, just like make,
2211
01:50:19,890 --> 01:50:22,170
so it's just one and done when you hit Enter.
2212
01:50:22,170 --> 01:50:23,610
No additional prompts.
2213
01:50:23,610 --> 01:50:28,380
Let me go ahead then and do this, printf, quote-unquote, hello,
2214
01:50:28,380 --> 01:50:31,500
comma, and instead of world today, I want to print out
2215
01:50:31,500 --> 01:50:33,370
whatever the human typed in.
2216
01:50:33,370 --> 01:50:38,850
So let's go ahead and do this, argv, bracket 0 for now.
2217
01:50:38,850 --> 01:50:43,080
But I don't think this is quite what I want because, of course,
2218
01:50:43,080 --> 01:50:48,370
that's going to literally print out argv, bracket, 0, bracket.
2219
01:50:48,370 --> 01:50:52,510
I need a placeholder, so let me put %s here and then put that here.
2220
01:50:52,510 --> 01:50:56,520
So if argv is an array, but it's an array of strings,
2221
01:50:56,520 --> 01:51:00,480
then argv bracket 0 is itself a single string.
2222
01:51:00,480 --> 01:51:03,450
And so it can be plugged into that %s placeholder.
2223
01:51:03,450 --> 01:51:05,740
Let me go ahead and save my program.
2224
01:51:05,740 --> 01:51:09,340
And compile argv, so far, so good.
2225
01:51:09,340 --> 01:51:13,170
Let me now type in my name after the name of the program.
2226
01:51:13,170 --> 01:51:13,980
So no get string.
2227
01:51:13,980 --> 01:51:18,280
I'm literally typing an extra word, my own name at the prompt, Enter.
2228
01:51:18,280 --> 01:51:21,290
OK, it's apparently a little buggy in a couple of ways.
2229
01:51:21,290 --> 01:51:24,500
I forgot my \n but that's not a huge deal.
2230
01:51:24,500 --> 01:51:28,960
But apparently, inside of argv is literally everything
2231
01:51:28,960 --> 01:51:31,270
that humans typed in including the name of the program.
2232
01:51:31,270 --> 01:51:36,250
So logically, how do I print out hello, David, or hello so-and-so and not
2233
01:51:36,250 --> 01:51:37,720
the actual name of the program?
2234
01:51:37,720 --> 01:51:38,960
What needs to change here?
2235
01:51:38,960 --> 01:51:39,460
Yeah?
2236
01:51:39,460 --> 01:51:41,050
AUDIENCE: Change the index to 1.
2237
01:51:41,050 --> 01:51:41,800
DAVID MALAN: Yeah.
2238
01:51:41,800 --> 01:51:45,940
So presumably index to 1, if that's the second thing I, or whichever human,
2239
01:51:45,940 --> 01:51:46,940
has typed at the prompt.
2240
01:51:46,940 --> 01:51:51,410
So let's do make argv again, ./argv, Enter.
2241
01:51:51,410 --> 01:51:52,090
Huh.
2242
01:51:52,090 --> 01:51:53,630
Hello, nul.
2243
01:51:53,630 --> 01:51:55,690
So this is another form of nul.
2244
01:51:55,690 --> 01:51:59,320
But this is user error, now, on my part.
2245
01:51:59,320 --> 01:52:01,070
I didn't do exactly what I said I would.
2246
01:52:01,070 --> 01:52:01,570
Yeah?
2247
01:52:01,570 --> 01:52:02,530
AUDIENCE: You forgot the parameter.
2248
01:52:02,530 --> 01:52:04,430
DAVID MALAN: Yeah, I forgot the parameter.
2249
01:52:04,430 --> 01:52:05,700
So that's actually, hm.
2250
01:52:05,700 --> 01:52:07,450
I should probably deal with that, somehow,
2251
01:52:07,450 --> 01:52:09,292
so that people aren't breaking my program
2252
01:52:09,292 --> 01:52:11,000
and printing out random things, like nul.
2253
01:52:11,000 --> 01:52:14,770
But if I do say argv, David, now you see hello, David.
2254
01:52:14,770 --> 01:52:18,070
I can get a little curious, like what's at location 2?
2255
01:52:18,070 --> 01:52:23,410
Well we can see, make argv, ./argv David, Enter.
2256
01:52:23,410 --> 01:52:24,910
All right, so just nothing is there.
2257
01:52:24,910 --> 01:52:28,202
But it turns out, in a couple of weeks, we'll start really poking around memory
2258
01:52:28,202 --> 01:52:30,310
and see if we can't crash programs deliberately
2259
01:52:30,310 --> 01:52:32,800
because nothing is stopping me from saying,
2260
01:52:32,800 --> 01:52:36,470
oh what's at location 2 million, for instance?
2261
01:52:36,470 --> 01:52:38,350
We could really start to get curious.
2262
01:52:38,350 --> 01:52:40,420
But for now, we'll do the right thing.
2263
01:52:40,420 --> 01:52:44,360
But let's now make sure the human has typed in the right number of words.
2264
01:52:44,360 --> 01:52:50,920
So let's say this, if argc equals 2, that is the name of the program
2265
01:52:50,920 --> 01:52:54,760
and one more word after that, go ahead and trust that in argv 1,
2266
01:52:54,760 --> 01:52:56,980
as you proposed, is the person's name.
2267
01:52:56,980 --> 01:53:01,810
Else, let's go ahead and default here to something simple and basic,
2268
01:53:01,810 --> 01:53:05,860
like, well, if we don't get a name from the user, just say hello, world,
2269
01:53:05,860 --> 01:53:07,300
like always.
2270
01:53:07,300 --> 01:53:10,045
So now we're programming defensively.
2271
01:53:10,045 --> 01:53:13,090
This time the human, even if they screw up, they don't give us a name
2272
01:53:13,090 --> 01:53:15,965
or they give us too many names, we're just going to say hello, world,
2273
01:53:15,965 --> 01:53:17,890
because I now have some error handling here.
2274
01:53:17,890 --> 01:53:22,030
Because, again, argc is argument count, the number of words, total,
2275
01:53:22,030 --> 01:53:23,990
typed at the command line.
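[Editor's sketch of the defensive check just described, pulled out into a hypothetical helper named name_to_greet so the decision is separate from the printing. char *argv[] is used in place of CS50's string argv[]; the two are assumed to be equivalent here.]

```c
// Decides which name to greet based on the command-line arguments.
// argc counts every word typed, including the program name in argv[0],
// so exactly one extra word means argc == 2 and the name is in argv[1].
// With too few or too many words, fall back to "world".
const char *name_to_greet(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];
    }
    return "world";
}

// Inside int main(int argc, char *argv[]), with <stdio.h> included,
// the call would look like:
//     printf("hello, %s\n", name_to_greet(argc, argv));
```

[Because argv[1] is read only after argc has been checked, running the program with no extra word never touches an element that holds no name.]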
2276
01:53:23,990 --> 01:53:26,740
So make, argv, ./argv.
2277
01:53:26,740 --> 01:53:28,540
Let me make the same mistake as before.
2278
01:53:28,540 --> 01:53:29,050
OK.
2279
01:53:29,050 --> 01:53:30,910
I don't get this weird nul behavior.
2280
01:53:30,910 --> 01:53:32,350
I get something well-defined.
2281
01:53:32,350 --> 01:53:33,610
I could now do David.
2282
01:53:33,610 --> 01:53:36,850
I could do David Malan, but that's not currently supported.
2283
01:53:36,850 --> 01:53:41,290
I would need to alter my logic to support more than just two words
2284
01:53:41,290 --> 01:53:42,345
after the prompt.
2285
01:53:42,345 --> 01:53:43,770
So what's the point of this?
2286
01:53:43,770 --> 01:53:45,520
At the moment, it's just a simple exercise
2287
01:53:45,520 --> 01:53:50,702
to actually give myself a way of taking user input when they run the program.
2288
01:53:50,702 --> 01:53:52,660
Because, consider, it's just more convenient in
2289
01:53:52,660 --> 01:53:54,670
this new, command-line-interface world.
2290
01:53:54,670 --> 01:53:58,857
If you had to use get string every time you compile your code,
2291
01:53:58,857 --> 01:54:00,190
it'd be kind of annoying, right?
2292
01:54:00,190 --> 01:54:03,940
You type make, then you might get a prompt, what would you like to make?
2293
01:54:03,940 --> 01:54:07,690
Then you type in hello, or cash, or something else, then you hit Enter,
2294
01:54:07,690 --> 01:54:09,330
it just really slows the process.
2295
01:54:09,330 --> 01:54:11,440
But in this command-line-interface world,
2296
01:54:11,440 --> 01:54:14,770
if you support command line arguments, then you can use these little tricks.
2297
01:54:14,770 --> 01:54:18,170
Like, scrolling up and down in your history with your arrow keys.
2298
01:54:18,170 --> 01:54:22,430
You can just type commands more quickly because you can do it all at once.
2299
01:54:22,430 --> 01:54:25,000
And you don't have to keep prompting the user, more
2300
01:54:25,000 --> 01:54:27,760
pedantically, for more and more info.
2301
01:54:27,760 --> 01:54:30,280
So any questions then on command line arguments?
2302
01:54:30,280 --> 01:54:34,000
Which, finally, reveals why we had (void) initially,
2303
01:54:34,000 --> 01:54:36,610
but what more we can now put in main.
2304
01:54:36,610 --> 01:54:39,070
That's how you take command line arguments.
2305
01:54:39,070 --> 01:54:40,500
Yeah?
2306
01:54:40,500 --> 01:54:42,610
AUDIENCE: If you were to put--
2307
01:54:42,610 --> 01:54:47,320
if you were to use argv, and you were to put integers inside of it,
2308
01:54:47,320 --> 01:54:49,923
would it still give you, like, a string?
2309
01:54:49,923 --> 01:54:51,506
Would that still be considered string?
2310
01:54:51,506 --> 01:54:52,923
Or would you consider [INAUDIBLE]?
2311
01:54:52,923 --> 01:54:53,760
DAVID MALAN: Yes.
2312
01:54:53,760 --> 01:54:56,550
If you were to type at the command line something
2313
01:54:56,550 --> 01:55:00,660
like, not a word, but something like the number 42,
2314
01:55:00,660 --> 01:55:03,450
that would actually be treated as a string.
2315
01:55:03,450 --> 01:55:04,290
Why?
2316
01:55:04,290 --> 01:55:06,220
Because again, context matters.
2317
01:55:06,220 --> 01:55:08,940
So if your program is currently manipulating memory
2318
01:55:08,940 --> 01:55:12,510
as though it's characters or strings, whatever those patterns of 0s and 1s
2319
01:55:12,510 --> 01:55:16,800
are, they will be interpreted as ASCII text, or Unicode text.
2320
01:55:16,800 --> 01:55:20,640
If we therefore go to the chart here, that might make you wonder, well,
2321
01:55:20,640 --> 01:55:24,510
then how do you distinguish numbers from letters in the context of something
2322
01:55:24,510 --> 01:55:25,890
like chars and strings?
2323
01:55:25,890 --> 01:55:34,380
Well, notice 65 is A, 97 is a, but also 49 is 1, and 50 is 2.
2324
01:55:34,380 --> 01:55:37,500
So the designers of ASCII, and then later Unicode,
2325
01:55:37,500 --> 01:55:40,680
realized well wait a minute, if we want to support programs
2326
01:55:40,680 --> 01:55:43,440
that let you type things that look like numbers,
2327
01:55:43,440 --> 01:55:46,350
even though they're not technically ints or floats,
2328
01:55:46,350 --> 01:55:50,620
we need a way in ASCII and Unicode to represent even numbers.
2329
01:55:50,620 --> 01:55:51,870
So here are your numbers.
2330
01:55:51,870 --> 01:55:55,210
And it's a little silly that we have numbers representing other numbers.
2331
01:55:55,210 --> 01:55:57,863
But again, if you're in the world of letters and characters,
2332
01:55:57,863 --> 01:56:00,030
you've got to come up with a mapping for everything.
2333
01:56:00,030 --> 01:56:01,790
And notice here, here's the dot.
2334
01:56:01,790 --> 01:56:06,390
Even if you were to represent 1.23 as a string, or as characters,
2335
01:56:06,390 --> 01:56:10,840
even the dot now is going to be represented as an ASCII character.
2336
01:56:10,840 --> 01:56:12,930
So again, context here matters.
2337
01:56:12,930 --> 01:56:17,370
All right, one final example to tease apart what this int is
2338
01:56:17,370 --> 01:56:19,840
and what it's been doing here for so long.
2339
01:56:19,840 --> 01:56:24,780
So I'm going to add one bit of logic to a new file
2340
01:56:24,780 --> 01:56:27,750
that I'm going to call exit.c.
2341
01:56:27,750 --> 01:56:29,130
So in exit.c.
2342
01:56:29,130 --> 01:56:32,880
We're going to introduce something that's generally known as an exit status.
2343
01:56:32,880 --> 01:56:34,980
It turns out this is not a feature we've used yet,
2344
01:56:34,980 --> 01:56:37,240
but it's just useful to know about.
2345
01:56:37,240 --> 01:56:40,350
Especially when automating tests of your own code.
2346
01:56:40,350 --> 01:56:44,115
When it comes to figuring out if a program succeeded or failed.
2347
01:56:44,115 --> 01:56:48,870
It turns out that main has one more feature we haven't leveraged.
2348
01:56:48,870 --> 01:56:54,330
An ability to signal to the user whether something was successful or not.
2349
01:56:54,330 --> 01:56:57,760
And that's by way of main's return value.
2350
01:56:57,760 --> 01:57:02,060
So I'm going to modify this program as follows, like this.
2351
01:57:02,060 --> 01:57:04,920
Suppose I want to write a similar program that
2352
01:57:04,920 --> 01:57:07,900
requires that the user type a word at the prompt.
2353
01:57:07,900 --> 01:57:12,450
So that argc has to be 2 for whatever design purpose.
2354
01:57:12,450 --> 01:57:18,990
If argc does not equal 2, I want to quit out of my program prematurely.
2355
01:57:18,990 --> 01:57:22,590
I want to insist that the user operate the program correctly.
2356
01:57:22,590 --> 01:57:28,800
So I might give them an error message like, missing command line argument \n.
2357
01:57:28,800 --> 01:57:31,180
But now I want to quit out of the program.
2358
01:57:31,180 --> 01:57:32,310
Now how can I do that?
2359
01:57:32,310 --> 01:57:37,260
The right way, quote-unquote, to do that is to return a value from main.
2360
01:57:37,260 --> 01:57:40,590
Now it's a little weird because no one called main yet,
2361
01:57:40,590 --> 01:57:42,990
right, main just gets called automatically,
2362
01:57:42,990 --> 01:57:45,300
but the convention is anytime something goes
2363
01:57:45,300 --> 01:57:50,100
wrong in a program you should return a non-zero value from main.
2364
01:57:50,100 --> 01:57:51,780
1 is fine as a go-to.
2365
01:57:51,780 --> 01:57:55,470
We don't need to get into the weeds of having many different exit statuses,
2366
01:57:55,470 --> 01:57:56,220
so to speak.
2367
01:57:56,220 --> 01:58:01,770
But if you return 1, that is a clue to the system, the Mac, the PC, the cloud
2368
01:58:01,770 --> 01:58:03,430
device that something went wrong.
2369
01:58:03,430 --> 01:58:03,930
Why?
2370
01:58:03,930 --> 01:58:05,670
Because 1 is not 0.
2371
01:58:05,670 --> 01:58:11,460
If everything works fine, like, let's go ahead and print out hello comma %s like
2372
01:58:11,460 --> 01:58:16,620
before, quote-unquote argv bracket 1.
2373
01:58:16,620 --> 01:58:19,080
So this is just a version of the program without an else.
2374
01:58:19,080 --> 01:58:21,390
So this is the same as doing, essentially,
2375
01:58:21,390 --> 01:58:23,580
an else here like I did earlier.
2376
01:58:23,580 --> 01:58:26,740
I want to signal to the computer that all is well.
2377
01:58:26,740 --> 01:58:28,290
And so I return 0.
2378
01:58:28,290 --> 01:58:31,650
But strictly speaking, if I'm already returning here,
2379
01:58:31,650 --> 01:58:34,560
I don't technically need, if I really want to be nit picky,
2380
01:58:34,560 --> 01:58:36,870
I don't technically need the else because the only way
2381
01:58:36,870 --> 01:58:41,486
I'm going to get to line 11 is if I didn't already return.
2382
01:58:41,486 --> 01:58:43,180
So what's going on here?
2383
01:58:43,180 --> 01:58:46,530
The only new thing here logically, is that for the first time ever,
2384
01:58:46,530 --> 01:58:48,810
I'm returning a value from main.
2385
01:58:48,810 --> 01:58:50,730
That's something I could always have done
2386
01:58:50,730 --> 01:58:55,290
because main has always been defined by us as having an int as a return value.
2387
01:58:55,290 --> 01:58:59,880
By default, main automatically, sort of secretly, returns 0 for you.
2388
01:58:59,880 --> 01:59:02,850
If you've never once used the return keyword, which you probably
2389
01:59:02,850 --> 01:59:05,370
haven't in main, it just automatically returns 0
2390
01:59:05,370 --> 01:59:07,295
and the system assumes that all went well.
2391
01:59:07,295 --> 01:59:09,390
But now that we're starting to get a little more
2392
01:59:09,390 --> 01:59:11,520
sophisticated with our code, and you,
2393
01:59:11,520 --> 01:59:15,480
the programmer, know something went wrong, you can abort programs early.
2394
01:59:15,480 --> 01:59:20,610
You can exit out of them by returning some other value, besides 0, from main.
2395
01:59:20,610 --> 01:59:23,040
And this is fortuitous that it's an int, right?
2396
01:59:23,040 --> 01:59:25,110
0 means everything worked.
2397
01:59:25,110 --> 01:59:29,250
Unfortunately, in programming, there are seemingly, an infinite number of things
2398
01:59:29,250 --> 01:59:30,240
that can go wrong.
2399
01:59:30,240 --> 01:59:33,210
And int gives you 4 billion possible codes
2400
01:59:33,210 --> 01:59:36,455
that you can use, a.k.a. exit statuses, to signify errors.
2401
01:59:36,455 --> 01:59:39,930
So if you've ever on your Mac or PC gotten some weird pop up
2402
01:59:39,930 --> 01:59:43,320
that an error happened, sometimes, there's a cryptic number in it.
2403
01:59:43,320 --> 01:59:45,420
Maybe it's positive, maybe it's negative.
2404
01:59:45,420 --> 01:59:50,170
It might say error code 123, or negative 49, or something like that.
2405
01:59:50,170 --> 01:59:54,310
What you're generally seeing, are these exit statuses, these return
2406
01:59:54,310 --> 01:59:57,610
values from main in a program that someone at Microsoft,
2407
01:59:57,610 --> 02:00:01,120
or Apple, or somewhere else wrote, something went wrong,
2408
02:00:01,120 --> 02:00:05,980
they are unnecessarily showing you, the user, what the error code is.
2409
02:00:05,980 --> 02:00:09,100
If only, so that when you call customer support or submit a ticket,
2410
02:00:09,100 --> 02:00:12,190
you can tell them what exit status you encountered,
2411
02:00:12,190 --> 02:00:15,070
what error code you encountered.
2412
02:00:15,070 --> 02:00:19,390
All right, any questions on exit statuses,
2413
02:00:19,390 --> 02:00:24,580
which is the last of our new building blocks, for now?
2414
02:00:24,580 --> 02:00:25,540
Any questions at all?
2415
02:00:25,540 --> 02:00:26,040
Yeah?
2416
02:00:26,040 --> 02:00:33,540
AUDIENCE: [INAUDIBLE] You know how if you have get string or get int,
2417
02:00:33,540 --> 02:00:35,418
if you want to make [INAUDIBLE]
2418
02:00:35,418 --> 02:00:36,085
DAVID MALAN: No.
2419
02:00:36,085 --> 02:00:39,265
The question is can you do things again and again
2420
02:00:39,265 --> 02:00:41,890
at the command line like you could with get string and get int.
2421
02:00:41,890 --> 02:00:43,870
Which, by default, recall are automatically
2422
02:00:43,870 --> 02:00:46,420
designed to keep prompting the user in their own loop
2423
02:00:46,420 --> 02:00:49,960
until they give you an int, or a float, or the like. With command line
2424
02:00:49,960 --> 02:00:50,740
arguments, no.
2425
02:00:50,740 --> 02:00:52,210
You're going to get an error message but then
2426
02:00:52,210 --> 02:00:54,002
you're going to be returned to your prompt.
2427
02:00:54,002 --> 02:00:57,387
And it's up to you to type it correctly the next time.
2428
02:00:57,387 --> 02:00:57,970
Good question.
2429
02:00:57,970 --> 02:00:58,470
Yeah?
2430
02:00:58,470 --> 02:01:03,435
AUDIENCE: [INAUDIBLE] automatically for you.
2431
02:01:03,435 --> 02:01:05,310
DAVID MALAN: If you do not return a value
2432
02:01:05,310 --> 02:01:08,730
explicitly main will automatically return 0 for you,
2433
02:01:08,730 --> 02:01:12,640
that is the way C simply works so it's not strictly necessary.
2434
02:01:12,640 --> 02:01:15,510
But now that we're starting to return values explicitly,
2435
02:01:15,510 --> 02:01:18,090
if something goes wrong, it would be good practice
2436
02:01:18,090 --> 02:01:21,480
to also start returning a value for main when something goes right
2437
02:01:21,480 --> 02:01:23,775
and there are no errors.
2438
02:01:23,775 --> 02:01:27,810
So let's now get out of the weeds and contextualize
2439
02:01:27,810 --> 02:01:31,200
this for some actual problems that we'll be solving in the coming days
2440
02:01:31,200 --> 02:01:33,130
by way of problem set 2 and beyond.
2441
02:01:33,130 --> 02:01:35,740
So here for instance--
2442
02:01:35,740 --> 02:01:39,990
So here for instance, is a problem that you might think back
2443
02:01:39,990 --> 02:01:43,980
to when you were a kid: the readability of some text or some book,
2444
02:01:43,980 --> 02:01:46,230
the grade level in which some book is written.
2445
02:01:46,230 --> 02:01:49,740
If you're a young student, you might read at first-grade level
2446
02:01:49,740 --> 02:01:51,240
or third-grade level in the US.
2447
02:01:51,240 --> 02:01:53,032
Or, if you're in college presumably, you're
2448
02:01:53,032 --> 02:01:54,945
reading at a university-level of text.
2449
02:01:54,945 --> 02:01:58,073
But what does it mean for text, like in a book,
2450
02:01:58,073 --> 02:02:00,240
or in an essay, or something like that to correspond
2451
02:02:00,240 --> 02:02:01,590
to some kind of grade level?
2452
02:02:01,590 --> 02:02:04,950
Well, here's a quote-- a title of a childhood book.
2453
02:02:04,950 --> 02:02:07,590
One Fish, Two Fish, Red Fish, Blue Fish.
2454
02:02:07,590 --> 02:02:10,840
What might the grade level be for a book that has words like this?
2455
02:02:10,840 --> 02:02:13,590
Maybe, when you were a kid or if you have siblings still reading
2456
02:02:13,590 --> 02:02:16,260
these things, what might the grade level of this thing be?
2457
02:02:18,800 --> 02:02:19,590
Any guesses?
2458
02:02:19,590 --> 02:02:20,090
Yeah?
2459
02:02:20,090 --> 02:02:21,257
AUDIENCE: Before grade 1.
2460
02:02:21,257 --> 02:02:22,340
DAVID MALAN: Sorry, again?
2461
02:02:22,340 --> 02:02:23,382
AUDIENCE: Before grade 1.
2462
02:02:23,382 --> 02:02:25,650
DAVID MALAN: Before grade 1 is, in fact, correct.
2463
02:02:25,650 --> 02:02:27,290
So that's for really young kids?
2464
02:02:27,290 --> 02:02:28,230
Why is that?
2465
02:02:28,230 --> 02:02:29,180
Well, let's consider.
2466
02:02:29,180 --> 02:02:32,210
These are pretty simple phrases, right?
2467
02:02:32,210 --> 02:02:33,500
One fish, two fish, red--
2468
02:02:33,500 --> 02:02:35,960
I mean there's not even verbs in these sentences,
2469
02:02:35,960 --> 02:02:40,040
they're just nouns and adjectives, and very short sentences.
2470
02:02:40,040 --> 02:02:42,200
And so that might be a heuristic we could use.
2471
02:02:42,200 --> 02:02:44,810
When analyzing text, well if the words are kind of short,
2472
02:02:44,810 --> 02:02:47,240
the sentences are kind of short, everything's very simple,
2473
02:02:47,240 --> 02:02:50,250
that's probably a very young, or early, grade level.
2474
02:02:50,250 --> 02:02:53,665
And so by one formulation, it might indeed be even before grade 1,
2475
02:02:53,665 --> 02:02:54,665
for someone quite young.
2476
02:02:54,665 --> 02:02:55,670
How about this?
2477
02:02:55,670 --> 02:02:58,022
Mr. and Mrs. Dursley, of number 4, Privet Drive,
2478
02:02:58,022 --> 02:03:00,980
were proud to say that they were perfectly normal, thank you very much.
2479
02:03:00,980 --> 02:03:02,960
They were the last people you would expect
2480
02:03:02,960 --> 02:03:05,120
to be involved in anything strange or mysterious
2481
02:03:05,120 --> 02:03:07,850
because they just didn't hold with such nonsense.
2482
02:03:07,850 --> 02:03:08,782
And, onward.
2483
02:03:08,782 --> 02:03:10,490
All right, what grade level is this book?
2484
02:03:10,490 --> 02:03:11,778
AUDIENCE: Third.
2485
02:03:11,778 --> 02:03:13,070
DAVID MALAN: OK, I heard third.
2486
02:03:13,070 --> 02:03:14,585
AUDIENCE: What?
2487
02:03:14,585 --> 02:03:15,980
DAVID MALAN: Seventh, fifth.
2488
02:03:15,980 --> 02:03:17,150
OK, all over the place.
2489
02:03:17,150 --> 02:03:20,540
But grade 7, according to one particular measure.
2490
02:03:20,540 --> 02:03:24,802
And whether or not we can debate exactly what age you were when you read this,
2491
02:03:24,802 --> 02:03:27,260
and maybe you're feeling ahead of your time, or behind now.
2492
02:03:27,260 --> 02:03:31,470
But here, we have a snippet of text.
2493
02:03:31,470 --> 02:03:36,560
What makes this text assume an older audience, a more mature audience,
2494
02:03:36,560 --> 02:03:39,690
a higher grade level, would you think?
2495
02:03:39,690 --> 02:03:40,190
Yeah?
2496
02:03:40,190 --> 02:03:42,415
AUDIENCE: [INAUDIBLE]
2497
02:03:42,415 --> 02:03:45,110
DAVID MALAN: Yeah, it's longer, different types of words,
2498
02:03:45,110 --> 02:03:47,513
there's commas now in phrases, and so forth.
2499
02:03:47,513 --> 02:03:49,680
So there's just some kind of sophistication to this.
2500
02:03:49,680 --> 02:03:52,280
So it turns out for the upcoming problem set,
2501
02:03:52,280 --> 02:03:55,370
among the things you'll do is take, as input, texts like this
2502
02:03:55,370 --> 02:03:56,510
and analyze them.
2503
02:03:56,510 --> 02:03:59,072
Considering, well, how many words are in the text?
2504
02:03:59,072 --> 02:04:00,530
How many sentences are in the text?
2505
02:04:00,530 --> 02:04:02,375
How many letters are in the text?
2506
02:04:02,375 --> 02:04:06,170
And use those according to a well-defined formula to prescribe what,
2507
02:04:06,170 --> 02:04:09,680
exactly, the grade level of some actual text-- there's the third--
2508
02:04:09,680 --> 02:04:10,582
might actually be.
2509
02:04:10,582 --> 02:04:12,790
Well what else are we going to do in the coming days?
2510
02:04:12,790 --> 02:04:15,410
Well I've alluded to this notion of cryptography in the past.
2511
02:04:15,410 --> 02:04:18,350
This notion of scrambling information in such a way
2512
02:04:18,350 --> 02:04:21,422
that you can hide the contents of a message
2513
02:04:21,422 --> 02:04:23,630
from someone who might otherwise intercept it, right?
2514
02:04:23,630 --> 02:04:26,130
The earliest form of this might also be when you're younger,
2515
02:04:26,130 --> 02:04:29,390
and you're in class, and you're passing a note from one person to another,
2516
02:04:29,390 --> 02:04:30,650
from yourself to someone else.
2517
02:04:30,650 --> 02:04:32,960
You don't want to necessarily write a note in English,
2518
02:04:32,960 --> 02:04:35,120
or some other written language. You might want
2519
02:04:35,120 --> 02:04:37,430
to scramble it somehow, or encrypt it.
2520
02:04:37,430 --> 02:04:40,460
Maybe you change the As to a B, and the Bs to a C.
2521
02:04:40,460 --> 02:04:42,770
So that if the teacher snaps it up and intercepts it,
2522
02:04:42,770 --> 02:04:45,200
they can't actually understand what it is you've
2523
02:04:45,200 --> 02:04:47,160
written because it's encrypted.
2524
02:04:47,160 --> 02:04:49,610
So long as your friend, the recipient of this note,
2525
02:04:49,610 --> 02:04:51,890
knows how you manipulated it.
2526
02:04:51,890 --> 02:04:55,640
How you added or subtracted letters to each other,
2527
02:04:55,640 --> 02:04:58,850
they can decrypt it, which is to reverse that process.
2528
02:04:58,850 --> 02:05:02,070
So formally, in the world of cryptography and computer science,
2529
02:05:02,070 --> 02:05:04,130
this is another problem to solve.
2530
02:05:04,130 --> 02:05:07,173
Your input, though, when you have a message you want to send securely,
2531
02:05:07,173 --> 02:05:08,840
is what's generally known as plaintext.
2532
02:05:08,840 --> 02:05:12,980
There's some algorithm that's going to then encipher, or encrypt
2533
02:05:12,980 --> 02:05:16,100
that information, into what's called ciphertext, which
2534
02:05:16,100 --> 02:05:18,650
is the scrambled version that theoretically can get safely
2535
02:05:18,650 --> 02:05:21,110
intercepted and your message has not been spoiled,
2536
02:05:21,110 --> 02:05:24,620
unless that interceptor actually knows what algorithm
2537
02:05:24,620 --> 02:05:27,150
you used inside of this process.
2538
02:05:27,150 --> 02:05:29,720
So that would be generally known as a cipher.
2539
02:05:29,720 --> 02:05:33,080
The ciphers typically take, though, not one input, but two.
2540
02:05:33,080 --> 02:05:37,685
If, for instance, your cipher is as simple as A becomes B,
2541
02:05:37,685 --> 02:05:41,420
B becomes C, C becomes D, dot dot dot, Z becomes A,
2542
02:05:41,420 --> 02:05:45,140
you're essentially adding one to every letter and encrypting it.
2543
02:05:45,140 --> 02:05:47,750
Now that would be, what we call, the key.
2544
02:05:47,750 --> 02:05:51,470
You and the recipient both have to agree, presumably, before class,
2545
02:05:51,470 --> 02:05:55,280
in advance, what number you're going to use that day to rotate,
2546
02:05:55,280 --> 02:05:56,960
or change all of these letters by.
2547
02:05:56,960 --> 02:06:00,410
Because when you add 1, they upon receiving your ciphertext
2548
02:06:00,410 --> 02:06:03,090
have to subtract 1 to get back the answer.
2549
02:06:03,090 --> 02:06:07,730
For instance, if the input, plaintext, is HI!, as before,
2550
02:06:07,730 --> 02:06:13,010
and the key is 1, the ciphertext using this simple rotational algorithm,
2551
02:06:13,010 --> 02:06:17,720
otherwise known as the Caesar cipher, might be IJ!
2552
02:06:17,720 --> 02:06:21,408
So it's similar, but it's at least scrambled at first glance.
2553
02:06:21,408 --> 02:06:23,450
And unless the teacher really cares to figure out
2554
02:06:23,450 --> 02:06:26,420
what algorithm are they using today, or what key are they using today,
2555
02:06:26,420 --> 02:06:29,700
it's probably sufficiently secure for your purposes.
2556
02:06:29,700 --> 02:06:31,160
How do you reverse the process?
2557
02:06:31,160 --> 02:06:34,190
Well, your friend gets this and reverses it by negative 1.
2558
02:06:34,190 --> 02:06:38,630
So I becomes H, J becomes I, and things like punctuation
2559
02:06:38,630 --> 02:06:41,060
remain untouched at least in this scheme.
2560
02:06:41,060 --> 02:06:43,580
So let's consider one final example here.
2561
02:06:43,580 --> 02:06:51,080
If the input to the algorithm is Uijtxbtdt50, and the key
2562
02:06:51,080 --> 02:06:53,090
this time is negative 1.
2563
02:06:53,090 --> 02:06:59,510
Such that now B should become A, and C should become B, and A should become Z.
2564
02:06:59,510 --> 02:07:01,130
So we're going in the other direction.
2565
02:07:01,130 --> 02:07:03,030
How might we analyze this?
2566
02:07:03,030 --> 02:07:06,000
Well if we spread all the letters out, and we start from left to right,
2567
02:07:06,000 --> 02:07:11,780
and we start subtracting one letter, U becomes T, I becomes H, J becomes I,
2568
02:07:11,780 --> 02:07:17,220
T becomes S, X becomes W, A, was, D, T--
2569
02:07:17,220 --> 02:07:18,270
this was CS50.
2570
02:07:18,270 --> 02:07:19,470
We'll see you next time.
2571
02:07:19,470 --> 02:07:21,320
[APPLAUSE]
2572
02:07:20,000 --> 02:07:56,000
[MUSIC PLAYING]