0
00:00:00,000 --> 00:00:02,982
1
00:00:02,982 --> 00:00:06,461
[MUSIC PLAYING]
2
00:00:06,461 --> 00:01:12,600
3
00:01:12,600 --> 00:01:13,590
DAVID MALAN: All right.
4
00:01:13,590 --> 00:01:17,130
This is CS50, and this is week 2 wherein we're
5
00:01:17,130 --> 00:01:20,610
going to take a look at a lower level at how things work,
6
00:01:20,610 --> 00:01:24,120
and indeed, among the goals of the course is this bottom-up understanding
7
00:01:24,120 --> 00:01:26,670
so that in a couple of weeks' time, even a few years' time,
8
00:01:26,670 --> 00:01:29,920
when you encounter some new technology, you'll be able to think back hopefully
9
00:01:29,920 --> 00:01:33,180
on some of this week's basic building blocks and primitives
10
00:01:33,180 --> 00:01:36,060
and really just deduce how tomorrow's technologies work.
11
00:01:36,060 --> 00:01:37,685
But along the way, it's going to seem--
12
00:01:37,685 --> 00:01:40,727
it's going to be a little hard, perhaps, to see the forest for the trees,
13
00:01:40,727 --> 00:01:41,380
so to speak.
14
00:01:41,380 --> 00:01:44,783
And so the goal at the end of the day still is going to be problem-solving.
15
00:01:44,783 --> 00:01:47,700
And so we thought we'd begin today with a look at some of the problems
16
00:01:47,700 --> 00:01:50,405
we'll talk about or solve this coming week,
17
00:01:50,405 --> 00:01:53,280
and for that, we have some brave volunteers who have already come up.
18
00:01:53,280 --> 00:01:58,320
If we could turn on some dramatic lighting and meet today's volunteers.
19
00:01:58,320 --> 00:02:00,430
So on my left here, we have--
20
00:02:00,430 --> 00:02:00,930
ALEX: Hi.
21
00:02:00,930 --> 00:02:01,960
My name is Alex.
22
00:02:01,960 --> 00:02:05,340
I'm a first-year at the college and I'm from Chapel Hill, North Carolina.
23
00:02:05,340 --> 00:02:07,080
DAVID MALAN: Welcome to Alex.
24
00:02:07,080 --> 00:02:09,180
And to Alex's right.
25
00:02:09,180 --> 00:02:10,050
SARAH: I'm Sarah.
26
00:02:10,050 --> 00:02:13,230
I'm from Toronto, Canada, and I'm also a first-year student at the college.
27
00:02:13,230 --> 00:02:14,188
DAVID MALAN: Wonderful.
28
00:02:14,188 --> 00:02:15,869
Well, welcome to both Alex and Sarah.
29
00:02:15,869 --> 00:02:18,577
So one of the problems you'll perhaps solve this week for problem
30
00:02:18,577 --> 00:02:22,442
set 2 is to analyze the reading level of a body of text,
31
00:02:22,442 --> 00:02:25,650
whether someone reads at a first grade level, second grade level, third grade
32
00:02:25,650 --> 00:02:28,570
level, all the way up to 12 or 13 or beyond.
33
00:02:28,570 --> 00:02:32,250
You've perhaps never quite thought about, certainly in terms of code,
34
00:02:32,250 --> 00:02:35,310
how you would analyze some text, some book, and figure
35
00:02:35,310 --> 00:02:36,750
out what reading level it is at.
36
00:02:36,750 --> 00:02:40,330
And yet, surely our teachers growing up knew or had an intuitive sense of this.
37
00:02:40,330 --> 00:02:42,450
So let's consider some sample text.
38
00:02:42,450 --> 00:02:45,960
For instance, Alex, what have you been reading lately?
39
00:02:45,960 --> 00:02:52,502
ALEX: One fish, two fish, red fish, blue fish.
40
00:02:52,502 --> 00:02:53,460
DAVID MALAN: Wonderful.
41
00:02:53,460 --> 00:02:58,890
So given that, what grade level would you say Alex is currently reading at?
42
00:02:58,890 --> 00:03:01,500
Feel free to just shout it out.
43
00:03:01,500 --> 00:03:02,730
First, first?
44
00:03:02,730 --> 00:03:07,200
So indeed, you'll see this week, if you run your code on Alex's text,
45
00:03:07,200 --> 00:03:10,410
it actually turns out he reads below a first grade reading level.
46
00:03:10,410 --> 00:03:12,400
But why might that be?
47
00:03:12,400 --> 00:03:16,410
What might your intuition be for why we've
48
00:03:16,410 --> 00:03:19,020
accused Alex of reading at this level?
49
00:03:19,020 --> 00:03:20,990
Feel free to shout out.
50
00:03:20,990 --> 00:03:21,490
Yeah.
51
00:03:21,490 --> 00:03:24,520
So very few syllables, short words, short sentences.
52
00:03:24,520 --> 00:03:27,828
And so there's some heuristics, perhaps, we can infer from that short text,
53
00:03:27,828 --> 00:03:30,370
that that probably means that it's best for younger children.
54
00:03:30,370 --> 00:03:33,370
Now Sarah, by contrast, what have you been reading?
55
00:03:33,370 --> 00:03:35,470
SARAH: Mr. and Mrs. Dursley, of number
56
00:03:35,470 --> 00:03:38,890
four, Privet Drive, were proud to say that they were
57
00:03:38,890 --> 00:03:41,050
perfectly normal, thank you very much.
58
00:03:41,050 --> 00:03:43,480
They were the last people you'd expect to be involved
59
00:03:43,480 --> 00:03:46,390
in anything strange or mysterious because they just
60
00:03:46,390 --> 00:03:47,952
didn't hold with much nonsense.
61
00:03:47,952 --> 00:03:48,910
DAVID MALAN: All right.
62
00:03:48,910 --> 00:03:50,950
Now irrespective of what grade you were in when
63
00:03:50,950 --> 00:03:53,283
you might have read that text, what grade level does Sarah
64
00:03:53,283 --> 00:03:55,230
seem to be reading at?
65
00:03:55,230 --> 00:03:57,570
So eighth grade, second grade.
66
00:03:57,570 --> 00:03:58,080
OK.
67
00:03:58,080 --> 00:04:01,125
So hearing a bit of everything, so with that, at least according to code,
68
00:04:01,125 --> 00:04:03,240
it would actually be seventh grade.
69
00:04:03,240 --> 00:04:05,130
And what might the intuition there be?
70
00:04:05,130 --> 00:04:07,620
Why is that a higher grade level even though we might
71
00:04:07,620 --> 00:04:09,917
disagree exactly which grade it is?
72
00:04:09,917 --> 00:04:11,250
AUDIENCE: Complicated sentences.
73
00:04:11,250 --> 00:04:12,000
DAVID MALAN: Yeah.
74
00:04:12,000 --> 00:04:14,218
So complicated sentences, longer sentences.
75
00:04:14,218 --> 00:04:17,010
So indeed a lot more words were being spoken by Sarah because there
76
00:04:17,010 --> 00:04:18,519
was so much more there on the page.
77
00:04:18,519 --> 00:04:22,079
So we'll translate these ideas this coming week in problem set 2,
78
00:04:22,079 --> 00:04:25,170
if you tackle this one, through code so that you can ultimately
79
00:04:25,170 --> 00:04:26,910
infer things like these quantitatively.
80
00:04:26,910 --> 00:04:29,190
But to do so, we're going to have to understand text.
81
00:04:29,190 --> 00:04:32,610
So let's first thank our volunteers and then we'll dive in to that lower level.
82
00:04:32,610 --> 00:04:35,337
[APPLAUSE]
83
00:04:35,337 --> 00:04:39,910
84
00:04:39,910 --> 00:04:40,600
Sorry.
85
00:04:40,600 --> 00:04:41,490
You can keep those.
86
00:04:41,490 --> 00:04:42,222
SARAH: Oh, OK.
87
00:04:42,222 --> 00:04:43,180
DAVID MALAN: All right.
88
00:04:43,180 --> 00:04:45,970
So besides that, let's consider one other body of text
89
00:04:45,970 --> 00:04:48,010
perhaps that you might see this week, which
90
00:04:48,010 --> 00:04:50,210
is namely a little something like this.
91
00:04:50,210 --> 00:04:53,860
What I have here on the screen is what we'll start calling today ciphertext.
92
00:04:53,860 --> 00:04:56,530
It's the result of encrypting some piece of information.
93
00:04:56,530 --> 00:05:00,190
And encryption, or more generally, the art and science of cryptography
94
00:05:00,190 --> 00:05:00,908
is all around us.
95
00:05:00,908 --> 00:05:03,700
It's what you're using on the web, on your phones, with your banks.
96
00:05:03,700 --> 00:05:07,000
And anything that tries to keep data secure is using encryption.
97
00:05:07,000 --> 00:05:10,390
But there's going to be different levels of encryption-- strong encryption,
98
00:05:10,390 --> 00:05:11,140
weak encryption.
99
00:05:11,140 --> 00:05:14,590
And what you see here on the screen isn't all that strong,
100
00:05:14,590 --> 00:05:18,190
but we'll see later today how we might decrypt this and actually reveal
101
00:05:18,190 --> 00:05:22,030
what the plaintext is that corresponds to that ciphertext.
102
00:05:22,030 --> 00:05:25,670
But in order to do so, we have to start taking off some training wheels,
103
00:05:25,670 --> 00:05:26,197
so to speak.
104
00:05:26,197 --> 00:05:28,030
And believe it or not, even though your time
105
00:05:28,030 --> 00:05:30,100
with C this past week, for the first time,
106
00:05:30,100 --> 00:05:32,230
probably, might have been rather in the weeds
107
00:05:32,230 --> 00:05:36,072
and much more complicated seemingly than Scratch, it turns out that along the way,
108
00:05:36,072 --> 00:05:37,780
we have been providing and we'll continue
109
00:05:37,780 --> 00:05:39,760
to provide certain training wheels.
110
00:05:39,760 --> 00:05:42,190
For instance, the CS50 Library is one of them,
111
00:05:42,190 --> 00:05:46,240
and even some of the explanations we give of topics for now
112
00:05:46,240 --> 00:05:49,120
in these early weeks will be somewhat simplified-- abstracted away,
113
00:05:49,120 --> 00:05:49,730
if you will.
114
00:05:49,730 --> 00:05:51,730
But the goal ultimately is for you to understand
115
00:05:51,730 --> 00:05:55,060
each and every one of those details so that after CS50, you really
116
00:05:55,060 --> 00:05:58,210
can stand on your own and understand and wrap your mind
117
00:05:58,210 --> 00:06:01,040
around any future technologies as well.
118
00:06:01,040 --> 00:06:05,318
So let's consider first the very first program with which we began last week,
119
00:06:05,318 --> 00:06:06,110
which was this one.
120
00:06:06,110 --> 00:06:09,215
So "hello, world" in C. At the end of the day, it was really the printf
121
00:06:09,215 --> 00:06:11,590
function that was doing the interesting part of the work,
122
00:06:11,590 --> 00:06:14,890
but there was a lot of technical stuff above and below it.
123
00:06:14,890 --> 00:06:19,900
The curly braces, the parentheses, words like void and include, and then
124
00:06:19,900 --> 00:06:21,730
of course, the angled brackets and more.
125
00:06:21,730 --> 00:06:25,870
But at the end of the day, we needed to convert that source code in C
126
00:06:25,870 --> 00:06:30,190
to machine code, the 0's and 1's in binary that the computer understood.
127
00:06:30,190 --> 00:06:32,500
And to do that, of course, we ran--
128
00:06:32,500 --> 00:06:33,700
we compiled the code.
129
00:06:33,700 --> 00:06:37,400
We ran make and then we were able to actually run that code there.
130
00:06:37,400 --> 00:06:39,370
So let me actually go over here to VS Code
131
00:06:39,370 --> 00:06:44,510
and really quickly recreate that hello.c pretty much by transcribing the same.
132
00:06:44,510 --> 00:06:51,970
So I might have here include stdio.h, int main void.
133
00:06:51,970 --> 00:06:54,460
And then in here, I had quite simply, hello,
134
00:06:54,460 --> 00:06:57,430
comma, world with my backslash n, end quotes, and more.
135
00:06:57,430 --> 00:07:01,693
Now last time, to compile this, I indeed ran make hello, followed by Enter.
136
00:07:01,693 --> 00:07:03,860
Hopefully you see no errors and that's a good thing.
137
00:07:03,860 --> 00:07:05,980
And if you do dot, slash, hello, you see,
138
00:07:05,980 --> 00:07:07,840
in fact, the results of that program.
139
00:07:07,840 --> 00:07:11,470
But it turns out that make is not actually a compiler
140
00:07:11,470 --> 00:07:12,950
as I alluded to last week.
141
00:07:12,950 --> 00:07:15,520
It's a program that clearly makes your program,
142
00:07:15,520 --> 00:07:19,030
but it itself just automates the process of using an actual compiler.
143
00:07:19,030 --> 00:07:21,290
And there's lots of different compilers out there,
144
00:07:21,290 --> 00:07:24,190
and the one that it's actually using underneath the hood
145
00:07:24,190 --> 00:07:27,640
is a little something called Clang for C Language.
146
00:07:27,640 --> 00:07:30,190
And Clang is a pretty popular compiler nowadays.
147
00:07:30,190 --> 00:07:33,520
There's another one that's been around for ages called GCC,
148
00:07:33,520 --> 00:07:36,330
but these are just specific names for types of compilers
149
00:07:36,330 --> 00:07:38,830
that different people, different companies, different groups
150
00:07:38,830 --> 00:07:40,310
have actually created.
151
00:07:40,310 --> 00:07:44,800
But if you use in week 1 a compiler yourself manually,
152
00:07:44,800 --> 00:07:47,170
you have to understand a little more about what's
153
00:07:47,170 --> 00:07:50,703
going on because it's even more cryptic than just make alone.
154
00:07:50,703 --> 00:07:53,620
So in fact, let me go back to my terminal window here, let me go ahead
155
00:07:53,620 --> 00:07:58,690
and clear the screen a little bit and just run really the raw compiler
156
00:07:58,690 --> 00:07:59,360
command.
157
00:07:59,360 --> 00:08:01,450
So what make is automating for me, let me
158
00:08:01,450 --> 00:08:03,620
actually do this manually for just a moment.
159
00:08:03,620 --> 00:08:10,450
So if I want to compile hello.c into an executable program I can run,
160
00:08:10,450 --> 00:08:12,220
I can do this.
161
00:08:12,220 --> 00:08:17,110
clang, space, hello.c, and then Enter.
162
00:08:17,110 --> 00:08:20,980
And now there's no output, which is a good thing in this case, no errors,
163
00:08:20,980 --> 00:08:22,010
but notice this.
164
00:08:22,010 --> 00:08:25,450
If I go ahead and type ls, it turns out there's
165
00:08:25,450 --> 00:08:32,140
a file that's been created suddenly in my current folder weirdly called a.out.
166
00:08:32,140 --> 00:08:33,580
That stands for Assembler Output.
167
00:08:33,580 --> 00:08:35,980
And long story short, that's actually the default name
168
00:08:35,980 --> 00:08:39,440
of a program that's created when you just run Clang by itself.
169
00:08:39,440 --> 00:08:41,830
Now that's a pretty bad name for a program
170
00:08:41,830 --> 00:08:44,000
because it doesn't describe what it does.
171
00:08:44,000 --> 00:08:49,870
So better would be here to perhaps do, well, instead of a.out, which, yes,
172
00:08:49,870 --> 00:08:53,950
still prints hello, world, but isn't really a clearly-named program,
173
00:08:53,950 --> 00:08:55,420
it'd be nice to name this hello.
174
00:08:55,420 --> 00:08:56,240
So what could I do?
175
00:08:56,240 --> 00:08:59,740
I could do like we learned last week-- well, I could rename a.out to hello
176
00:08:59,740 --> 00:09:01,820
by using Linux's mv command.
177
00:09:01,820 --> 00:09:04,480
So I'm going to move a.out to become hello.
178
00:09:04,480 --> 00:09:06,370
But that, too, seems kind of tedious.
179
00:09:06,370 --> 00:09:07,720
Now I have three steps.
180
00:09:07,720 --> 00:09:10,750
Like write my code, compile my code, and then rename it
181
00:09:10,750 --> 00:09:12,190
before I can even run it.
182
00:09:12,190 --> 00:09:13,580
We can do better than that.
183
00:09:13,580 --> 00:09:15,580
And so it turns out that certain commands
184
00:09:15,580 --> 00:09:18,220
like clang support what we're going to start today
185
00:09:18,220 --> 00:09:20,380
calling command line arguments.
186
00:09:20,380 --> 00:09:24,010
A command line argument, unlike an argument to a function,
187
00:09:24,010 --> 00:09:27,040
is just an additional word or key phrase that you
188
00:09:27,040 --> 00:09:30,400
type after a command at your prompt in your terminal
189
00:09:30,400 --> 00:09:33,440
window that just modifies the behavior of that command.
190
00:09:33,440 --> 00:09:35,600
It configures it a little more specifically.
191
00:09:35,600 --> 00:09:39,220
So what you're seeing here on the screen is a somewhat better command with which
192
00:09:39,220 --> 00:09:45,220
to run clang so that now I can specify the output of this command per this -o.
193
00:09:45,220 --> 00:09:46,610
So what do I mean by that?
194
00:09:46,610 --> 00:09:48,943
Well, let me go ahead and clear my terminal window again
195
00:09:48,943 --> 00:09:54,955
and more explicitly type clang -o hello hello.c and then Enter.
196
00:09:54,955 --> 00:09:57,580
Nothing, again, appears to happen, but that's a good thing when
197
00:09:57,580 --> 00:10:02,860
you see no errors, and now the program I just created is indeed called hello.
198
00:10:02,860 --> 00:10:07,280
So it achieves really the same exact effect as make did, but what
199
00:10:07,280 --> 00:10:09,820
I don't have to do with make is type and remember something
200
00:10:09,820 --> 00:10:11,075
as long as this command.
201
00:10:11,075 --> 00:10:12,700
And this, too, is a bit of a white lie.
202
00:10:12,700 --> 00:10:16,420
It turns out, we have preconfigured VS Code in the cloud for you
203
00:10:16,420 --> 00:10:21,310
to also use some other features of Clang that would be even more
204
00:10:21,310 --> 00:10:22,840
tedious for you to write yourselves.
205
00:10:22,840 --> 00:10:28,130
And so really, this is why we distill this as ultimately just running make.
206
00:10:28,130 --> 00:10:31,900
So let me pause here to see first if there's any questions on what I've
207
00:10:31,900 --> 00:10:34,540
done by taking my very first program in C
208
00:10:34,540 --> 00:10:37,720
and just now compiling it first with make, but then starting over
209
00:10:37,720 --> 00:10:40,780
and now manually compiling it with clang with what
210
00:10:40,780 --> 00:10:44,500
we'll call command line arguments. -o, space, hello,
211
00:10:44,500 --> 00:10:46,820
and then the name of the file.
212
00:10:46,820 --> 00:10:47,320
Yeah?
213
00:10:47,320 --> 00:10:48,780
AUDIENCE: What is a.out?
214
00:10:48,780 --> 00:10:49,530
DAVID MALAN: Yeah.
215
00:10:49,530 --> 00:10:51,870
So a.out is a historical name.
216
00:10:51,870 --> 00:10:55,240
It refers to assembler output-- more on that soon.
217
00:10:55,240 --> 00:10:58,080
And it's just the default file name that you get automatically
218
00:10:58,080 --> 00:11:01,350
if you just run the compiler on any file so that you
219
00:11:01,350 --> 00:11:02,970
have just a standard name for it.
220
00:11:02,970 --> 00:11:05,213
But it's not a very well-named program.
221
00:11:05,213 --> 00:11:07,380
Instead of running Microsoft Word on your Mac or PC,
222
00:11:07,380 --> 00:11:09,880
it would be like double-clicking on a.out.
223
00:11:09,880 --> 00:11:11,880
So instead with these command line arguments,
224
00:11:11,880 --> 00:11:17,370
you can customize the output of Clang and call it hello or anything you want.
225
00:11:17,370 --> 00:11:23,020
Other questions on what I've done here with Clang itself, the compiler?
226
00:11:23,020 --> 00:11:23,520
Yeah?
227
00:11:23,520 --> 00:11:25,510
AUDIENCE: What is -o?
228
00:11:25,510 --> 00:11:26,565
DAVID MALAN: So -o--
229
00:11:26,565 --> 00:11:29,440
and you would only know this from reading the manual, taking a class,
230
00:11:29,440 --> 00:11:30,500
means output.
231
00:11:30,500 --> 00:11:35,890
So -o means change Clang's output to be a file called hello
232
00:11:35,890 --> 00:11:38,680
instead of the default, which is a.out.
233
00:11:38,680 --> 00:11:42,400
And this, too, is, again, a detail you would have to look up on a web page,
234
00:11:42,400 --> 00:11:44,810
read the manual, hear someone like me tell you about it.
235
00:11:44,810 --> 00:11:46,893
And in fact, there's even more than these options,
236
00:11:46,893 --> 00:11:48,890
but we'll just scratch the surface here.
237
00:11:48,890 --> 00:11:49,390
All right.
238
00:11:49,390 --> 00:11:53,530
So if we now know this, what more is actually happening underneath the hood?
239
00:11:53,530 --> 00:11:57,250
Well, let's take a closer look at not just this version of my code,
240
00:11:57,250 --> 00:12:01,190
but my slightly more complicated version last week,
241
00:12:01,190 --> 00:12:03,430
which looked a little something like this, wherein
242
00:12:03,430 --> 00:12:07,330
I added in some dynamic input from the user so I could say not hello, world
243
00:12:07,330 --> 00:12:11,810
to everyone, but hello, David or hello to whoever actually runs this program.
244
00:12:11,810 --> 00:12:15,880
So in fact, let me go ahead and change my code here in VS Code just
245
00:12:15,880 --> 00:12:17,770
to match that same code from last week.
246
00:12:17,770 --> 00:12:19,190
So no new code yet.
247
00:12:19,190 --> 00:12:22,820
I'm just going to, in a moment, compile it in a slightly different way.
248
00:12:22,820 --> 00:12:29,020
So I did last week, I think, string answer equals get_string, quote-unquote,
249
00:12:29,020 --> 00:12:30,100
"What's your name?"
250
00:12:30,100 --> 00:12:31,540
Just like in Scratch.
251
00:12:31,540 --> 00:12:35,920
And then down here, instead of doing world, I initially wrote answer,
252
00:12:35,920 --> 00:12:37,450
but that didn't go well.
253
00:12:37,450 --> 00:12:41,530
What did I ultimately do instead to print out hello, David or hello,
254
00:12:41,530 --> 00:12:42,940
so-and-so?
255
00:12:42,940 --> 00:12:44,722
Yeah?
256
00:12:44,722 --> 00:12:45,680
Sorry, a little louder?
257
00:12:45,680 --> 00:12:46,430
AUDIENCE: %s?
258
00:12:46,430 --> 00:12:50,478
DAVID MALAN: Yeah, so %s, the so-called format code that printf just knows how
259
00:12:50,478 --> 00:12:51,020
to deal with.
260
00:12:51,020 --> 00:12:52,470
And I had to add one other thing.
261
00:12:52,470 --> 00:12:54,350
Someone else besides %s--
262
00:12:54,350 --> 00:12:54,850
yeah?
263
00:12:54,850 --> 00:12:56,050
AUDIENCE: The name of the variable.
264
00:12:56,050 --> 00:12:58,870
DAVID MALAN: The name of the variable that I want to plug into that
265
00:12:58,870 --> 00:13:00,190
placeholder %s.
266
00:13:00,190 --> 00:13:01,630
And in this case, it's answer.
267
00:13:01,630 --> 00:13:04,363
Now let me make one refinement only because now we're in week 2
268
00:13:04,363 --> 00:13:06,530
and we're going to start writing more lines of code,
269
00:13:06,530 --> 00:13:10,360
even though Scratch called the return value of the ask puzzle piece,
270
00:13:10,360 --> 00:13:11,560
answer always.
271
00:13:11,560 --> 00:13:14,480
In C, we have full control over what our variables are called.
272
00:13:14,480 --> 00:13:17,410
And now it's probably good not to just generically always call
273
00:13:17,410 --> 00:13:19,870
my variable answer if I'm using get_string.
274
00:13:19,870 --> 00:13:21,050
Let's call it what it is.
275
00:13:21,050 --> 00:13:23,680
So this is now just a matter of style, if you will.
276
00:13:23,680 --> 00:13:26,620
Let me change the variable to be name just so
277
00:13:26,620 --> 00:13:29,980
that it's a little clearer to me, to you, to a TF or TA
278
00:13:29,980 --> 00:13:34,000
exactly what that variable represents instead of more generically answer.
279
00:13:34,000 --> 00:13:37,030
All right, so that said, let me go down to my terminal window,
280
00:13:37,030 --> 00:13:41,050
and last week again, I ran make to compile this exact same program.
281
00:13:41,050 --> 00:13:43,270
Now, though, let me go ahead and just use clang.
282
00:13:43,270 --> 00:13:45,490
So clang -o--
283
00:13:45,490 --> 00:13:47,500
I'll still call this version hello--
284
00:13:47,500 --> 00:13:49,330
space, hello.c.
285
00:13:49,330 --> 00:13:51,080
So exact same command as before.
286
00:13:51,080 --> 00:13:54,640
The only thing that's different is I've added a couple of more lines of code
287
00:13:54,640 --> 00:13:56,330
to get the user's input.
288
00:13:56,330 --> 00:13:59,960
Let me hit Enter, and now, darn it, our first error.
289
00:13:59,960 --> 00:14:02,750
So output from clang and make is not a good thing,
290
00:14:02,750 --> 00:14:05,420
and here, we're seeing something particularly cryptic.
291
00:14:05,420 --> 00:14:09,010
So something in function 'main': undefined reference
292
00:14:09,010 --> 00:14:13,480
to 'get_string', and then linker command failed with exit code 1.
293
00:14:13,480 --> 00:14:16,540
So there's actually a lot of jargon in there that we'll tease apart today,
294
00:14:16,540 --> 00:14:20,338
but my hint is that clearly my problem's in main, although that's not surprising
295
00:14:20,338 --> 00:14:22,130
because there's nothing else going on here.
296
00:14:22,130 --> 00:14:26,830
get_string is an issue, and the issue is that it's an undefined reference.
297
00:14:26,830 --> 00:14:28,990
And yet, notice, I was pretty good.
298
00:14:28,990 --> 00:14:32,920
I added the CS50 header file and I said last week that that's
299
00:14:32,920 --> 00:14:35,920
enough to teach the compiler that functions exist,
300
00:14:35,920 --> 00:14:39,070
but the problem is that even though this does, in fact,
301
00:14:39,070 --> 00:14:43,090
teach Clang that get_string exists, it is not
302
00:14:43,090 --> 00:14:47,530
sufficient information for Clang to go find on the hard drive of the computer
303
00:14:47,530 --> 00:14:51,860
the 0's and 1's that actually implement get_string itself.
304
00:14:51,860 --> 00:14:54,250
So in other words, this include line, per last week,
305
00:14:54,250 --> 00:14:55,333
is a little bit of a hint.
306
00:14:55,333 --> 00:14:59,560
It's a teaser to Clang that you're about to see and use this function somewhere.
307
00:14:59,560 --> 00:15:05,710
But if you actually want to use the 0's and 1's that CS50 wrote some time ago
308
00:15:05,710 --> 00:15:08,740
and bake those into your program so your program actually
309
00:15:08,740 --> 00:15:11,470
knows how to get input from the user, well then,
310
00:15:11,470 --> 00:15:15,440
I'm going to have to go ahead and run a slightly different command.
311
00:15:15,440 --> 00:15:16,250
So let me do this.
312
00:15:16,250 --> 00:15:18,917
Let me clear my terminal window just to get rid of that distraction
313
00:15:18,917 --> 00:15:23,020
and let me propose now that we run this command instead.
314
00:15:23,020 --> 00:15:28,510
Almost the same as before, clang -o, space, hello, then hello.c,
315
00:15:28,510 --> 00:15:34,210
but with one additional command line argument at the end, and this is a -l--
316
00:15:34,210 --> 00:15:35,050
not a number 1.
317
00:15:35,050 --> 00:15:39,370
So -lcs50 with no space in between those two.
318
00:15:39,370 --> 00:15:43,540
Now the l is going to result in all of those 0's and 1's that actually
319
00:15:43,540 --> 00:15:48,350
were written by CS50 being linked into your code, your few lines of code or mine
320
00:15:48,350 --> 00:15:48,850
here.
321
00:15:48,850 --> 00:15:53,530
But that's the second step that the compiler requires in order to know how
322
00:15:53,530 --> 00:15:58,537
to actually execute, or rather compile, your code and CS50's.
323
00:15:58,537 --> 00:16:00,370
And CS50 is not the only one that does this.
324
00:16:00,370 --> 00:16:04,750
If you use any third party library in C that doesn't come with the language,
325
00:16:04,750 --> 00:16:08,333
you would do -l such and such where whoever--
326
00:16:08,333 --> 00:16:10,000
however they've named their own library.
327
00:16:10,000 --> 00:16:14,298
But you don't have to do it for built in things like we've been using thus far.
328
00:16:14,298 --> 00:16:16,090
All right, so let me go ahead and try this.
329
00:16:16,090 --> 00:16:19,000
I'll go back to VS Code here, and let me go ahead now
330
00:16:19,000 --> 00:16:23,620
and run clang -o hello, then hello.c.
331
00:16:23,620 --> 00:16:26,560
And now instead of just hitting Enter, -lcs50
332
00:16:26,560 --> 00:16:29,590
with no space between the l and the cs50, Enter.
333
00:16:29,590 --> 00:16:33,310
Now nothing bad happens, and now I can do ./hello.
334
00:16:33,310 --> 00:16:34,180
What's your name?
335
00:16:34,180 --> 00:16:37,633
I'll type in David, Enter, and now we see hello, David.
336
00:16:37,633 --> 00:16:40,300
Now honestly, this is where we're really getting into the weeds,
337
00:16:40,300 --> 00:16:42,130
and now this is taking--
338
00:16:42,130 --> 00:16:45,730
this is really just adding nuisance to the process of compiling and running
339
00:16:45,730 --> 00:16:46,460
your code.
340
00:16:46,460 --> 00:16:49,960
And so the reality is, even though this is indeed what is happening,
341
00:16:49,960 --> 00:16:51,880
this is why we used last week and we're going
342
00:16:51,880 --> 00:16:55,240
to continue using this week onward make because it just
343
00:16:55,240 --> 00:16:57,130
automates that whole process for you.
344
00:16:57,130 --> 00:17:00,130
But it's ideal to understand what's going wrong because any of the error
345
00:17:00,130 --> 00:17:02,770
messages you saw for problem set 1, any of the error messages
346
00:17:02,770 --> 00:17:05,859
you see for the next few weeks probably aren't coming from make,
347
00:17:05,859 --> 00:17:08,560
they're coming from Clang underneath the hood
348
00:17:08,560 --> 00:17:10,780
because make is just automating the process.
349
00:17:10,780 --> 00:17:14,060
But with make, you literally just write make and then the name of the program,
350
00:17:14,060 --> 00:17:17,560
you don't have to worry about any of those command line arguments.
351
00:17:17,560 --> 00:17:22,240
Questions, then, on compiling with -lcs50 or anything else?
352
00:17:22,240 --> 00:17:23,043
Yeah?
353
00:17:23,043 --> 00:17:24,960
AUDIENCE: What is the benefit of [INAUDIBLE]?
354
00:17:24,960 --> 00:17:26,220
DAVID MALAN: Sorry, what is the benefit of--
355
00:17:26,220 --> 00:17:27,512
AUDIENCE: Using Clang manually.
356
00:17:27,512 --> 00:17:30,000
DAVID MALAN: What is the benefit of using Clang manually?
357
00:17:30,000 --> 00:17:30,870
None, really.
358
00:17:30,870 --> 00:17:33,450
In fact, all main is doing is just say-- make is doing
359
00:17:33,450 --> 00:17:35,055
is saving us some keystrokes.
360
00:17:35,055 --> 00:17:37,680
If you prefer, though, and you just like to be more in control,
361
00:17:37,680 --> 00:17:41,130
you can totally run Clang manually if you remember the various command line
362
00:17:41,130 --> 00:17:42,090
arguments.
363
00:17:42,090 --> 00:17:42,660
Yeah?
364
00:17:42,660 --> 00:17:47,335
AUDIENCE: So why did you have to explain [INAUDIBLE]
365
00:17:47,335 --> 00:17:48,210
DAVID MALAN: Exactly.
366
00:17:48,210 --> 00:17:49,560
Why did I have to explain--
367
00:17:49,560 --> 00:17:53,220
that is, provide a hint to CS50 with the cs50.h header file,
368
00:17:53,220 --> 00:17:55,470
but I didn't have to do that with stdio.h?
369
00:17:55,470 --> 00:17:56,400
Just because.
370
00:17:56,400 --> 00:18:00,990
stdio.h comes with C, just like a few other libraries come
371
00:18:00,990 --> 00:18:03,060
with C that we'll start seeing today.
372
00:18:03,060 --> 00:18:05,410
CS50, though, is not built into C everywhere,
373
00:18:05,410 --> 00:18:07,890
and so you do have to explicitly add that one there.
374
00:18:07,890 --> 00:18:08,767
Yeah?
375
00:18:08,767 --> 00:18:11,970
AUDIENCE: Can you define what command line argument [INAUDIBLE]?
376
00:18:11,970 --> 00:18:15,210
DAVID MALAN: A command line argument is a word or phrase
377
00:18:15,210 --> 00:18:17,740
that you type at the command line--
378
00:18:17,740 --> 00:18:22,200
a.k.a., your terminal-- in order to influence the behavior of a program.
379
00:18:22,200 --> 00:18:22,742
AUDIENCE: OK.
380
00:18:22,742 --> 00:18:24,430
So it's a term for whatever you're giving it.
381
00:18:24,430 --> 00:18:24,565
DAVID MALAN: Yeah.
382
00:18:24,565 --> 00:18:25,660
It changes the defaults.
383
00:18:25,660 --> 00:18:27,790
In our GUI world, Graphical User Interface,
384
00:18:27,790 --> 00:18:29,680
you and I would probably click some boxes,
385
00:18:29,680 --> 00:18:32,350
we would select some menu options to configure a program
386
00:18:32,350 --> 00:18:33,460
to behave in the same way.
387
00:18:33,460 --> 00:18:36,850
At a command line interface, you have to just say everything all at once,
388
00:18:36,850 --> 00:18:39,600
and that's why we have command line arguments.
389
00:18:39,600 --> 00:18:40,605
Yeah?
390
00:18:40,605 --> 00:18:43,243
AUDIENCE: Is make [INAUDIBLE]
391
00:18:43,243 --> 00:18:43,910
DAVID MALAN: No.
392
00:18:43,910 --> 00:18:45,470
Make is not just for CS50.
393
00:18:45,470 --> 00:18:50,480
It's used globally in any project really nowadays using C, C++,
394
00:18:50,480 --> 00:18:52,020
even other languages as well.
395
00:18:52,020 --> 00:18:54,140
In fact, most every command you see in this class,
396
00:18:54,140 --> 00:18:57,530
unless it has 5-0 at the end of it, is globally used.
397
00:18:57,530 --> 00:19:00,758
Only those suffixed with 50 are, indeed, course-specific.
398
00:19:00,758 --> 00:19:03,050
And even those we'll gradually take training wheels off
399
00:19:03,050 --> 00:19:06,890
of so that you understand exactly what those commands are doing as well.
400
00:19:06,890 --> 00:19:09,053
All right, so what is it that we've just done?
401
00:19:09,053 --> 00:19:11,720
Everything we've just done, of course, I keep calling compiling,
402
00:19:11,720 --> 00:19:13,580
but let's just go down one rabbit hole so
403
00:19:13,580 --> 00:19:15,967
that you understand that when you compile code,
404
00:19:15,967 --> 00:19:18,050
there's actually a whole bunch of steps happening,
405
00:19:18,050 --> 00:19:21,800
and this is going to enable a lot of features, like companies can
406
00:19:21,800 --> 00:19:26,060
write code and then convert it to run it on Macs and PCs alike
407
00:19:26,060 --> 00:19:27,240
or phones or the like.
408
00:19:27,240 --> 00:19:30,320
So it's not just a matter of converting source code to machine code,
409
00:19:30,320 --> 00:19:34,610
there's actually four steps involved in what you and I, as of last week,
410
00:19:34,610 --> 00:19:35,840
know as compiling.
411
00:19:35,840 --> 00:19:39,033
And these aren't terms that you'll have to keep in mind constantly
412
00:19:39,033 --> 00:19:41,450
because again, we're going to abstract a lot of this away.
413
00:19:41,450 --> 00:19:43,492
But just so we've gone down the rabbit hole once,
414
00:19:43,492 --> 00:19:45,890
let's consider each of these four steps that
415
00:19:45,890 --> 00:19:49,850
have been happening for you for a week automatically, the first of which
416
00:19:49,850 --> 00:19:51,080
is called preprocessing.
417
00:19:51,080 --> 00:19:52,260
So what does this mean?
418
00:19:52,260 --> 00:19:54,450
Well, let's consider that same program as before.
419
00:19:54,450 --> 00:19:57,830
So notice that two of the lines of code start with a hash mark.
420
00:19:57,830 --> 00:20:02,338
That is a special symbol in C, and it's a so-called preprocessor directive.
421
00:20:02,338 --> 00:20:04,130
You don't need to memorize terms like that,
422
00:20:04,130 --> 00:20:07,005
but it just means that it's a little different from every other line.
423
00:20:07,005 --> 00:20:08,960
And anything with a hash symbol here should
424
00:20:08,960 --> 00:20:13,315
be preprocessed-- that is, analyzed initially before anything else happens.
425
00:20:13,315 --> 00:20:17,100
So let's consider these two lines up top, what exactly is happening.
426
00:20:17,100 --> 00:20:19,220
Well, it turns out with these two lines, you
427
00:20:19,220 --> 00:20:23,390
have two header files, of course, cs50.h and stdio.h.
428
00:20:23,390 --> 00:20:27,980
Where are those files, because they've never been in VS Code for you,
429
00:20:27,980 --> 00:20:28,550
seemingly.
430
00:20:28,550 --> 00:20:31,940
If you type ls-- if you open up the File Explorer in the GUI,
431
00:20:31,940 --> 00:20:35,900
you have never seen, probably, cs50.h or stdio.h.
432
00:20:35,900 --> 00:20:39,620
They just work, but that's because there's a folder somewhere
433
00:20:39,620 --> 00:20:43,340
on the hard drive that you're using on your Mac or PC
434
00:20:43,340 --> 00:20:45,690
or somewhere in the cloud, as in our case.
435
00:20:45,690 --> 00:20:50,210
And this folder is traditionally called /usr/include.
436
00:20:50,210 --> 00:20:51,857
And user is deliberately misspelled.
437
00:20:51,857 --> 00:20:54,440
It's just slightly more succinct, although it's a little weird
438
00:20:54,440 --> 00:20:55,760
why we drop that one letter.
439
00:20:55,760 --> 00:21:01,760
But /usr/include is just a folder on the server that contains cs50.h, stdio.h,
440
00:21:01,760 --> 00:21:03,990
and a bunch of other things as well.
441
00:21:03,990 --> 00:21:08,030
So in fact, if you type in VS Code, in your terminal window,
442
00:21:08,030 --> 00:21:13,310
when you're using Codespaces in the cloud and type ls space /usr/include,
443
00:21:13,310 --> 00:21:15,470
you can see all of the files in that folder.
444
00:21:15,470 --> 00:21:17,580
But we've preinstalled all of that stuff for you.
445
00:21:17,580 --> 00:21:20,390
So let's consider what's actually in those files here.
446
00:21:20,390 --> 00:21:25,370
If I highlight these two lines up top that start with hash include, well,
447
00:21:25,370 --> 00:21:30,530
I kind of hinted last week that what's in that first file is a hint as to what
448
00:21:30,530 --> 00:21:32,660
functions CS50 wrote for you.
449
00:21:32,660 --> 00:21:35,540
So you can kind of think of these include lines
450
00:21:35,540 --> 00:21:38,300
as being temporary placeholders for what's
451
00:21:38,300 --> 00:21:41,000
going to become like a global find and replace.
452
00:21:41,000 --> 00:21:44,270
That is the first thing clang is going to do is to preprocess this file.
453
00:21:44,270 --> 00:21:47,300
It's going to look for any line that starts with hash include.
454
00:21:47,300 --> 00:21:50,960
And if it sees that, it's going to essentially go into that file,
455
00:21:50,960 --> 00:21:55,190
like cs50.h, and then just copy and paste the contents of that file
456
00:21:55,190 --> 00:21:56,443
magically there for you.
457
00:21:56,443 --> 00:21:58,110
You don't see it visually on the screen.
458
00:21:58,110 --> 00:22:00,060
But it's happening behind the scenes.
459
00:22:00,060 --> 00:22:03,230
And so really, what's happening with this first line
460
00:22:03,230 --> 00:22:09,380
is that somewhere in cs50.h is the declaration of get_string
461
00:22:09,380 --> 00:22:11,690
like we talked about last week, and it probably
462
00:22:11,690 --> 00:22:13,215
looks a little something like this.
463
00:22:13,215 --> 00:22:15,590
And we didn't spend much time on this yet this past week,
464
00:22:15,590 --> 00:22:17,030
but we will more in time.
465
00:22:17,030 --> 00:22:21,470
Notice that this is how a function is declared.
466
00:22:21,470 --> 00:22:23,677
That is, it is decreed to exist.
467
00:22:23,677 --> 00:22:25,760
The name of the function, of course, is get_string.
468
00:22:25,760 --> 00:22:28,310
Inside of the parentheses are its arguments.
469
00:22:28,310 --> 00:22:31,580
In this case, there's one argument to get_string, I claim today,
470
00:22:31,580 --> 00:22:33,080
but you've known this implicitly.
471
00:22:33,080 --> 00:22:34,160
And it's a prompt.
472
00:22:34,160 --> 00:22:36,860
It's the prompt that the human sees when you use get_string.
473
00:22:36,860 --> 00:22:37,790
What is that prompt?
474
00:22:37,790 --> 00:22:41,060
Well, it's a string of text, like quote unquote, "what's your name?"
475
00:22:41,060 --> 00:22:43,080
or anything else that I asked last week.
476
00:22:43,080 --> 00:22:46,610
Meanwhile, get_string, as we know from last week, has a return value.
477
00:22:46,610 --> 00:22:48,140
It returns something to you.
478
00:22:48,140 --> 00:22:49,610
And that, too, is a string.
479
00:22:49,610 --> 00:22:52,120
So again, this is also called a function's prototype.
480
00:22:52,120 --> 00:22:53,870
It's the thing toward the end of last week
481
00:22:53,870 --> 00:22:57,560
that I just copied and pasted from the bottom of my file to the top,
482
00:22:57,560 --> 00:23:02,030
just so that it was like this teaser for clang as to what would exist later.
483
00:23:02,030 --> 00:23:07,670
So you can think, then, of these include lines as just kind of combining all
484
00:23:07,670 --> 00:23:11,360
of those function declarations in some separate file called cs50.h,
485
00:23:11,360 --> 00:23:14,780
so that you yourself don't have to type them every time you use the library--
486
00:23:14,780 --> 00:23:18,470
or worse, so that you, yourself, don't have to copy and paste those lines.
487
00:23:18,470 --> 00:23:22,520
This is what clang is doing for you in its first step of preprocessing.
488
00:23:22,520 --> 00:23:27,470
Second, and last in this example, what happens when clang preprocesses
489
00:23:27,470 --> 00:23:29,175
this second include line?
490
00:23:29,175 --> 00:23:31,550
Well, the only other function we care about in this story
491
00:23:31,550 --> 00:23:33,650
is printf, of course, which comes with C.
492
00:23:33,650 --> 00:23:39,440
So essentially, you can think of printf's prototype or declaration
493
00:23:39,440 --> 00:23:40,820
as just being this.
494
00:23:40,820 --> 00:23:42,870
Printf is the name of the function.
495
00:23:42,870 --> 00:23:47,370
It takes a string that you want to format like, Hello comma world,
496
00:23:47,370 --> 00:23:49,110
or Hello comma %s.
497
00:23:49,110 --> 00:23:52,120
And then with dot, dot, dot, this actually has technical meaning.
498
00:23:52,120 --> 00:23:55,770
It means, of course, that you can plug in 0 variables, 1 variable, 2
499
00:23:55,770 --> 00:23:56,340
or 10.
500
00:23:56,340 --> 00:23:58,530
So dot, dot, dot means some number of variables.
501
00:23:58,530 --> 00:24:00,072
Now we haven't talked about this yet.
502
00:24:00,072 --> 00:24:01,410
And we won't really, in general.
503
00:24:01,410 --> 00:24:05,490
printf actually returns a value, a number, that is an integer.
504
00:24:05,490 --> 00:24:07,420
But more on that perhaps another time.
505
00:24:07,420 --> 00:24:10,920
It's generally not something the programmer tends to look at.
506
00:24:10,920 --> 00:24:14,250
But that's all we mean by preprocessing, so that at the end of this process,
507
00:24:14,250 --> 00:24:18,030
even though there's more lines of code in cs50.h and stdio.h,
508
00:24:18,030 --> 00:24:21,330
what's really just happening is that clang, in preprocessing
509
00:24:21,330 --> 00:24:25,380
the file, copies and pastes the contents of those files into your code
510
00:24:25,380 --> 00:24:29,160
so that now your code knows about everything-- get_string, printf,
511
00:24:29,160 --> 00:24:31,060
and anything else.
512
00:24:31,060 --> 00:24:35,230
Any questions, then, on that first step, preprocessing?
513
00:24:35,230 --> 00:24:35,920
Yes?
514
00:24:35,920 --> 00:24:49,195
AUDIENCE: [INAUDIBLE]
515
00:24:49,195 --> 00:24:50,320
DAVID MALAN: Good question.
516
00:24:50,320 --> 00:24:52,720
When you include a file, does it only include what
517
00:24:52,720 --> 00:24:54,880
you need or does it include everything?
518
00:24:54,880 --> 00:24:56,420
Think of it as including everything.
519
00:24:56,420 --> 00:24:59,020
So if it's a big file, that's a lot of code at the very top.
520
00:24:59,020 --> 00:25:01,880
And that's why, if you think back to all of the zeros and ones
521
00:25:01,880 --> 00:25:03,880
I showed a little bit ago, as well as last week,
522
00:25:03,880 --> 00:25:06,130
there's a lot of zeros and ones that end up
523
00:25:06,130 --> 00:25:08,892
on the screen as a result of just writing, Hello, world.
524
00:25:08,892 --> 00:25:10,600
A lot of those zeros and ones are perhaps
525
00:25:10,600 --> 00:25:13,390
coming from code that you didn't actually, necessarily need.
526
00:25:13,390 --> 00:25:15,340
But some of it is perhaps there, but there
527
00:25:15,340 --> 00:25:17,740
are ways to optimize that as well.
528
00:25:17,740 --> 00:25:22,395
All right, so step two of compiling is, confusingly, called compiling.
529
00:25:22,395 --> 00:25:24,520
It's just, this is the term that most everyone uses
530
00:25:24,520 --> 00:25:27,940
to describe the whole process, instead of just this one step.
531
00:25:27,940 --> 00:25:32,140
But once a program has been preprocessed behind the scenes
532
00:25:32,140 --> 00:25:35,865
by the compiler for you, it looks now a little something like this.
533
00:25:35,865 --> 00:25:38,740
And I've put dot, dot, dot just to imply that, yes, to your question,
534
00:25:38,740 --> 00:25:39,820
there's more stuff above it.
535
00:25:39,820 --> 00:25:40,987
There's more stuff below it.
536
00:25:40,987 --> 00:25:43,070
It's just not interesting right now for us.
537
00:25:43,070 --> 00:25:44,860
So now we have just C code.
538
00:25:44,860 --> 00:25:46,960
There's no more preprocessor directives.
539
00:25:46,960 --> 00:25:49,840
At this point, all of the hash symbols and those lines of code
540
00:25:49,840 --> 00:25:52,670
have been preprocessed and converted to something else.
541
00:25:52,670 --> 00:25:56,380
And so now-- and this is where things get a little spooky looking.
542
00:25:56,380 --> 00:26:00,370
Here now is what happens when clang, or any compiler,
543
00:26:00,370 --> 00:26:03,310
literally compiles code like this.
544
00:26:03,310 --> 00:26:08,720
It converts it from this in C to this in assembly code.
545
00:26:08,720 --> 00:26:10,720
So this is among the scarier languages.
546
00:26:10,720 --> 00:26:12,580
I, myself, don't really have fond memories.
547
00:26:12,580 --> 00:26:14,805
This is not a language that many people program in.
548
00:26:14,805 --> 00:26:16,930
If you take a subsequent class in computer science,
549
00:26:16,930 --> 00:26:19,600
in systems, a higher level class, you might actually
550
00:26:19,600 --> 00:26:21,430
learn this or some variant thereof.
551
00:26:21,430 --> 00:26:23,232
But there's at least a few people out there
552
00:26:23,232 --> 00:26:24,940
that need to know this stuff because this
553
00:26:24,940 --> 00:26:29,320
is closer to what the computers themselves, nowadays, understand.
554
00:26:29,320 --> 00:26:34,600
The Intel CPUs or the AMD CPUs, the brains of today's computers and phones
555
00:26:34,600 --> 00:26:37,960
understand stuff that looks more like this and less like C.
556
00:26:37,960 --> 00:26:42,430
Now it's completely esoteric, but let me just highlight a few phrases.
557
00:26:42,430 --> 00:26:44,630
There's some stuff that's a little familiar.
558
00:26:44,630 --> 00:26:47,620
There is mention of main at the top there in yellow.
559
00:26:47,620 --> 00:26:49,750
There is mention of get_string toward the bottom.
560
00:26:49,750 --> 00:26:52,070
There is mention of printf down below.
561
00:26:52,070 --> 00:26:55,600
So this is just another programming language called assembly language,
562
00:26:55,600 --> 00:26:57,010
that decades ago, humans--
563
00:26:57,010 --> 00:26:58,450
myself included in school--
564
00:26:58,450 --> 00:27:00,130
did write code in.
565
00:27:00,130 --> 00:27:02,630
And absolutely, some people still write this code,
566
00:27:02,630 --> 00:27:06,070
especially since you can write very, very efficient code.
567
00:27:06,070 --> 00:27:08,590
But it's a lot more arcane.
568
00:27:08,590 --> 00:27:11,380
It's a lot less user friendly.
569
00:27:11,380 --> 00:27:14,650
So you'll see in yellow now, these are the so-called instructions
570
00:27:14,650 --> 00:27:18,460
that a computer's brain or CPU understands, pushing values
571
00:27:18,460 --> 00:27:23,630
around, moving them, subtracting values, calling functions, and move, move,
572
00:27:23,630 --> 00:27:24,130
move.
573
00:27:24,130 --> 00:27:27,400
So really, the low-level operations that computers understand
574
00:27:27,400 --> 00:27:31,030
tend to be arithmetic operations-- subtraction, addition,
575
00:27:31,030 --> 00:27:34,120
and the like-- moving things in and out of memory.
576
00:27:34,120 --> 00:27:37,510
It's just a lot more tedious for folks like us to write code like this.
577
00:27:37,510 --> 00:27:40,450
This is why you and I tend to write stuff like this.
578
00:27:40,450 --> 00:27:44,080
And ideally, still, people like you and I tend to drag and drop puzzle pieces
579
00:27:44,080 --> 00:27:46,520
that sort of abstract all of that away further.
580
00:27:46,520 --> 00:27:49,420
But for now, this is, again, called assembly language.
581
00:27:49,420 --> 00:27:54,310
It is what happens when the compiler literally compiles your code.
582
00:27:54,310 --> 00:27:57,010
But of course, this is still not zeros and ones.
583
00:27:57,010 --> 00:27:58,580
So we got two steps to go.
584
00:27:58,580 --> 00:28:02,270
So when a compiler proceeds to step three,
585
00:28:02,270 --> 00:28:05,530
this is where things get converted to machine code.
586
00:28:05,530 --> 00:28:08,500
And when a compiler assembles your code for you,
587
00:28:08,500 --> 00:28:14,260
it converts what we just saw on the screen here to actual zeros and ones--
588
00:28:14,260 --> 00:28:18,550
the so-called machine code that your phone or your computer understands.
589
00:28:18,550 --> 00:28:22,120
But it's worth noting that these are not necessarily all
590
00:28:22,120 --> 00:28:24,280
of the zeros and ones of your program.
591
00:28:24,280 --> 00:28:29,980
Yes, they are the zeros and ones that correspond to your Hello program
592
00:28:29,980 --> 00:28:33,250
or printf and get_string and the like, but notice
593
00:28:33,250 --> 00:28:36,940
that here, we need one final step.
594
00:28:36,940 --> 00:28:40,100
In those zeros and ones are only your lines of code.
595
00:28:40,100 --> 00:28:43,540
But what about CS50's lines of code that we wrote to implement get_string?
596
00:28:43,540 --> 00:28:46,990
What about the lines of code that humans wrote decades ago to implement printf?
597
00:28:46,990 --> 00:28:50,020
Those are somewhere on this hard drive, like on my Mac, my PC,
598
00:28:50,020 --> 00:28:54,460
or somewhere in the cloud, but we need to combine all of those zeros and ones
599
00:28:54,460 --> 00:29:01,390
together and link my code with CS50's code with standard I/O's code,
600
00:29:01,390 --> 00:29:02,420
all together.
601
00:29:02,420 --> 00:29:05,110
And so what happens in the last step, ultimately,
602
00:29:05,110 --> 00:29:07,960
is that if we have my code here in yellow,
603
00:29:07,960 --> 00:29:11,440
and then the code that CS50 wrote, and the code that the authors of C
604
00:29:11,440 --> 00:29:15,940
itself wrote, what really is happening is that somewhere, we have not only
605
00:29:15,940 --> 00:29:19,960
hello.c, which, obviously, I wrote, and wrote with us live here,
606
00:29:19,960 --> 00:29:24,550
there's also, let's assume, somewhere on the computer, a cs50.c file
607
00:29:24,550 --> 00:29:28,210
that, coincidentally, I and CS50 staff wrote years ago.
608
00:29:28,210 --> 00:29:30,790
And also, somewhere on the computer, there's another file.
609
00:29:30,790 --> 00:29:34,120
Let me oversimplify by just calling it stdio.c.
610
00:29:34,120 --> 00:29:36,850
In practice, it's probably specifically called printf.c.
611
00:29:36,850 --> 00:29:39,460
But they're somewhere, these two other files.
612
00:29:39,460 --> 00:29:44,110
And so this last step called linking takes my zeros and ones
613
00:29:44,110 --> 00:29:48,100
from the code I just wrote, namely this code on the screen here.
614
00:29:48,100 --> 00:29:50,810
It then grabs the zeros and ones that CS50 wrote.
615
00:29:50,810 --> 00:29:53,480
And it grabs the zeros and ones that the authors of C wrote,
616
00:29:53,480 --> 00:29:56,240
in order to implement the standard I/O library.
617
00:29:56,240 --> 00:30:00,750
And lastly, voila, links them all together.
618
00:30:00,750 --> 00:30:03,980
And this is the same blob of zeros and ones that we saw earlier.
619
00:30:03,980 --> 00:30:08,090
It's just now the result of preprocessing your code,
620
00:30:08,090 --> 00:30:12,620
compiling your code, assembling your code, linking your code, and my God,
621
00:30:12,620 --> 00:30:15,830
at this point, like if there were any fun in programming for you yet,
622
00:30:15,830 --> 00:30:19,620
we've just taken it all away, we just call this whole process compiling.
623
00:30:19,620 --> 00:30:20,120
Why?
624
00:30:20,120 --> 00:30:22,490
Because now that we know those steps exist--
625
00:30:22,490 --> 00:30:25,370
and smart people solve that problem for us--
626
00:30:25,370 --> 00:30:27,890
you and I can kind of operate at this level of abstraction
627
00:30:27,890 --> 00:30:32,420
and just assume that compiling converts source code to machine code.
628
00:30:32,420 --> 00:30:36,350
Questions, though, on any of these intermediate steps?
629
00:30:36,350 --> 00:30:37,360
Yeah?
630
00:30:37,360 --> 00:30:41,958
AUDIENCE: For linking, are different parts, like [INAUDIBLE]?
631
00:30:41,958 --> 00:30:50,072
632
00:30:50,072 --> 00:30:51,280
DAVID MALAN: A good question.
633
00:30:51,280 --> 00:30:53,238
So where are all of these zeros and ones stored?
634
00:30:53,238 --> 00:30:56,400
Because you and I, we've been using a browser, right? code.cs50.io,
635
00:30:56,400 --> 00:30:58,330
of course, is this web-based user interface.
636
00:30:58,330 --> 00:31:00,497
But again, recall from last week, even though you're
637
00:31:00,497 --> 00:31:05,640
using a web browser to access VS Code, that web-based version of VS Code
638
00:31:05,640 --> 00:31:09,000
is connected to an actual server somewhere in the cloud.
639
00:31:09,000 --> 00:31:13,170
And on that server, you have your own account and your own files, and really,
640
00:31:13,170 --> 00:31:15,360
your own hard drive, virtually in the cloud.
641
00:31:15,360 --> 00:31:18,872
Think of it a little like Dropbox or Box or Google Drive or OneDrive
642
00:31:18,872 --> 00:31:19,830
or something like that.
643
00:31:19,830 --> 00:31:23,310
So you have a hard drive somewhere out there that we've provisioned for you.
644
00:31:23,310 --> 00:31:27,930
And it's on that hard drive that you have your code that you just wrote,
645
00:31:27,930 --> 00:31:32,700
or I just wrote, cs50.c, stdio.c, and all of the other code
646
00:31:32,700 --> 00:31:36,967
that implements the math functions and everything else that C supports.
647
00:31:36,967 --> 00:31:37,550
Good question.
648
00:31:37,550 --> 00:31:38,964
Yeah?
649
00:31:38,964 --> 00:31:45,425
AUDIENCE: So, say in the CS50 library, the line [INAUDIBLE]
650
00:31:45,425 --> 00:31:49,401
do we do the same exact thing [INAUDIBLE]
651
00:31:49,401 --> 00:31:51,935
copy paste them all the way over?
652
00:31:51,935 --> 00:31:53,060
DAVID MALAN: Good question.
653
00:31:53,060 --> 00:31:57,110
That hash include cs50.h line at the top of my code.
654
00:31:57,110 --> 00:32:01,310
If I just replace that with the contents of cs50.c, would that work?
655
00:32:01,310 --> 00:32:03,590
Short answer, yes, that would work.
656
00:32:03,590 --> 00:32:05,400
You could copy all of the code there.
657
00:32:05,400 --> 00:32:08,577
However, there's some order of operations that might come into play.
658
00:32:08,577 --> 00:32:10,910
And so it's probably not quite as simple as copy, paste.
659
00:32:10,910 --> 00:32:13,190
But conceptually, yes, that's what's happening.
660
00:32:13,190 --> 00:32:19,370
Now with that said, in cs50.h, are only the prototypes of the functions,
661
00:32:19,370 --> 00:32:23,628
the hints as to how the functions look, what their return type is,
662
00:32:23,628 --> 00:32:25,670
what their name is, and what their arguments are.
663
00:32:25,670 --> 00:32:29,867
It's in the dot c file that actual code tends to be written.
664
00:32:29,867 --> 00:32:32,450
And this is a little confusing now because you and I have only
665
00:32:32,450 --> 00:32:33,920
written code in dot c files.
666
00:32:33,920 --> 00:32:35,690
But in the next few weeks, you'll actually
667
00:32:35,690 --> 00:32:37,940
start writing some of your own dot h files
668
00:32:37,940 --> 00:32:40,460
as well, just like CS50, just like standard I/O.
669
00:32:40,460 --> 00:32:44,150
But in essence, that line of code just makes it easier to use and reuse
670
00:32:44,150 --> 00:32:46,020
code that's already been written.
671
00:32:46,020 --> 00:32:47,750
And that's the whole point of a library.
672
00:32:47,750 --> 00:32:50,327
AUDIENCE: Does linking them [INAUDIBLE]?
673
00:32:50,327 --> 00:32:51,910
DAVID MALAN: Say that a little louder.
674
00:32:51,910 --> 00:32:54,472
AUDIENCE: Does linking happen when you use the compiler?
675
00:32:54,472 --> 00:32:55,180
DAVID MALAN: Yes.
676
00:32:55,180 --> 00:32:56,980
Does linking happen when you compile your code?
677
00:32:56,980 --> 00:32:57,480
Yes.
678
00:32:57,480 --> 00:33:02,320
When you run make, as we have been doing the past week now,
679
00:33:02,320 --> 00:33:04,570
all four of these steps are happening.
680
00:33:04,570 --> 00:33:07,780
Preprocessing converts the hash include lines to something else.
681
00:33:07,780 --> 00:33:10,600
Compiling technically converts it to assembly
682
00:33:10,600 --> 00:33:14,290
code, which the Mac, the PC, the server more closely understands.
683
00:33:14,290 --> 00:33:18,850
Assembling converts that language to binary machine code that this computer
684
00:33:18,850 --> 00:33:20,080
actually understands.
685
00:33:20,080 --> 00:33:22,540
And then linking combines everything together.
686
00:33:22,540 --> 00:33:27,550
And in fact, if you think back a few minutes ago to when I did this -lcs50,
687
00:33:27,550 --> 00:33:30,070
the reason I had to add that, and the reason
688
00:33:30,070 --> 00:33:32,860
my code did not compile at first, was because I
689
00:33:32,860 --> 00:33:38,650
forgot to tell clang to link in CS50's zeros and ones per that last step.
690
00:33:38,650 --> 00:33:42,147
I don't need to do -lstdio because it comes with C,
691
00:33:42,147 --> 00:33:44,480
so that would just be tedious for everyone in the world.
692
00:33:44,480 --> 00:33:47,140
But CS50 does not come with C, so we link that in.
693
00:33:47,140 --> 00:33:49,780
And to be clear, too, we won't always use CS50's library.
694
00:33:49,780 --> 00:33:53,072
That'll be yet another pair of training wheels we take off in the coming weeks.
695
00:33:53,072 --> 00:33:55,000
But for now, it makes a few things simpler.
696
00:33:55,000 --> 00:33:57,284
Yeah?
697
00:33:57,284 --> 00:33:59,750
AUDIENCE: What is the [INAUDIBLE]?
698
00:33:59,750 --> 00:34:08,878
699
00:34:08,878 --> 00:34:10,170
DAVID MALAN: Short answer, yes.
700
00:34:10,170 --> 00:34:12,870
So what do the zeros and ones, the machine code, translate to?
701
00:34:12,870 --> 00:34:15,690
Yes, there is a one-to-one relationship between the machine
702
00:34:15,690 --> 00:34:17,340
code and the assembly code.
703
00:34:17,340 --> 00:34:21,510
Assembly code, it's not really English, but at least it's symbols I recognize.
704
00:34:21,510 --> 00:34:22,800
It's not zeros and ones.
705
00:34:22,800 --> 00:34:24,810
Machine code, of course, is just zeros and ones.
706
00:34:24,810 --> 00:34:27,960
So back in the day, before C existed, people
707
00:34:27,960 --> 00:34:30,630
were programming only in assembly code.
708
00:34:30,630 --> 00:34:34,469
Before assembly code existed, people were coding in zeros and ones.
709
00:34:34,469 --> 00:34:36,719
And you can imagine just how painful that was,
710
00:34:36,719 --> 00:34:39,027
and so each of these languages makes life, for us,
711
00:34:39,027 --> 00:34:40,110
sort of easier and easier.
712
00:34:40,110 --> 00:34:42,330
In a few weeks, we'll transition to Python, which
713
00:34:42,330 --> 00:34:45,300
will, in turn, make C even simpler--
714
00:34:45,300 --> 00:34:48,090
or coding, in general, simpler to do too.
715
00:34:48,090 --> 00:34:53,346
All right, so with that said, what now can we--
716
00:34:53,346 --> 00:34:55,060
what could go wrong with this?
717
00:34:55,060 --> 00:34:58,140
Well, it turns out that besides compiling, technically speaking,
718
00:34:58,140 --> 00:34:59,233
there's decompiling.
719
00:34:59,233 --> 00:35:01,150
And we've not done this, and we won't do this.
720
00:35:01,150 --> 00:35:04,080
But it's worth considering for just a moment.
721
00:35:04,080 --> 00:35:07,560
If you were to not compile your code, but decompile it--
722
00:35:07,560 --> 00:35:11,340
as the word suggests, this just means reversing the process, converting it,
723
00:35:11,340 --> 00:35:14,580
ideally, from machine code-- zeros and ones--
724
00:35:14,580 --> 00:35:19,870
maybe back to C. Now this would be cool, perhaps, if all you have is a program,
725
00:35:19,870 --> 00:35:22,080
you can convert it and see the actual source code.
726
00:35:22,080 --> 00:35:25,320
What might a downside be, if anyone on the internet
727
00:35:25,320 --> 00:35:28,650
is able to decompile code on their machine?
728
00:35:28,650 --> 00:35:29,160
Yeah?
729
00:35:29,160 --> 00:35:30,270
AUDIENCE: [INAUDIBLE]
730
00:35:30,270 --> 00:35:34,130
DAVID MALAN: OK, so it's easier to find bugs in the code that--
731
00:35:34,130 --> 00:35:35,430
oh, to exploit.
732
00:35:35,430 --> 00:35:38,417
So it might be easier to hack into the software
733
00:35:38,417 --> 00:35:41,000
by finding mistakes you and I made because, literally, they're
734
00:35:41,000 --> 00:35:43,370
staring at you in code, whereas the zeros and ones make
735
00:35:43,370 --> 00:35:45,080
it way less obvious.
736
00:35:45,080 --> 00:35:48,140
Other downsides of what I called decompiling?
737
00:35:48,140 --> 00:35:49,970
Yeah?
738
00:35:49,970 --> 00:35:53,690
AUDIENCE: If stuff is copyrighted or you don't even know how to get it--
739
00:35:53,690 --> 00:35:54,440
DAVID MALAN: Yeah.
740
00:35:54,440 --> 00:35:55,948
AUDIENCE: [INAUDIBLE]
741
00:35:55,948 --> 00:35:57,740
DAVID MALAN: Yeah, if your code, your work,
742
00:35:57,740 --> 00:36:00,950
is your intellectual property, copyrighted or otherwise, that's
743
00:36:00,950 --> 00:36:03,660
kind of obnoxious that someone can just run a command, and boom,
744
00:36:03,660 --> 00:36:05,577
they can see the original code that you wrote.
745
00:36:05,577 --> 00:36:08,490
Now, it turns out it's not quite as simple as that.
746
00:36:08,490 --> 00:36:11,720
And so even though, yes, you could take a program like Hello,
747
00:36:11,720 --> 00:36:15,080
or even Microsoft Word, and convert it from zeros and ones
748
00:36:15,080 --> 00:36:19,400
back to some form of source code-- be it in C or Java
749
00:36:19,400 --> 00:36:22,820
or Python or something else, whatever it was originally written in-- odds
750
00:36:22,820 --> 00:36:25,800
are it's going to be an utter mess to look at.
751
00:36:25,800 --> 00:36:26,300
Why?
752
00:36:26,300 --> 00:36:30,390
Because things like variable names are not retained in the zeros and ones,
753
00:36:30,390 --> 00:36:30,890
typically.
754
00:36:30,890 --> 00:36:33,980
Function names might not be retained in the zeros and ones.
755
00:36:33,980 --> 00:36:36,350
The code is, the logic is, but the computer
756
00:36:36,350 --> 00:36:38,510
doesn't care what pretty variables you chose
757
00:36:38,510 --> 00:36:41,060
and how nicely named your functions were, it just
758
00:36:41,060 --> 00:36:42,890
needs to know them as zeros and ones.
759
00:36:42,890 --> 00:36:46,370
Moreover, if you think about last week, we introduced things like loops in C.
760
00:36:46,370 --> 00:36:49,745
And besides for loops, there's what other kind of loop, for instance?
761
00:36:49,745 --> 00:36:50,620
AUDIENCE: [INAUDIBLE]
762
00:36:50,620 --> 00:36:53,412
DAVID MALAN: So, a while loop-- and even though they look different
763
00:36:53,412 --> 00:36:55,920
and you have to write different code, they achieve exactly
764
00:36:55,920 --> 00:36:59,910
the same functionality, which is to say, when you compile a for loop
765
00:36:59,910 --> 00:37:04,140
or you compile a while loop, if they logically do the same thing,
766
00:37:04,140 --> 00:37:07,420
they might end up looking identical as zeros and ones.
767
00:37:07,420 --> 00:37:09,780
And so, therefore, it's not necessarily predictable
768
00:37:09,780 --> 00:37:11,820
that you'll get back the original code, why?
769
00:37:11,820 --> 00:37:15,110
Because the zeros and ones might not know, so to speak,
770
00:37:15,110 --> 00:37:16,860
whether it was a for loop or a while loop,
771
00:37:16,860 --> 00:37:19,350
so maybe decompiling will show you one or the other.
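
To make the for-versus-while point concrete, here is a minimal sketch in C. The names count_with_for and count_with_while are made up for illustration. Both functions run the same logic, so a compiler is free to emit the same zeros and ones for each; whether it actually does depends on the compiler and its settings.

```c
// Two hypothetical helpers that differ only in loop syntax.

// Counts iterations using a for loop.
int count_with_for(int n)
{
    int iterations = 0;
    for (int i = 0; i < n; i++)
    {
        iterations++;
    }
    return iterations;
}

// Counts iterations using a while loop: same logic, different syntax.
int count_with_while(int n)
{
    int iterations = 0;
    int i = 0;
    while (i < n)
    {
        iterations++;
        i++;
    }
    return iterations;
}
```

Calling either function with the same argument returns the same count, which is why a decompiler has no reliable way to tell which loop the author originally wrote.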
772
00:37:19,350 --> 00:37:21,870
And honestly, decompiling, while possible-- and it's
773
00:37:21,870 --> 00:37:24,570
one way of reverse engineering someone's product.
774
00:37:24,570 --> 00:37:28,662
Odds are, if you're good enough to start reading code that's been decompiled
775
00:37:28,662 --> 00:37:30,870
and reading through the messiness of it, odds are you
776
00:37:30,870 --> 00:37:34,020
have the talent probably to just write that same program from scratch
777
00:37:34,020 --> 00:37:34,650
yourself.
778
00:37:34,650 --> 00:37:36,870
Now, that's an overstatement, perhaps, but it's not
779
00:37:36,870 --> 00:37:40,410
quite as easy or threatening as you might first think.
780
00:37:40,410 --> 00:37:43,290
So in general, once code is compiled, it's
781
00:37:43,290 --> 00:37:48,290
pretty challenging, time consuming, costly to reverse engineer it, much
782
00:37:48,290 --> 00:37:50,040
like it would be in the real world, right?
783
00:37:50,040 --> 00:37:52,860
Like all of us have some kind of phone, probably, nowadays in our pocket.
784
00:37:52,860 --> 00:37:55,193
There's nothing stopping you from opening it up somehow,
785
00:37:55,193 --> 00:37:57,060
poking around, recreating what's there.
786
00:37:57,060 --> 00:37:59,130
That's a huge amount of effort, most likely.
787
00:37:59,130 --> 00:38:01,880
And at that point, maybe you should just invent the phone, instead
788
00:38:01,880 --> 00:38:03,310
of trying to reverse engineer it.
789
00:38:03,310 --> 00:38:06,330
So same kind of idea in the physical world.
790
00:38:06,330 --> 00:38:13,050
Any questions, then, on compiling, or even decompiling in these forms?
791
00:38:13,050 --> 00:38:17,160
All right, so odds are, at this point, not only I, but you have made mistakes.
792
00:38:17,160 --> 00:38:19,050
And you've written buggy code--
793
00:38:19,050 --> 00:38:22,350
a bug in code is just a mistake, a logical error
794
00:38:22,350 --> 00:38:26,490
or otherwise, where the code just does not behave correctly as you intend.
795
00:38:26,490 --> 00:38:29,880
And up until now, odds are, your debugging techniques
796
00:38:29,880 --> 00:38:32,910
have been to maybe look back at what I did in class, maybe
797
00:38:32,910 --> 00:38:35,320
ask a question online or in-person.
798
00:38:35,320 --> 00:38:38,190
But ultimately, it'd be nice if you had some tools of your own
799
00:38:38,190 --> 00:38:39,570
with which to debug code.
800
00:38:39,570 --> 00:38:41,587
And this, honestly, is a lifelong skill.
801
00:38:41,587 --> 00:38:43,170
You're not going to emerge from CS50--
802
00:38:43,170 --> 00:38:44,490
and even 20 years from now, you're not going
803
00:38:44,490 --> 00:38:47,910
to be writing-- if you're writing code at all-- correct code all of the time.
804
00:38:47,910 --> 00:38:50,820
Like, all of us on the staff continue to write bugs.
805
00:38:50,820 --> 00:38:54,120
Hopefully, they get a little more sophisticated, and not sort of like,
806
00:38:54,120 --> 00:38:55,540
oops, I missed a semicolon.
807
00:38:55,540 --> 00:38:57,660
But even those kinds of mistakes, we make too.
808
00:38:57,660 --> 00:39:00,150
But there's tools out there and techniques
809
00:39:00,150 --> 00:39:03,550
that can make your life easier when it comes to solving those problems.
810
00:39:03,550 --> 00:39:06,360
Now, the term bug has actually been around for decades.
811
00:39:06,360 --> 00:39:11,790
But a fun story to tell is that the first documented actual bug was
812
00:39:11,790 --> 00:39:13,650
actually somehow connected to Harvard.
813
00:39:13,650 --> 00:39:18,870
In fact, this is the logbook relating to the Harvard Mark II computer
814
00:39:18,870 --> 00:39:22,890
from 1947, whereby if you read the notes here-- and I'll zoom in-- this
815
00:39:22,890 --> 00:39:27,630
was an actual moth discovered inside of this big mainframe computer that
816
00:39:27,630 --> 00:39:29,160
was causing some kind of problems.
817
00:39:29,160 --> 00:39:30,450
And the engineers there at the time actually
818
00:39:30,450 --> 00:39:33,610
thought it was funny that, wow, physical bug actually explains the issue.
819
00:39:33,610 --> 00:39:36,450
And it's been forever taped to this sheet of paper, which I believe
820
00:39:36,450 --> 00:39:39,090
now is on display in the Smithsonian.
821
00:39:39,090 --> 00:39:43,260
With that said, this is just representative, too, of a logical bug.
822
00:39:43,260 --> 00:39:45,390
And that story is actually--
823
00:39:45,390 --> 00:39:49,170
that story was often retold by a famous mathematician, then computer scientist
824
00:39:49,170 --> 00:39:53,640
really, Dr. Grace Hopper, who actually worked not only on the Harvard Mark II
825
00:39:53,640 --> 00:39:57,210
computer, but its predecessor, the Harvard Mark I.
826
00:39:57,210 --> 00:40:01,020
And if you ever spent time, yet, in the engineering building across the river
827
00:40:01,020 --> 00:40:04,103
here, you can actually see much of this computer, which
828
00:40:04,103 --> 00:40:07,020
is along the wall when you first walk into the Science and Engineering
829
00:40:07,020 --> 00:40:07,530
Complex.
830
00:40:07,530 --> 00:40:09,530
And indeed, as you've probably heard growing up,
831
00:40:09,530 --> 00:40:11,070
this is a mainframe computer.
832
00:40:11,070 --> 00:40:15,210
This is what Macs and PCs, so to speak, looked like back in the day,
833
00:40:15,210 --> 00:40:18,240
with very physical things that essentially implemented the zeros
834
00:40:18,240 --> 00:40:21,900
and ones that you and I take for granted now being miniaturized in our laptops
835
00:40:21,900 --> 00:40:22,410
and phones.
836
00:40:22,410 --> 00:40:23,910
So there's a piece of history there.
837
00:40:23,910 --> 00:40:27,390
If you visit that side of campus sometime, do take a look.
838
00:40:27,390 --> 00:40:30,480
But let's consider, then, how we solve not, of course, physical bugs,
839
00:40:30,480 --> 00:40:31,350
but logical bugs.
840
00:40:31,350 --> 00:40:33,600
And let's consider something like this from last week,
841
00:40:33,600 --> 00:40:38,820
whereby, we were trying very simply to print like this column of three bricks
842
00:40:38,820 --> 00:40:40,320
using hashtags of sorts.
843
00:40:40,320 --> 00:40:44,400
So let me go over here in just a moment to VS Code.
844
00:40:44,400 --> 00:40:47,080
And I'm going to go ahead and open a program I wrote in advance.
845
00:40:47,080 --> 00:40:49,455
And I'm bringing it to class because there's a bug in it,
846
00:40:49,455 --> 00:40:51,510
and I'd like to figure out how to solve this bug.
847
00:40:51,510 --> 00:40:56,160
So let me open up a buggy0.c, which is version 0 of my code.
848
00:40:56,160 --> 00:40:58,200
And let's just take a quick peek at what's here.
849
00:40:58,200 --> 00:40:58,950
It's pretty short.
850
00:40:58,950 --> 00:41:03,750
It includes only stdio.h, it uses printf, it uses a for loop,
851
00:41:03,750 --> 00:41:07,797
and the goal, quite simply, is to print out that column of three bricks.
852
00:41:07,797 --> 00:41:11,130
Now, it's short enough that some of you, if you're getting comfy already with C,
853
00:41:11,130 --> 00:41:13,360
you might already see the logical bug.
854
00:41:13,360 --> 00:41:16,200
It's not a syntax error, like it will compile and run.
855
00:41:16,200 --> 00:41:17,280
But there's a bug there.
856
00:41:17,280 --> 00:41:22,320
And suppose that I'm very new to C, I'm very uncomfortable with C, it's 2:00 AM
857
00:41:22,320 --> 00:41:26,130
and I just can't see the bug, what are my recourses here for actually
858
00:41:26,130 --> 00:41:27,745
finding a mistake like this?
859
00:41:27,745 --> 00:41:29,370
Well, first, let's look at the symptom.
860
00:41:29,370 --> 00:41:31,740
Let me go down to my terminal window.
861
00:41:31,740 --> 00:41:36,120
I'm going to use make buggy0 because, again, the file is called buggy0.c.
862
00:41:36,120 --> 00:41:37,260
I'm not going to use clang.
863
00:41:37,260 --> 00:41:39,880
In fact, I'm never really going to use clang manually here on out.
864
00:41:39,880 --> 00:41:42,430
I'm just going to use make because it makes our lives easier.
865
00:41:42,430 --> 00:41:43,560
It does compile.
866
00:41:43,560 --> 00:41:45,390
No errors, so it's not syntax.
867
00:41:45,390 --> 00:41:47,670
It's not something silly like a missing semicolon.
868
00:41:47,670 --> 00:41:53,190
But when I run ./buggy0, I, of course, see one, two, three, four--
869
00:41:53,190 --> 00:41:57,990
and this, of course, does not match the one, two, three bricks that I actually
870
00:41:57,990 --> 00:41:59,610
intended for that column.
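
For readers following along without the screen, here is a sketch of the kind of loop being described. It is an assumed reconstruction, not the exact contents of buggy0.c: the loop is wrapped in a hypothetical function, print_column, that also returns how many bricks it printed, so the symptom can be checked.

```c
#include <stdio.h>

// Assumed reconstruction of the loop in buggy0.c. The wrapper function
// and its return value are additions for illustration.
int print_column(void)
{
    int printed = 0;
    for (int i = 0; i <= 3; i++) // counts from 0, with 3 as the limit
    {
        printf("#\n");
        printed++;
    }
    return printed; // number of bricks actually printed
}
```

Calling print_column() from main prints four bricks and returns 4, which is the symptom described above: one more brick than intended.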
871
00:41:59,610 --> 00:42:02,970
And yet, I'm starting counting at 0, as I usually do.
872
00:42:02,970 --> 00:42:03,930
I've got three.
873
00:42:03,930 --> 00:42:05,280
I'm going up to three.
874
00:42:05,280 --> 00:42:06,780
So where is my logical error?
875
00:42:06,780 --> 00:42:10,150
If it hasn't obviously jumped out at you already, well, how can I solve this?
876
00:42:10,150 --> 00:42:13,080
Well, first and foremost, perhaps the best technique
877
00:42:13,080 --> 00:42:16,080
for solving bugs, at least early on, is just use printf.
878
00:42:16,080 --> 00:42:20,020
Like thus far, we've used printf to say, Hello, and other things on the screen.
879
00:42:20,020 --> 00:42:22,530
But printf is just a function for printing anything.
880
00:42:22,530 --> 00:42:24,570
And there's no reason you can't temporarily
881
00:42:24,570 --> 00:42:27,900
use printf to print out the contents of variables,
882
00:42:27,900 --> 00:42:29,850
what's going on inside of your program, just
883
00:42:29,850 --> 00:42:31,350
to figure out where your mistake is.
884
00:42:31,350 --> 00:42:32,940
And then you can delete that line of code later.
885
00:42:32,940 --> 00:42:34,600
It doesn't have to stay there forever.
886
00:42:34,600 --> 00:42:35,740
So let me do this.
887
00:42:35,740 --> 00:42:39,450
Instead of just printing out in VS Code the hash symbol,
888
00:42:39,450 --> 00:42:45,690
let me do a little safety check here and print out the value of i.
889
00:42:45,690 --> 00:42:49,170
So let me go ahead and say something like, i is--
890
00:42:49,170 --> 00:42:51,610
now I want to say i is this.
891
00:42:51,610 --> 00:42:54,540
But, of course, this is not how I print out the value of i.
892
00:42:54,540 --> 00:42:58,930
If I want to print out the value of i, what should I put here?
893
00:42:58,930 --> 00:43:02,160
So %i for integer, instead of %s for string.
894
00:43:02,160 --> 00:43:03,410
So they're still placeholders.
895
00:43:03,410 --> 00:43:04,930
But we use %i for integers.
896
00:43:04,930 --> 00:43:08,450
And now if I want to print out i, I just need the comma as the second argument,
897
00:43:08,450 --> 00:43:09,250
and then i.
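
A sketch of what that temporary diagnostic line might look like, again using an assumed reconstruction of the loop and a hypothetical wrapper function, print_column_with_trace, that returns the number of bricks printed:

```c
#include <stdio.h>

// Same assumed loop as in buggy0.c, with a temporary printf added.
int print_column_with_trace(void)
{
    int printed = 0;
    for (int i = 0; i <= 3; i++)
    {
        printf("i is %i\n", i); // temporary: delete once the bug is found
        printf("#\n");
        printed++;
    }
    return printed;
}
```

The format string uses %i as the placeholder, and i is passed as the second argument after the comma, so each pass through the loop reports the current value of i before printing its brick.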
898
00:43:09,250 --> 00:43:13,000
All right, let me go ahead and back to my terminal window.
899
00:43:13,000 --> 00:43:15,760
Let me recompile the program because I've changed it.
900
00:43:15,760 --> 00:43:18,880
That still works fine, ./buggy0.
901
00:43:18,880 --> 00:43:22,540
And now, let me increase the size of my terminal window here.
902
00:43:22,540 --> 00:43:25,510
You just see some diagnostic information, if you will.
903
00:43:25,510 --> 00:43:26,560
This is not the goal.
904
00:43:26,560 --> 00:43:29,393
This is not what you should be submitting for this homework problem,
905
00:43:29,393 --> 00:43:30,070
were it one.
906
00:43:30,070 --> 00:43:33,730
But it is helping us diagnostically know that, OK, when i is zero,
907
00:43:33,730 --> 00:43:34,450
here's a hash.
908
00:43:34,450 --> 00:43:36,182
When i is 1, here's a hash.
909
00:43:36,182 --> 00:43:37,390
When i is two, here's a hash.
910
00:43:37,390 --> 00:43:39,017
When i is 3, here's a hash.
911
00:43:39,017 --> 00:43:39,850
Well, wait a minute.
912
00:43:39,850 --> 00:43:41,530
That's one, two, three, four.
913
00:43:41,530 --> 00:43:44,360
So clearly, I'm printing it one too many times.
914
00:43:44,360 --> 00:43:48,130
So let me look back at the code here by shrinking my terminal window.
915
00:43:48,130 --> 00:43:53,080
And let me just ask the group, where is, in fact, the mistake?
916
00:43:53,080 --> 00:43:56,080
Or what, equivalently, would be the solution?
917
00:43:56,080 --> 00:43:57,561
Yeah, in the middle.
918
00:43:57,561 --> 00:44:00,020
AUDIENCE: [INAUDIBLE]
919
00:44:00,020 --> 00:44:03,550
DAVID MALAN: Yeah, instead of less than or equal to, use just less than.
920
00:44:03,550 --> 00:44:05,300
So you've got to kind of pick a lane here.
921
00:44:05,300 --> 00:44:08,630
If you're going to start counting from 0, you generally use less than,
922
00:44:08,630 --> 00:44:10,880
and go up to, but not through the value.
923
00:44:10,880 --> 00:44:13,970
Or if you prefer, like in the human world, counting from 1 on up,
924
00:44:13,970 --> 00:44:17,300
you can use less than or equal to, but you have to be consistent.
925
00:44:17,300 --> 00:44:19,790
And in general, as a programmer, just always start
926
00:44:19,790 --> 00:44:22,610
counting from 0 if you're doing something canonical like this.
927
00:44:22,610 --> 00:44:25,160
But the solution is, indeed, just to change this
928
00:44:25,160 --> 00:44:27,860
by changing the less than or equal to to just less than.
929
00:44:27,860 --> 00:44:34,340
If I recompile this program with make buggy0, and then do ./buggy0 again--
930
00:44:34,340 --> 00:44:36,500
and let me increase the size of my terminal window.
931
00:44:36,500 --> 00:44:39,050
Now, you see, OK, almost the same output.
932
00:44:39,050 --> 00:44:44,330
But indeed, i starts at 0 and goes up to, but not through, three.
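
The two consistent conventions can be sketched side by side. The function names and the height parameter are hypothetical; the lecture's program hard-codes 3.

```c
#include <stdio.h>

// Zero-based: start at 0 and stop before height (i < height).
int print_column_from_zero(int height)
{
    int printed = 0;
    for (int i = 0; i < height; i++)
    {
        printf("#\n");
        printed++;
    }
    return printed;
}

// One-based: start at 1 and stop after height (i <= height).
int print_column_from_one(int height)
{
    int printed = 0;
    for (int i = 1; i <= height; i++)
    {
        printf("#\n");
        printed++;
    }
    return printed;
}
```

With a height of 3, both versions print exactly three bricks. The bug arises only when the two conventions are mixed: starting from 0 while using less than or equal to.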
933
00:44:44,330 --> 00:44:48,920
All right, so printf, in short, can be your first diagnostic tool.
934
00:44:48,920 --> 00:44:51,500
Instead of just staring at the screen or raising your hand--
935
00:44:51,500 --> 00:44:55,490
I mean, use printf to see, literally, what's going on inside of your program
936
00:44:55,490 --> 00:44:57,287
by just printing out things of interest.
937
00:44:57,287 --> 00:44:59,120
And then once you've solved the problem, you
938
00:44:59,120 --> 00:45:02,840
can go back into your code, as I'll do here, by shrinking my terminal window.
939
00:45:02,840 --> 00:45:04,610
I'll delete the printf line.
940
00:45:04,610 --> 00:45:07,100
And now I'm ready to share this program with the world
941
00:45:07,100 --> 00:45:08,870
or submit it as homework or the like.
942
00:45:08,870 --> 00:45:11,390
It's just meant there to be temporary.
943
00:45:11,390 --> 00:45:15,440
Any questions on printf as a debugging tool?
944
00:45:15,440 --> 00:45:18,010
945
00:45:18,010 --> 00:45:18,510
No?
946
00:45:18,510 --> 00:45:20,970
All right, well, that only gets us so far.
947
00:45:20,970 --> 00:45:23,430
And honestly, as your programs grow and grow and grow,
948
00:45:23,430 --> 00:45:25,180
it's going to actually get really annoying
949
00:45:25,180 --> 00:45:28,860
to start going in and adding printf's, then removing them, and figuring out,
950
00:45:28,860 --> 00:45:31,860
if you've got multiple printf's, well, which one printed what?
951
00:45:31,860 --> 00:45:34,560
It just gets messy, eventually, to rely on printf alone.
952
00:45:34,560 --> 00:45:37,740
So being a computer scientist, computer scientists
953
00:45:37,740 --> 00:45:41,040
have written software to make it easier to debug code.
954
00:45:41,040 --> 00:45:44,040
That software is what we would generally call a debugger, which
955
00:45:44,040 --> 00:45:47,040
would be the second tool of the trade that you can use to actually solve
956
00:45:47,040 --> 00:45:48,610
problems in your code.
957
00:45:48,610 --> 00:45:52,690
Now, in the world of VS Code, there's actually a debugger built in.
958
00:45:52,690 --> 00:45:54,840
So the graphical user interface you're about to see
959
00:45:54,840 --> 00:45:58,260
in VS Code isn't specific to CS50, it actually comes with VS Code.
960
00:45:58,260 --> 00:46:01,230
And it supports C, and C++, and Java, and Python,
961
00:46:01,230 --> 00:46:03,030
and lots of other languages too.
962
00:46:03,030 --> 00:46:05,640
But it's, admittedly, a little complicated
963
00:46:05,640 --> 00:46:07,650
to just start using the debugger.
964
00:46:07,650 --> 00:46:10,200
You have to create a configuration file and do
965
00:46:10,200 --> 00:46:13,480
some annoying steps that just get in the way of solving real problems.
966
00:46:13,480 --> 00:46:17,070
So we have automated the process for you of just starting the debugger.
967
00:46:17,070 --> 00:46:19,680
And thereafter, it's sort of industry standard how you use it.
968
00:46:19,680 --> 00:46:23,380
But we save you the headache of having to create those configuration files.
969
00:46:23,380 --> 00:46:25,330
So, suppose I want to do this.
970
00:46:25,330 --> 00:46:27,600
Suppose I want to try to debug this program
971
00:46:27,600 --> 00:46:30,330
step by step using special software.
972
00:46:30,330 --> 00:46:31,810
Well, how can I do that?
973
00:46:31,810 --> 00:46:36,240
Well, let me propose that if I revert this back to the original version
974
00:46:36,240 --> 00:46:40,530
where i was less than or equal to 3, I'm pretty sure that I
975
00:46:40,530 --> 00:46:41,790
was printing too many hashes.
976
00:46:41,790 --> 00:46:43,350
So I'm going to do this-- and you might have done this
977
00:46:43,350 --> 00:46:45,160
accidentally or never at all.
978
00:46:45,160 --> 00:46:49,500
But notice if you hover over the gutter, so to speak, in VS Code, the part of it
979
00:46:49,500 --> 00:46:52,590
all the way to the left of the editor, you see this sort of grayed
980
00:46:52,590 --> 00:46:54,390
out red dot.
981
00:46:54,390 --> 00:46:57,240
If you click there, it becomes a brighter red dot.
982
00:46:57,240 --> 00:46:59,670
And this represents what we're going to call a breakpoint.
983
00:46:59,670 --> 00:47:03,090
And this is just a visual indicator that you've put like a stop sign equivalent
984
00:47:03,090 --> 00:47:06,270
there, and you're telling the debugger in a moment, stop
985
00:47:06,270 --> 00:47:07,350
running my code there.
986
00:47:07,350 --> 00:47:07,920
Why?
987
00:47:07,920 --> 00:47:11,610
Because I prefer to step through my code at sort of a human speed,
988
00:47:11,610 --> 00:47:14,380
and not at computer speed where it runs all at once.
989
00:47:14,380 --> 00:47:16,750
So I've set my breakpoint, which is step one.
990
00:47:16,750 --> 00:47:18,580
And then step two is quite simply this.
991
00:47:18,580 --> 00:47:23,190
Instead of running the program itself, run the command called debug50,
992
00:47:23,190 --> 00:47:26,010
and then ./buggy0.
993
00:47:26,010 --> 00:47:29,220
And now this will start your program, but inside
994
00:47:29,220 --> 00:47:31,200
of the debugger, which is a special program
995
00:47:31,200 --> 00:47:33,060
that smart people wrote that will empower
996
00:47:33,060 --> 00:47:38,190
you to now step through your code line by line, and again, at your own comfort
997
00:47:38,190 --> 00:47:38,970
pace.
998
00:47:38,970 --> 00:47:43,080
I'm going to hit Enter, some stuff's going to happen on the screen-- whoops.
999
00:47:43,080 --> 00:47:45,767
Notice, this is a common mistake that I made accidentally here.
1000
00:47:45,767 --> 00:47:47,100
Looks like I've changed my code.
1001
00:47:47,100 --> 00:47:49,892
I did because I went in and changed the less than or equal to sign.
1002
00:47:49,892 --> 00:47:52,860
So let me go ahead and rerun make buggy0--
1003
00:47:52,860 --> 00:47:53,520
Enter.
1004
00:47:53,520 --> 00:47:55,590
Good, now let me rerun debug50--
1005
00:47:55,590 --> 00:47:57,810
Enter.
1006
00:47:57,810 --> 00:47:59,760
And now some stuff just happened on the screen
1007
00:47:59,760 --> 00:48:03,270
and it takes a moment to get started but once it's started you'll
1008
00:48:03,270 --> 00:48:06,010
see this-- you'll still see your code.
1009
00:48:06,010 --> 00:48:09,410
But you'll see this yellow highlight, which you've probably not seen before.
1010
00:48:09,410 --> 00:48:11,910
And notice that it's specifically highlighting the same line
1011
00:48:11,910 --> 00:48:13,440
that I set a breakpoint on.
1012
00:48:13,440 --> 00:48:13,950
Why?
1013
00:48:13,950 --> 00:48:18,870
That just means the debugger has executed all of these lines,
1014
00:48:18,870 --> 00:48:20,670
except for line 7.
1015
00:48:20,670 --> 00:48:23,340
It has broken at-- not in a bad way.
1016
00:48:23,340 --> 00:48:27,580
But it has paused execution on line 7, so it hasn't yet printed any hashes.
1017
00:48:27,580 --> 00:48:30,450
And you can see that-- no hashes in the terminal window yet.
1018
00:48:30,450 --> 00:48:31,980
It's paused execution.
1019
00:48:31,980 --> 00:48:35,190
But what's interesting with the debugger is the stuff
1020
00:48:35,190 --> 00:48:37,410
over here on the left-hand side.
1021
00:48:37,410 --> 00:48:39,960
In the debugger here, you'll see, under variables,
1022
00:48:39,960 --> 00:48:41,910
all of your so-called local variables.
1023
00:48:41,910 --> 00:48:44,160
And we haven't really made a distinction between local
1024
00:48:44,160 --> 00:48:45,327
and something called global.
1025
00:48:45,327 --> 00:48:48,000
But for now, local variables just means all of the variables
1026
00:48:48,000 --> 00:48:49,390
that exist in your function.
1027
00:48:49,390 --> 00:48:52,110
So i currently has a value of 0.
1028
00:48:52,110 --> 00:48:53,410
OK, and that makes sense.
1029
00:48:53,410 --> 00:48:57,360
So now, how do I step through my code and see what it's doing?
1030
00:48:57,360 --> 00:48:59,610
Well, at the top of the screen here, you'll
1031
00:48:59,610 --> 00:49:02,250
see some playback icons, kind of like a video player,
1032
00:49:02,250 --> 00:49:03,630
but they have special meaning.
1033
00:49:03,630 --> 00:49:07,892
This first one will just play the rest of your program all the way to the end.
1034
00:49:07,892 --> 00:49:10,350
So you only click that if you've sort of solved the problem
1035
00:49:10,350 --> 00:49:13,110
and you just want to run it to completion like before.
1036
00:49:13,110 --> 00:49:14,370
But the next three--
1037
00:49:14,370 --> 00:49:16,920
or next two, really, are really the juiciest.
1038
00:49:16,920 --> 00:49:19,710
The second one here, if you hover over it, eventually,
1039
00:49:19,710 --> 00:49:21,930
you'll see that it's called Step Over.
1040
00:49:21,930 --> 00:49:25,170
Step Over means that the debugger will run
1041
00:49:25,170 --> 00:49:28,630
this currently highlighted line of code, but it's not going to dive into it.
1042
00:49:28,630 --> 00:49:30,660
So if it's a function like printf, it's not
1043
00:49:30,660 --> 00:49:32,827
going to start stepping through printf line by line.
1044
00:49:32,827 --> 00:49:33,327
Why?
1045
00:49:33,327 --> 00:49:36,420
Because I can pretty much assume printf, written decades ago, is correct.
1046
00:49:36,420 --> 00:49:38,050
Problem's probably with me.
1047
00:49:38,050 --> 00:49:42,690
But this next line, if I did really want to step into the printf code
1048
00:49:42,690 --> 00:49:46,110
to figure out how it works or find some problem in it all these years later,
1049
00:49:46,110 --> 00:49:48,810
you can step into printf, and then the screen would change,
1050
00:49:48,810 --> 00:49:50,910
and you'd see each of the lines for printf,
1051
00:49:50,910 --> 00:49:54,250
line by line-- at least if you have the source code for printf installed.
1052
00:49:54,250 --> 00:49:56,490
All right, I'm going to use the first one, Step Over.
1053
00:49:56,490 --> 00:49:59,130
And watch as the yellow highlight moves.
1054
00:49:59,130 --> 00:50:03,060
And watch as, in the terminal window, there's a hash symbol.
1055
00:50:03,060 --> 00:50:03,780
Here we go.
1056
00:50:03,780 --> 00:50:05,130
There's one hash.
1057
00:50:05,130 --> 00:50:07,230
Now, notice line 5 is highlighted.
1058
00:50:07,230 --> 00:50:09,480
That means it has paused on line 5.
1059
00:50:09,480 --> 00:50:11,350
Line 5 has not yet been executed.
1060
00:50:11,350 --> 00:50:12,600
So what does that mean?
1061
00:50:12,600 --> 00:50:16,320
The value of i, per the top left-hand corner, is still 0.
1062
00:50:16,320 --> 00:50:18,920
But as soon as I click Step Over again, watch
1063
00:50:18,920 --> 00:50:24,470
what happens at the top left, where i is a variable on the screen.
1064
00:50:24,470 --> 00:50:26,420
Now i-- and it flashed briefly--
1065
00:50:26,420 --> 00:50:27,920
has a value of 1.
1066
00:50:27,920 --> 00:50:30,650
And now if I step over again, watch the terminal window.
1067
00:50:30,650 --> 00:50:32,120
There's my second hash.
1068
00:50:32,120 --> 00:50:36,380
Now, let me click Step Over on the for loop, watch the variable at top left.
1069
00:50:36,380 --> 00:50:38,567
Now 1 goes to 2.
1070
00:50:38,567 --> 00:50:39,650
Now let me click it again.
1071
00:50:39,650 --> 00:50:43,220
Third hash-- and here's where the logical error is perhaps revealed.
1072
00:50:43,220 --> 00:50:45,210
Let me go ahead and step over the loop.
1073
00:50:45,210 --> 00:50:46,520
Now i is 3.
1074
00:50:46,520 --> 00:50:49,280
Wait a minute, I'm still going to print out a hash.
1075
00:50:49,280 --> 00:50:49,810
There it is.
1076
00:50:49,810 --> 00:50:50,810
There's the fourth hash.
1077
00:50:50,810 --> 00:50:53,852
And at this point, hopefully, the light bulb, proverbially, has gone off.
1078
00:50:53,852 --> 00:50:55,020
I realize, oh, I screwed up.
1079
00:50:55,020 --> 00:50:58,580
I can either stop the program altogether with the red square,
1080
00:50:58,580 --> 00:51:01,100
or I can just let it run all the way to the end, which
1081
00:51:01,100 --> 00:51:02,493
just terminates everything.
1082
00:51:02,493 --> 00:51:05,660
At this point, I just want to get back into my code and start fixing things.
1083
00:51:05,660 --> 00:51:07,700
And you can close, for instance, as I will here,
1084
00:51:07,700 --> 00:51:10,670
the File Explorer, just to hide the panel that opened.
1085
00:51:10,670 --> 00:51:12,320
So that's debug50.
1086
00:51:12,320 --> 00:51:15,920
But it's not a CS50 thing, that just starts the debugger for you, which
1087
00:51:15,920 --> 00:51:19,520
is something you'd find in most any programming environment nowadays.
1088
00:51:19,520 --> 00:51:23,670
Questions on debugging?
1089
00:51:23,670 --> 00:51:24,170
Questions?
1090
00:51:24,170 --> 00:51:24,670
Yeah?
1091
00:51:24,670 --> 00:51:27,295
AUDIENCE: Where does it tell you where it went wrong?
1092
00:51:27,295 --> 00:51:28,420
DAVID MALAN: Good question.
1093
00:51:28,420 --> 00:51:30,310
Where does it tell you where it went wrong?
1094
00:51:30,310 --> 00:51:33,190
So, sadly, it does not tell you any of that.
1095
00:51:33,190 --> 00:51:37,570
The onus is still on you, the human, to use this tool productively to walk
1096
00:51:37,570 --> 00:51:39,580
through your code at a saner pace.
1097
00:51:39,580 --> 00:51:42,070
But your brain is the one that still needs to solve it.
1098
00:51:42,070 --> 00:51:45,190
And I don't doubt, down the line, with artificial intelligence and more,
1099
00:51:45,190 --> 00:51:47,350
programs like this will get all the more helpful,
1100
00:51:47,350 --> 00:51:49,160
and start answering questions like that for us.
1101
00:51:49,160 --> 00:51:51,340
And there are other tools we'll introduce you to this semester
1102
00:51:51,340 --> 00:51:52,990
that are even more powerful than this.
1103
00:51:52,990 --> 00:51:56,770
But for now, it's just a tool, really, to slow things down and not
1104
00:51:56,770 --> 00:51:57,820
have to change your code.
1105
00:51:57,820 --> 00:52:01,420
The fact that I had that panel on the left that just showed me i's changing
1106
00:52:01,420 --> 00:52:04,150
value is just an alternative to printf, and I can
1107
00:52:04,150 --> 00:52:06,820
step through it a little more slowly.
1108
00:52:06,820 --> 00:52:10,580
Other questions on debugging?
1109
00:52:10,580 --> 00:52:11,080
No?
1110
00:52:11,080 --> 00:52:14,950
Let me show you one final example with this debugger here.
1111
00:52:14,950 --> 00:52:16,750
And this one, too, I wrote in advance.
1112
00:52:16,750 --> 00:52:18,730
Let me close buggy0.c.
1113
00:52:18,730 --> 00:52:22,327
And let me open up buggy1.c, my second version thereof.
1114
00:52:22,327 --> 00:52:24,160
Let me close my terminal window for a second
1115
00:52:24,160 --> 00:52:26,350
and give you a quick tour of this program, which
1116
00:52:26,350 --> 00:52:28,030
similarly, has a mistake.
1117
00:52:28,030 --> 00:52:32,830
Now, at the top of this program, some familiar includes, cs50.h and stdio.h.
1118
00:52:32,830 --> 00:52:34,730
This is not something we've seen before.
1119
00:52:34,730 --> 00:52:36,190
It's specific to this example--
1120
00:52:36,190 --> 00:52:38,830
a function called getNegativeInt.
1121
00:52:38,830 --> 00:52:41,043
Takes no arguments, and it returns an integer.
1122
00:52:41,043 --> 00:52:41,710
What does it do?
1123
00:52:41,710 --> 00:52:45,040
It literally gets a negative integer, ideally, from the user.
1124
00:52:45,040 --> 00:52:47,200
Fun fact, though, it doesn't do so correctly.
1125
00:52:47,200 --> 00:52:50,090
That's the bug. getNegativeInt is broken at the moment.
1126
00:52:50,090 --> 00:52:51,470
So what does main do?
1127
00:52:51,470 --> 00:52:54,130
Well, main just calls this function, passing in nothing
1128
00:52:54,130 --> 00:52:55,690
in parentheses, no inputs.
1129
00:52:55,690 --> 00:52:58,240
And it stores the return value in i.
1130
00:52:58,240 --> 00:53:00,260
And then it just prints out i on the screen.
1131
00:53:00,260 --> 00:53:03,910
So honestly, just by eyeballing this, I feel comfortable enough
1132
00:53:03,910 --> 00:53:06,365
with programming in C, I think main is correct.
1133
00:53:06,365 --> 00:53:07,990
Let me just stipulate, main is correct.
1134
00:53:07,990 --> 00:53:09,698
But there is going to be a bug down here.
1135
00:53:09,698 --> 00:53:11,210
Now, what's the bug down here?
1136
00:53:11,210 --> 00:53:14,830
Well, let me look at getNegativeInt's implementation.
1137
00:53:14,830 --> 00:53:18,970
Notice, this first line, 12, is identical to the prototype up here.
1138
00:53:18,970 --> 00:53:22,690
The prototype is sort of stupidly required up here
1139
00:53:22,690 --> 00:53:25,300
because C reads things top to bottom, left to right--
1140
00:53:25,300 --> 00:53:26,690
the compiler technically does.
1141
00:53:26,690 --> 00:53:29,680
So if you reference getNegativeInt here, but you
1142
00:53:29,680 --> 00:53:33,490
don't implement it until down here, and you haven't told C in advance
1143
00:53:33,490 --> 00:53:36,820
that it will exist, again, you get the error we saw last week.
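
A small sketch of why the prototype matters, using made-up functions (negate and negate_twice) rather than the ones in buggy1.c:

```c
// Prototype: tells the compiler that negate exists before it is used.
int negate(int n);

// Calls negate before its definition appears in the file. This is fine
// only because the prototype above has already declared it.
int negate_twice(int n)
{
    return negate(negate(n));
}

// Definition: its first line matches the prototype, minus the semicolon.
int negate(int n)
{
    return -n;
}
```

If the prototype line were removed, the compiler would reach the call inside negate_twice before it had seen any declaration of negate, and it would complain.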
1144
00:53:36,820 --> 00:53:39,010
All right, so how does getNegativeInt work?
1145
00:53:39,010 --> 00:53:40,960
We declare a variable called n.
1146
00:53:40,960 --> 00:53:43,540
We've got a do while loop that does what?
1147
00:53:43,540 --> 00:53:47,110
It uses getInt, which comes with the cs50 library, per last week.
1148
00:53:47,110 --> 00:53:49,480
It prompts the user for a negative integer, quote unquote,
1149
00:53:49,480 --> 00:53:51,670
and stores the value in n.
1150
00:53:51,670 --> 00:53:56,800
I then do all of this while n is less than 0, right?
1151
00:53:56,800 --> 00:54:00,400
Remember, we used the do while loop last week to make sure the human cooperates
1152
00:54:00,400 --> 00:54:03,970
and doesn't give us the wrong type of value, be it positive or negative
1153
00:54:03,970 --> 00:54:04,970
or something else.
1154
00:54:04,970 --> 00:54:06,400
And then we return n.
1155
00:54:06,400 --> 00:54:07,570
And there's some subtleties.
1156
00:54:07,570 --> 00:54:12,970
Anyone recall-- or have an intuition for why I've declared n on line 14,
1157
00:54:12,970 --> 00:54:15,790
instead of line 17?
1158
00:54:15,790 --> 00:54:17,620
This is a C specific thing.
1159
00:54:17,620 --> 00:54:23,465
AUDIENCE: [INAUDIBLE]
1160
00:54:23,465 --> 00:54:24,340
DAVID MALAN: Exactly.
1161
00:54:24,340 --> 00:54:27,610
There's this notion of scope in C. And we'll continue to see this over time,
1162
00:54:27,610 --> 00:54:32,590
whereby, a variable only exists inside of the most recent curly braces
1163
00:54:32,590 --> 00:54:33,560
that you've opened.
1164
00:54:33,560 --> 00:54:36,910
So if I've declared n here on line 14, I can use it
1165
00:54:36,910 --> 00:54:40,900
anywhere between lines 13 and 21 because those are the nearest curly braces.
1166
00:54:40,900 --> 00:54:43,540
If by contrast, as you note, if I instead said this,
1167
00:54:43,540 --> 00:54:49,180
int n equals getInt and so forth, and didn't have the current line 14,
1168
00:54:49,180 --> 00:54:53,470
well, n would exist inside of these curly braces, but not here, which
1169
00:54:53,470 --> 00:54:55,340
is too late, and definitely not here.
1170
00:54:55,340 --> 00:54:59,480
So you just have to declare it first, and then use and reuse it as such.
1171
00:54:59,480 --> 00:55:01,545
Now, let me just show you how I can debug this.
1172
00:55:01,545 --> 00:55:03,170
But let me show you the symptoms first.
1173
00:55:03,170 --> 00:55:04,930
Let me open my terminal window.
1174
00:55:04,930 --> 00:55:06,970
Let me run make buggy1.
1175
00:55:06,970 --> 00:55:11,710
Compiles OK, so it's not something silly like a semicolon. ./buggy1,
1176
00:55:11,710 --> 00:55:13,660
and I'm asked for a negative integer.
1177
00:55:13,660 --> 00:55:15,280
All right, let me give it negative 1--
1178
00:55:15,280 --> 00:55:16,710
Enter.
1179
00:55:16,710 --> 00:55:19,920
Well, the main function is supposed to print out what I typed,
1180
00:55:19,920 --> 00:55:20,880
but it clearly didn't.
1181
00:55:20,880 --> 00:55:21,880
It's prompting me again.
1182
00:55:21,880 --> 00:55:23,830
All right, so maybe it'll like negative 2.
1183
00:55:23,830 --> 00:55:24,330
No?
1184
00:55:24,330 --> 00:55:26,380
Maybe negative 3.
1185
00:55:26,380 --> 00:55:27,570
50?
1186
00:55:27,570 --> 00:55:29,160
OK, so it's definitely broken, right?
1187
00:55:29,160 --> 00:55:31,528
It kind of seems logically to be doing the opposite.
1188
00:55:31,528 --> 00:55:33,820
Now, you can perhaps see why this is happening already.
1189
00:55:33,820 --> 00:55:37,170
These are deliberately simple programs for demonstration's sake.
1190
00:55:37,170 --> 00:55:38,470
But let's do this.
1191
00:55:38,470 --> 00:55:41,037
Let me go ahead and set a breakpoint in main,
1192
00:55:41,037 --> 00:55:42,870
even though I'm pretty sure main is correct.
1193
00:55:42,870 --> 00:55:45,810
But it just helps me start my thought process-- start with main,
1194
00:55:45,810 --> 00:55:47,010
and then take it from there.
1195
00:55:47,010 --> 00:55:51,840
Let me run now, debug50 ./buggy1--
1196
00:55:51,840 --> 00:55:52,920
Enter.
1197
00:55:52,920 --> 00:55:53,700
And let's see.
1198
00:55:53,700 --> 00:55:56,880
With that breakpoint now, the GUI is going to reconfigure itself.
1199
00:55:56,880 --> 00:56:00,360
It's going to pause on line 8 because that's the first interesting line
1200
00:56:00,360 --> 00:56:01,260
inside of main.
1201
00:56:01,260 --> 00:56:03,780
So I could have just put the breakpoint on line 8 too.
1202
00:56:03,780 --> 00:56:06,480
It's smart enough to know that if I set it on 6,
1203
00:56:06,480 --> 00:56:09,570
you really mean line 8 because that's the first actual line of code.
1204
00:56:09,570 --> 00:56:11,280
And watch, now, what happens.
1205
00:56:11,280 --> 00:56:15,780
If I step over this line, notice that i, which at the moment
1206
00:56:15,780 --> 00:56:18,090
seems to have a default value of 0--
1207
00:56:18,090 --> 00:56:19,470
more on that another time.
1208
00:56:19,470 --> 00:56:24,750
But if I click Step Over like before, I'm prompted for a negative integer.
1209
00:56:24,750 --> 00:56:25,750
Let me type negative 1--
1210
00:56:25,750 --> 00:56:27,300
Enter.
1211
00:56:27,300 --> 00:56:32,470
And now, notice, there's no additional yellow highlight.
1212
00:56:32,470 --> 00:56:32,970
Why?
1213
00:56:32,970 --> 00:56:35,160
Where am I currently stuck, logically?
1214
00:56:35,160 --> 00:56:37,937
AUDIENCE: [INAUDIBLE]
1215
00:56:37,937 --> 00:56:40,770
DAVID MALAN: Yeah, just logically, I must be in that do while loop.
1216
00:56:40,770 --> 00:56:43,560
And even if you don't understand it, like that's the only explanation.
1217
00:56:43,560 --> 00:56:46,143
If you keep getting prompted, surely, there's a loop going on.
1218
00:56:46,143 --> 00:56:49,270
There's only one loop in my code, so there's probably a problem there.
1219
00:56:49,270 --> 00:56:52,900
So I can't just set a breakpoint in main, and then wait for this to work.
1220
00:56:52,900 --> 00:56:53,610
So let me just--
1221
00:56:53,610 --> 00:56:56,280
let me stop this with the red square.
1222
00:56:56,280 --> 00:56:58,860
And let me think, all right, instead of--
1223
00:56:58,860 --> 00:57:02,770
I can still set my breakpoint in main, but let me rerun the debugger instead.
1224
00:57:02,770 --> 00:57:05,470
And this time, rather than step over that line of code,
1225
00:57:05,470 --> 00:57:07,930
let me step into that line of code.
1226
00:57:07,930 --> 00:57:09,270
So watch what happens now.
1227
00:57:09,270 --> 00:57:11,430
Instead of clicking the second icon here,
1228
00:57:11,430 --> 00:57:14,610
let me click the third, whose name is, indeed, Step Into.
1229
00:57:14,610 --> 00:57:17,880
And watch as the yellow highlight does not move to line 9.
1230
00:57:17,880 --> 00:57:21,930
It dives into line 8-- the function on line 8,
1231
00:57:21,930 --> 00:57:25,170
thereby, bringing me down to line 17.
1232
00:57:25,170 --> 00:57:28,270
It's kind of going down into that next function.
1233
00:57:28,270 --> 00:57:31,422
Now, it didn't bother pausing on line 12 or 13 or 14
1234
00:57:31,422 --> 00:57:34,380
because there's nothing intellectually interesting there happening yet.
1235
00:57:34,380 --> 00:57:37,080
The juicy part really starts, it would seem, in line 17.
1236
00:57:37,080 --> 00:57:40,980
So, now notice, n is my variable at the top left.
1237
00:57:40,980 --> 00:57:42,270
If I click--
1238
00:57:42,270 --> 00:57:45,420
I don't want to click Step Into now, though.
1239
00:57:45,420 --> 00:57:48,090
What would go wrong if I click on Step Into--
1240
00:57:48,090 --> 00:57:52,480
or what would it do that I don't think I want to do?
1241
00:57:52,480 --> 00:57:52,990
Yeah?
1242
00:57:52,990 --> 00:57:54,755
AUDIENCE: [INAUDIBLE]
1243
00:57:54,755 --> 00:57:56,630
DAVID MALAN: Yeah, it would step into getInt.
1244
00:57:56,630 --> 00:57:59,620
But I'd like to think that the staff's version of getInt is correct,
1245
00:57:59,620 --> 00:58:02,120
and that's not our problem today, so I want to step over it.
1246
00:58:02,120 --> 00:58:06,710
And watch now at top left that nothing happens yet to the value of n
1247
00:58:06,710 --> 00:58:09,530
until I go to the terminal window now, and I type in something
1248
00:58:09,530 --> 00:58:10,670
like negative 1.
1249
00:58:10,670 --> 00:58:14,600
Now notice, it jumps to line 19, which is the next interesting line.
1250
00:58:14,600 --> 00:58:17,240
Top left, n, indeed, is negative 1.
1251
00:58:17,240 --> 00:58:19,160
And here's where I can now pause as a human
1252
00:58:19,160 --> 00:58:22,760
and think, all right, so while n is less than 0.
1253
00:58:22,760 --> 00:58:25,280
All right, n, per the top left corner, is negative 1.
1254
00:58:25,280 --> 00:58:27,830
So all right, while negative 1 is less than 0,
1255
00:58:27,830 --> 00:58:29,780
well, obviously that's true mathematically.
1256
00:58:29,780 --> 00:58:30,930
So what's going to happen?
1257
00:58:30,930 --> 00:58:32,130
It's a do while loop.
1258
00:58:32,130 --> 00:58:37,285
So when I click on Step Over again, it's going to go to this line
1259
00:58:37,285 --> 00:58:39,410
because it's at the end of the inside of that loop.
1260
00:58:39,410 --> 00:58:42,710
And now here, it's looping through again and again.
1261
00:58:42,710 --> 00:58:44,240
All right, let me do this once more.
1262
00:58:44,240 --> 00:58:45,980
I'm going to step over, all right?
1263
00:58:45,980 --> 00:58:48,777
I'm going to type in negative 2, and it's the exact same thing.
1264
00:58:48,777 --> 00:58:50,360
Now is my chance, on the yellow line--
1265
00:58:50,360 --> 00:58:51,260
OK, wait a minute.
1266
00:58:51,260 --> 00:58:53,450
Negative 2 is obviously less than 0.
1267
00:58:53,450 --> 00:58:56,080
Let me try this one more time.
1268
00:58:56,080 --> 00:58:57,570
Click it once here.
1269
00:58:57,570 --> 00:58:59,040
All right, let me give it 50.
1270
00:58:59,040 --> 00:59:05,020
And now, OK, while 50 is less than 0, that's not true,
1271
00:59:05,020 --> 00:59:08,970
so the loop is over because it's not going to do it while 50 is less than 0.
1272
00:59:08,970 --> 00:59:09,730
That's not true.
1273
00:59:09,730 --> 00:59:12,240
So now watch, when I click Step Over once more,
1274
00:59:12,240 --> 00:59:15,810
it then finishes the loop, even though there's nothing more to do.
1275
00:59:15,810 --> 00:59:17,610
It's now about to return n.
1276
00:59:17,610 --> 00:59:21,360
It jumps back up to main, where I left off on line 9.
1277
00:59:21,360 --> 00:59:23,778
It now prints, in my terminal window, the number 50.
1278
00:59:23,778 --> 00:59:26,070
And hopefully, at this point, to your question earlier,
1279
00:59:26,070 --> 00:59:30,700
my human brain has realized, oh, I'm an idiot, like I flipped my sign there.
1280
00:59:30,700 --> 00:59:32,460
So I probably-- let me stop this.
1281
00:59:32,460 --> 00:59:34,780
I probably want to do something like this.
1282
00:59:34,780 --> 00:59:38,860
If the goal is to get a negative integer, I probably want to say,
1283
00:59:38,860 --> 00:59:45,070
while n is, for instance, greater than or equal to 0 would work.
1284
00:59:45,070 --> 00:59:48,630
So while n is greater than or equal to 0, keep doing this.
1285
00:59:48,630 --> 00:59:50,430
And that's the logic I wanted to express.
1286
00:59:50,430 --> 00:59:53,733
So the debugger just saves me from staring at the screen, raising a hand,
1287
00:59:53,733 --> 00:59:54,900
sort of asking someone else.
1288
00:59:54,900 --> 00:59:58,650
At least in this case, it allows me to go through it at a healthier pace.
1289
00:59:58,650 --> 01:00:03,000
Questions now on debug50, which should be your new friend, even if it's not
1290
01:00:03,000 --> 01:00:04,940
your first instinct after printf?
1291
01:00:04,940 --> 01:00:07,690
1292
01:00:07,690 --> 01:00:09,190
Any questions on debug50?
1293
01:00:09,190 --> 01:00:09,730
No?
1294
01:00:09,730 --> 01:00:13,960
All right, well, there's one last technique we can equip you with here.
1295
01:00:13,960 --> 01:00:17,470
And that is, in addition to printf and a debugger, no joke,
1296
01:00:17,470 --> 01:00:21,400
a rubber duck is actually a reasonably recommended solution
1297
01:00:21,400 --> 01:00:22,720
to finding bugs in your code.
1298
01:00:22,720 --> 01:00:24,640
To your question earlier, the duck, too, is not
1299
01:00:24,640 --> 01:00:26,390
going to solve the problem for you.
1300
01:00:26,390 --> 01:00:29,710
But if you've wondered why this little guy has been here for so long,
1301
01:00:29,710 --> 01:00:32,080
there's this technique, which has its own Wikipedia article,
1302
01:00:32,080 --> 01:00:33,760
called rubber duck debugging.
1303
01:00:33,760 --> 01:00:37,390
The idea of which is that if you're home in your dorm room,
1304
01:00:37,390 --> 01:00:39,520
wrestling with some bug in your code, printf
1305
01:00:39,520 --> 01:00:42,820
didn't quite reveal the source to you, debugger isn't really helping,
1306
01:00:42,820 --> 01:00:46,960
honestly, maybe it would help to just sound out what problem you're having.
1307
01:00:46,960 --> 01:00:50,260
Similar to going to office hours, talking to a TA or a professor,
1308
01:00:50,260 --> 01:00:52,030
just walking through your problems because
1309
01:00:52,030 --> 01:00:54,730
in sort of talking to the duck about the fact
1310
01:00:54,730 --> 01:01:00,550
that you're doing this while n is less than 0, and then if it is--
1311
01:01:00,550 --> 01:01:01,180
wait a minute.
1312
01:01:01,180 --> 01:01:03,820
I'm an idiot, not just for talking to the rubber duck.
1313
01:01:03,820 --> 01:01:05,980
You realize, hopefully, in expressing yourself,
1314
01:01:05,980 --> 01:01:09,910
literally verbally, you probably will hear with non-zero probability,
1315
01:01:09,910 --> 01:01:11,860
like some illogic in your statement.
1316
01:01:11,860 --> 01:01:16,430
And just by sounding things out, you'll realize like, oh, that's my problem.
1317
01:01:16,430 --> 01:01:19,720
And so, frankly, if you have roommates, you can also use a roommate for this.
1318
01:01:19,720 --> 01:01:21,700
But the rubber duck is just sort of a go-to
1319
01:01:21,700 --> 01:01:24,700
when your roommates have no interest in your C problem set,
1320
01:01:24,700 --> 01:01:28,150
talking something through as such.
1321
01:01:28,150 --> 01:01:29,933
And this is an invaluable technique.
1322
01:01:29,933 --> 01:01:32,350
I admittedly tend not to do it so much with a rubber duck,
1323
01:01:32,350 --> 01:01:34,510
but ideally with colleagues, human colleagues.
1324
01:01:34,510 --> 01:01:38,260
But just talking through things often will help you just realize,
1325
01:01:38,260 --> 01:01:40,360
oh, I said something illogical.
1326
01:01:40,360 --> 01:01:41,860
Now I can go back to the code.
1327
01:01:41,860 --> 01:01:44,650
So don't solve problems by staring at your screen
1328
01:01:44,650 --> 01:01:46,240
endlessly for minutes, for hours.
1329
01:01:46,240 --> 01:01:48,100
At that point, it's time for a break, time
1330
01:01:48,100 --> 01:01:50,475
to walk away, time to talk to the duck, if you've already
1331
01:01:50,475 --> 01:01:52,900
exhausted some of those other tools.
1332
01:01:52,900 --> 01:01:55,330
As an aside, on your way out today at the end of class,
1333
01:01:55,330 --> 01:01:59,020
we have, clearly, plenty of rubber ducks for you.
1334
01:01:59,020 --> 01:02:01,600
And it's become a thing over the years, at least
1335
01:02:01,600 --> 01:02:05,770
among some, to bring the duck with them when they travel and send us photos.
1336
01:02:05,770 --> 01:02:10,480
Here, for instance, is CS50's rubber duck debugger, A.K.A. DDB,
1337
01:02:10,480 --> 01:02:15,940
for Duck Debugger, which is a pun on a geekier program called GDB, the GNU
1338
01:02:15,940 --> 01:02:18,740
Debugger, which is an actual piece of software for debugging.
1339
01:02:18,740 --> 01:02:25,270
This is CS50's debugger in the hills of Puerto Rico, also, here on the sea.
1340
01:02:25,270 --> 01:02:28,310
He made his way to San Francisco here.
1341
01:02:28,310 --> 01:02:30,640
Also, down by Fisherman's Wharf by the sea lions.
1342
01:02:30,640 --> 01:02:31,660
Familiar?
1343
01:02:31,660 --> 01:02:34,570
Here at Stanford, where there's a William Gates Computer Science
1344
01:02:34,570 --> 01:02:38,950
building for computer science, down the road in SF at Google.
1345
01:02:38,950 --> 01:02:41,650
And this is the Trevi Fountain in Rome.
1346
01:02:41,650 --> 01:02:43,810
And lastly, the Colosseum.
1347
01:02:43,810 --> 01:02:46,990
So we'll be curious to see in the coming years where your duck, too, travels.
1348
01:02:46,990 --> 01:02:49,120
So that, then, was quite a bit.
1349
01:02:49,120 --> 01:02:51,850
Why don't we go ahead here and take a short 5 minute break?
1350
01:02:51,850 --> 01:02:52,760
No snacks yet.
1351
01:02:52,760 --> 01:02:54,400
You're welcome to get up or sit down.
1352
01:02:54,400 --> 01:02:56,620
We'll return in about five.
1353
01:02:56,620 --> 01:03:00,020
All right, so we are back.
1354
01:03:00,020 --> 01:03:04,000
And if the goal, ultimately, today is to have a better understanding of things
1355
01:03:04,000 --> 01:03:06,940
like strings so that we can solve problems with text,
1356
01:03:06,940 --> 01:03:09,190
let's consider some simpler types of data
1357
01:03:09,190 --> 01:03:11,290
first, how we might represent those, and then
1358
01:03:11,290 --> 01:03:14,290
see if that doesn't lead us to a discovery as to how strings,
1359
01:03:14,290 --> 01:03:17,330
and just today's modern software is using things like that.
1360
01:03:17,330 --> 01:03:21,850
So when we talked on week zero about representation of data,
1361
01:03:21,850 --> 01:03:25,930
we had different ways of doing it, in terms of binary and decimal,
1362
01:03:25,930 --> 01:03:27,640
and unary even.
1363
01:03:27,640 --> 01:03:30,520
When we started talking about the same last week in code,
1364
01:03:30,520 --> 01:03:33,980
we started talking about data types instead.
1365
01:03:33,980 --> 01:03:36,820
And these data types were a way of telling
1366
01:03:36,820 --> 01:03:40,000
the computer, like do you want an integer, do you want a character,
1367
01:03:40,000 --> 01:03:44,260
do you want a floating point value, like a real number, or even a string,
1368
01:03:44,260 --> 01:03:45,070
as we've seen?
1369
01:03:45,070 --> 01:03:47,350
But it turns out that computers, of course,
1370
01:03:47,350 --> 01:03:49,930
only have finite amounts of resources.
1371
01:03:49,930 --> 01:03:53,740
Your computer only has a fixed amount of memory or RAM.
1372
01:03:53,740 --> 01:03:55,910
And that actually has very real world implications.
1373
01:03:55,910 --> 01:03:59,630
So for instance, here are some of the data types we've seen thus far.
1374
01:03:59,630 --> 01:04:04,090
And it turns out that each of these in C has a specific number
1375
01:04:04,090 --> 01:04:05,650
of bits allocated to it.
1376
01:04:05,650 --> 01:04:08,350
Now, admittedly, this can vary by system.
1377
01:04:08,350 --> 01:04:10,850
It's not so much the case nowadays, but for many years,
1378
01:04:10,850 --> 01:04:13,100
for decades, computers were getting better and better.
1379
01:04:13,100 --> 01:04:15,392
The earliest computers might have used fewer bits
1380
01:04:15,392 --> 01:04:16,600
for some of these data types.
1381
01:04:16,600 --> 01:04:18,663
More modern computers might use more bits.
1382
01:04:18,663 --> 01:04:21,830
So the numbers you're about to see are pretty much where we are present day.
1383
01:04:21,830 --> 01:04:25,030
So when it comes to these data types, a bool,
1384
01:04:25,030 --> 01:04:29,020
which is true or false, somewhat curiously, uses a whole byte,
1385
01:04:29,020 --> 01:04:32,380
even though that's way overkill because for a bool, true or false,
1386
01:04:32,380 --> 01:04:33,940
you, of course, only need one bit.
1387
01:04:33,940 --> 01:04:36,520
But it turns out, even though it's wasteful to use
1388
01:04:36,520 --> 01:04:39,938
eight bits, or one byte, just to represent true or false,
1389
01:04:39,938 --> 01:04:41,230
it's just easier for computers.
1390
01:04:41,230 --> 01:04:42,820
So a bool tends to be one byte.
1391
01:04:42,820 --> 01:04:47,590
An int, which we've been using a lot, uses 4 bytes, typically, or 32 bits.
1392
01:04:47,590 --> 01:04:50,590
And if I do some quick math from week zero, with 32 bits,
1393
01:04:50,590 --> 01:04:54,040
you have 4 billion possible values, roughly.
1394
01:04:54,040 --> 01:04:56,290
But if you want to represent positive and negative,
1395
01:04:56,290 --> 01:04:59,710
that means you can represent roughly negative 2 billion, all the way up
1396
01:04:59,710 --> 01:05:01,020
to positive 2 billion.
1397
01:05:01,020 --> 01:05:02,770
So that's the range, typically, with ints.
1398
01:05:02,770 --> 01:05:06,820
If that's too few numbers for you, turns out there's things called longs.
1399
01:05:06,820 --> 01:05:10,120
And longs use 64 bits, which allow you to have
1400
01:05:10,120 --> 01:05:13,220
like a quintillion number of possibilities,
1401
01:05:13,220 --> 01:05:15,730
which is a lot, certainly, a lot more than 4 billion.
1402
01:05:15,730 --> 01:05:17,410
So sometimes you might use a long.
1403
01:05:17,410 --> 01:05:18,670
But even that's finite.
1404
01:05:18,670 --> 01:05:21,640
And so as we discussed at the end of last week,
1405
01:05:21,640 --> 01:05:23,980
bad things can happen if you make certain assumptions
1406
01:05:23,980 --> 01:05:27,220
as to the data because of things like integer overflow or the like,
1407
01:05:27,220 --> 01:05:28,330
where things wrap around.
1408
01:05:28,330 --> 01:05:31,538
Then there's a float, which is a real number, something with a decimal point.
1409
01:05:31,538 --> 01:05:36,040
By convention, it's 4 bytes or 32 bits, which gives you, in short,
1410
01:05:36,040 --> 01:05:37,810
only a specific amount of precision.
1411
01:05:37,810 --> 01:05:41,620
It doesn't necessarily dictate how many numbers to the left or to the right.
1412
01:05:41,620 --> 01:05:45,250
In the aggregate, ultimately, you have though,
1413
01:05:45,250 --> 01:05:47,650
4 billion possible permutations still.
1414
01:05:47,650 --> 01:05:50,110
If you need more precision for scientific, for medical,
1415
01:05:50,110 --> 01:05:54,790
for financial applications, you might use 8 bytes, A.K.A. a double,
1416
01:05:54,790 --> 01:05:57,700
which just gives you more digits of precision.
1417
01:05:57,700 --> 01:06:01,360
They eventually get imprecise per the example we looked at last week,
1418
01:06:01,360 --> 01:06:03,610
but it at least gets you further down the line.
1419
01:06:03,610 --> 01:06:07,930
As an aside, in really, really important applications, in finance,
1420
01:06:07,930 --> 01:06:10,030
in medicine, in military operations, and the
1421
01:06:10,030 --> 01:06:12,640
like where you really can't have rounding errors--
1422
01:06:12,640 --> 01:06:17,470
long story short, humans have developed libraries in C and other languages
1423
01:06:17,470 --> 01:06:19,317
that use more, even, than 8 bytes.
1424
01:06:19,317 --> 01:06:22,150
So there are solutions to these problems, but they're always finite.
1425
01:06:22,150 --> 01:06:24,070
You have to pick an upper bound.
1426
01:06:24,070 --> 01:06:27,070
Then there's char, which we saw briefly last week when I asked
1427
01:06:27,070 --> 01:06:29,470
the user for y or n, for yes or no.
1428
01:06:29,470 --> 01:06:32,470
And then there's a string, which I'm going to propose as a question mark
1429
01:06:32,470 --> 01:06:34,360
because a string totally depends.
1430
01:06:34,360 --> 01:06:35,380
Like, Hi!
1431
01:06:35,380 --> 01:06:38,890
H-I, exclamation point, would seem to be three bytes.
1432
01:06:38,890 --> 01:06:41,140
D-A-V-I-D, would seem to be five.
1433
01:06:41,140 --> 01:06:45,400
So the strings, clearly, are variable based on what you or the human type in.
1434
01:06:45,400 --> 01:06:48,140
So we'll see what this means, though, in just a bit.
1435
01:06:48,140 --> 01:06:51,580
This though, is the thing inside of your Mac, your PC, your phone.
1436
01:06:51,580 --> 01:06:53,680
It might not look exactly like this, but this is
1437
01:06:53,680 --> 01:06:56,187
a memory module for a modern computer.
1438
01:06:56,187 --> 01:06:57,520
And let's go ahead and use this.
1439
01:06:57,520 --> 01:06:59,920
Really, it's just representative of the finite amount of memory
1440
01:06:59,920 --> 01:07:01,360
that any computer, indeed, has.
1441
01:07:01,360 --> 01:07:06,160
Let's zoom in on one of these little black chips on the circuit board here.
1442
01:07:06,160 --> 01:07:10,180
Zoom in, and let me propose that this rectangle really represents
1443
01:07:10,180 --> 01:07:14,380
some number of bytes, like tucked inside of this little black circuit
1444
01:07:14,380 --> 01:07:16,750
on the board is maybe, I don't know, a gigabyte,
1445
01:07:16,750 --> 01:07:19,300
a billion bytes, maybe it's 100 bytes-- some number of bytes.
1446
01:07:19,300 --> 01:07:21,258
It totally depends on the computer and how much
1447
01:07:21,258 --> 01:07:22,700
you paid for the stick of memory.
1448
01:07:22,700 --> 01:07:27,850
But if there's a finite number of bytes physically implemented somehow
1449
01:07:27,850 --> 01:07:30,327
digitally inside of this hardware, well, then it
1450
01:07:30,327 --> 01:07:32,410
stands to reason that we could number those bytes.
1451
01:07:32,410 --> 01:07:36,940
We can just arbitrarily decide that the top left corner is byte number
1452
01:07:36,940 --> 01:07:38,800
one, or really byte number zero.
1453
01:07:38,800 --> 01:07:41,170
The one next to it is number one, then number two,
1454
01:07:41,170 --> 01:07:43,450
number 3, dot, dot, dot, number 2 billion
1455
01:07:43,450 --> 01:07:46,090
or whatever it is, however big this memory is.
1456
01:07:46,090 --> 01:07:50,530
So if you use a variable in a C program, that's only one byte.
1457
01:07:50,530 --> 01:07:54,190
Like a char, it might literally be stored in that top left-hand corner
1458
01:07:54,190 --> 01:07:55,120
of the memory.
1459
01:07:55,120 --> 01:07:57,760
In practice, you don't care where, physically, it is.
1460
01:07:57,760 --> 01:07:59,830
But really, the artist's rendition would be
1461
01:07:59,830 --> 01:08:02,872
this-- a char might use one of those single bytes
1462
01:08:02,872 --> 01:08:04,330
somewhere in the computer's memory.
1463
01:08:04,330 --> 01:08:07,450
If you use an int, which is 4 bytes, it would give you
1464
01:08:07,450 --> 01:08:10,840
4 bytes, contiguous-- that is left to right, top to bottom.
1465
01:08:10,840 --> 01:08:13,274
But all 32 bits would be next to each other
1466
01:08:13,274 --> 01:08:16,149
so the computer knows that those, indeed, all belong to the same int.
1467
01:08:16,149 --> 01:08:18,680
If you need a long, or a double for that matter,
1468
01:08:18,680 --> 01:08:21,140
then you might use a full 8 bytes in this case.
1469
01:08:21,140 --> 01:08:23,439
And you just keep using and using this memory,
1470
01:08:23,439 --> 01:08:26,170
kind of like a canvas, almost in Photoshop
1471
01:08:26,170 --> 01:08:29,845
or a spreadsheet where you can just move pixels or you can move data around,
1472
01:08:29,845 --> 01:08:31,720
that's really what your computer's memory is,
1473
01:08:31,720 --> 01:08:36,702
a canvas for storing information in units of bytes or 8 bits.
1474
01:08:36,702 --> 01:08:39,160
Now, we don't need to keep looking at these circuit boards.
1475
01:08:39,160 --> 01:08:41,287
We can abstract it away, as we often do.
1476
01:08:41,287 --> 01:08:43,120
And let's go ahead and zoom in on this grid,
1477
01:08:43,120 --> 01:08:45,740
just to consider some very specific variables.
1478
01:08:45,740 --> 01:08:49,180
So let me zoom in, and now I see fewer, but larger boxes
1479
01:08:49,180 --> 01:08:51,580
on the screen, each of which, again, represents a byte.
1480
01:08:51,580 --> 01:08:55,130
And now let me propose that we play with some actual code.
1481
01:08:55,130 --> 01:08:58,029
So here in C, albeit without a full program,
1482
01:08:58,029 --> 01:09:01,060
are three ints-- score1, score2, score3.
1483
01:09:01,060 --> 01:09:07,359
I have, coincidentally, given myself two scores around 72 and 73,
1484
01:09:07,359 --> 01:09:09,040
and then a pretty low score at 33.
1485
01:09:09,040 --> 01:09:12,048
Of course, last week or two weeks ago, this would have been high.
1486
01:09:12,048 --> 01:09:13,840
But now we're dealing with actual integers.
1487
01:09:13,840 --> 01:09:17,750
So these are three so-so scores on my quizzes or tests or the like.
1488
01:09:17,750 --> 01:09:19,250
So let me go to VS Code here.
1489
01:09:19,250 --> 01:09:22,210
And let's make a program called scores.c.
1490
01:09:22,210 --> 01:09:24,399
So I'm going to write, code scores.c.
1491
01:09:24,399 --> 01:09:26,149
That's going to give me my new file.
1492
01:09:26,149 --> 01:09:28,420
And let me go ahead and implement something like this.
1493
01:09:28,420 --> 01:09:34,149
Include stdio.h, int main(void), and then inside of here,
1494
01:09:34,149 --> 01:09:37,689
let me do int score1 will be 72.
1495
01:09:37,689 --> 01:09:40,029
Int score2 will be 73.
1496
01:09:40,029 --> 01:09:43,149
And int score3 will be 33.
1497
01:09:43,149 --> 01:09:45,460
And then let me just do something like write a program
1498
01:09:45,460 --> 01:09:48,043
to average my three test scores together, something like that.
1499
01:09:48,043 --> 01:09:52,240
So let me do printf, quote unquote, my average is--
1500
01:09:52,240 --> 01:09:56,470
and I'm going to go ahead and do, say, %i, \n.
1501
01:09:56,470 --> 01:09:58,290
And now, let me plug in the results.
1502
01:09:58,290 --> 01:10:00,040
And this is kind of grade school math now.
1503
01:10:00,040 --> 01:10:02,210
How do I compute the average of three values?
1504
01:10:02,210 --> 01:10:09,110
Well, just like on paper, I can do score1 plus score2 plus score3
1505
01:10:09,110 --> 01:10:12,830
in parentheses, because of order of operations, divided by 3,
1506
01:10:12,830 --> 01:10:14,457
since there's three total scores.
1507
01:10:14,457 --> 01:10:16,040
All right, so I think this checks out.
1508
01:10:16,040 --> 01:10:19,040
And indeed, you can use parentheses and operators like plus in your code
1509
01:10:19,040 --> 01:10:23,180
like this in C. Let me go ahead now and do make scores.
1510
01:10:23,180 --> 01:10:24,327
No syntax error.
1511
01:10:24,327 --> 01:10:25,910
So that's good, nothing missing there.
1512
01:10:25,910 --> 01:10:28,850
And now let me do ./scores and see what my test average is.
1513
01:10:28,850 --> 01:10:32,270
All right, it's not great, but I think I still passed.
1514
01:10:32,270 --> 01:10:36,050
And indeed, my average here is 59.
1515
01:10:36,050 --> 01:10:38,360
Is it precisely 59 though?
1516
01:10:38,360 --> 01:10:39,140
Well, let's see.
1517
01:10:39,140 --> 01:10:42,110
Let's actually, instead of using an int, how about we go ahead
1518
01:10:42,110 --> 01:10:44,870
and use something like a floating point value here?
1519
01:10:44,870 --> 01:10:46,250
And let me go ahead and do this.
1520
01:10:46,250 --> 01:10:48,710
So let me recompile my code, make scores.
1521
01:10:48,710 --> 01:10:50,600
Huh, all right, I've got an issue.
1522
01:10:50,600 --> 01:10:52,340
Let me zoom in on my terminal window.
1523
01:10:52,340 --> 01:10:54,710
We've not seen this one, necessarily, before.
1524
01:10:54,710 --> 01:10:56,510
But error on line 9.
1525
01:10:56,510 --> 01:11:00,410
Format specifies type double, which is a lot of precision,
1526
01:11:00,410 --> 01:11:02,180
but the argument has type int.
1527
01:11:02,180 --> 01:11:03,300
So what does this mean?
1528
01:11:03,300 --> 01:11:06,508
Well, it's showing me with these green squiggles that something's bad between
1529
01:11:06,508 --> 01:11:09,060
the %f and this thing over here.
1530
01:11:09,060 --> 01:11:13,020
Well, on the left, I'm implying a float, or a double for that matter.
1531
01:11:13,020 --> 01:11:16,835
On the right, though, what data type are score1, score2, score3?
1532
01:11:16,835 --> 01:11:17,960
All right, so they're ints.
1533
01:11:17,960 --> 01:11:19,583
So clang does not like this.
1534
01:11:19,583 --> 01:11:22,250
The compiler just doesn't like that I'm using ints on the right,
1535
01:11:22,250 --> 01:11:24,170
but I want floats on the left.
1536
01:11:24,170 --> 01:11:26,670
So there's going to be different ways of solving this.
1537
01:11:26,670 --> 01:11:29,870
One way would be to just ignore the problem like I originally did,
1538
01:11:29,870 --> 01:11:32,450
and just go back to %i.
1539
01:11:32,450 --> 01:11:38,330
Or as an aside, %d is often an alternative to %i for a decimal number.
1540
01:11:38,330 --> 01:11:42,358
But we use %i because it sounds like int, so %i is fine here too.
1541
01:11:42,358 --> 01:11:44,150
But I don't want to just avoid the problem.
1542
01:11:44,150 --> 01:11:46,500
I want to actually display a floating point value.
1543
01:11:46,500 --> 01:11:47,730
So how can I fix this?
1544
01:11:47,730 --> 01:11:50,272
Well, it turns out, I can solve this in a few different ways.
1545
01:11:50,272 --> 01:11:53,990
The simplest is just to make sure that at least one number on the right
1546
01:11:53,990 --> 01:11:59,330
is a floating point value, like 3.0 instead of just 3.
1547
01:11:59,330 --> 01:12:01,700
Now I think clang will be happier.
1548
01:12:01,700 --> 01:12:03,320
Let me do make scores--
1549
01:12:03,320 --> 01:12:04,400
Enter.
1550
01:12:04,400 --> 01:12:05,330
And indeed, it's OK.
1551
01:12:05,330 --> 01:12:05,930
Why?
1552
01:12:05,930 --> 01:12:10,050
As soon as you have at least one more precise data type on the right,
1553
01:12:10,050 --> 01:12:13,170
it just treats everything, at that point, as a floating point value
1554
01:12:13,170 --> 01:12:14,330
so that the math works out.
1555
01:12:14,330 --> 01:12:17,720
So ./scores, Enter-- and now, there we go, right?
1556
01:12:17,720 --> 01:12:20,390
Some of us might really want that 1/3 of a point.
1557
01:12:20,390 --> 01:12:21,980
Our average was not 59.
1558
01:12:21,980 --> 01:12:25,010
It's 59 1/3, as in this case here.
1559
01:12:25,010 --> 01:12:26,750
All right, so we've solved that there.
1560
01:12:26,750 --> 01:12:30,890
As an aside, though, there's one other technique to show here.
1561
01:12:30,890 --> 01:12:33,320
If you didn't want to change it to 3.0 because that's
1562
01:12:33,320 --> 01:12:36,410
a little weird, because there were literally three scores,
1563
01:12:36,410 --> 01:12:38,760
it's not like that needs to have a decimal point,
1564
01:12:38,760 --> 01:12:43,970
you could also explicitly convert the 3 to a float
1565
01:12:43,970 --> 01:12:46,230
by saying, in parentheses, float.
1566
01:12:46,230 --> 01:12:48,050
This is what's called typecasting.
1567
01:12:48,050 --> 01:12:51,840
And this will just convert the thing right after it to that data type,
1568
01:12:51,840 --> 01:12:52,560
if it's possible.
1569
01:12:52,560 --> 01:12:56,970
So if I do this again, make scores, no errors now. ./scores, and I get,
1570
01:12:56,970 --> 01:12:59,960
in fact, the same result. There's a bit of a rounding issue here,
1571
01:12:59,960 --> 01:13:03,650
but we know the rounding relates to the imprecision from last week.
1572
01:13:03,650 --> 01:13:06,980
For now, let me just be happy with my 59.3 something.
1573
01:13:06,980 --> 01:13:08,360
I'll take that for now.
1574
01:13:08,360 --> 01:13:14,660
But this is as close to a good enough correct answer for me now.
1575
01:13:14,660 --> 01:13:15,942
But how do I--
1576
01:13:15,942 --> 01:13:18,650
think about now, what's going on inside of the computer's memory?
1577
01:13:18,650 --> 01:13:19,310
Well, let's consider.
1578
01:13:19,310 --> 01:13:20,643
Here's that same grid of memory.
1579
01:13:20,643 --> 01:13:22,490
Each box represents a byte.
1580
01:13:22,490 --> 01:13:25,790
Where are score1, score2, and score3 in my memory?
1581
01:13:25,790 --> 01:13:28,790
Well, score1, let me just propose, is at the top left.
1582
01:13:28,790 --> 01:13:32,060
But it's taking up four boxes for 4 bytes.
1583
01:13:32,060 --> 01:13:34,842
Score2 probably ends up right next to it in memory,
1584
01:13:34,842 --> 01:13:36,800
though, this isn't always going to be the case,
1585
01:13:36,800 --> 01:13:38,180
but I've chosen simple examples.
1586
01:13:38,180 --> 01:13:40,910
73 is next to it, also taking up 4 bytes.
1587
01:13:40,910 --> 01:13:45,320
And then lastly, 33 is in score3, down there underneath.
1588
01:13:45,320 --> 01:13:48,343
Now, if we really look at the computer's memory,
1589
01:13:48,343 --> 01:13:50,510
look at it with some kind of microscope or the like,
1590
01:13:50,510 --> 01:13:54,110
there's actually 32 bits, 32 bits, 32 bits
1591
01:13:54,110 --> 01:13:59,308
in each of those three groups of four bytes representing those values.
1592
01:13:59,308 --> 01:14:01,100
But again, for today's purposes onwards, we
1593
01:14:01,100 --> 01:14:03,308
don't really need to think again and again in binary.
1594
01:14:03,308 --> 01:14:05,940
It's just, indeed, these decimal numbers being stored there.
1595
01:14:05,940 --> 01:14:08,240
But I claim now, this isn't the best design.
1596
01:14:08,240 --> 01:14:11,300
Even if you have never programmed before CS50,
1597
01:14:11,300 --> 01:14:13,220
what you're looking at here on the screen,
1598
01:14:13,220 --> 01:14:16,970
as an excerpt, in what sense is this perhaps bad design, even though it's
1599
01:14:16,970 --> 01:14:19,960
a correct way of storing three test scores?
1600
01:14:19,960 --> 01:14:20,960
What's kind of bad here?
1601
01:14:20,960 --> 01:14:21,882
Yeah?
1602
01:14:21,882 --> 01:14:26,220
AUDIENCE: The more scores you have, the more you [INAUDIBLE].
1603
01:14:26,220 --> 01:14:28,950
DAVID MALAN: Yeah, always do exactly what you did-- extrapolate
1604
01:14:28,950 --> 01:14:31,740
to 4 scores, 5 scores, 50 scores.
1605
01:14:31,740 --> 01:14:34,020
This can't be that well-designed because now you're
1606
01:14:34,020 --> 01:14:36,300
going to have 4 lines of code, 5 lines of code,
1607
01:14:36,300 --> 01:14:38,550
50 lines of code that are almost identical,
1608
01:14:38,550 --> 01:14:40,770
except for this like arbitrary number that we're
1609
01:14:40,770 --> 01:14:42,430
updating at the end of the variable.
1610
01:14:42,430 --> 01:14:44,940
So indeed, there's probably going to be a better
1611
01:14:44,940 --> 01:14:48,690
way, even though, at least in C, we haven't yet seen that technique.
1612
01:14:48,690 --> 01:14:52,440
But the solution, today onward, is going to be something called an array.
1613
01:14:52,440 --> 01:14:57,180
An array is a way of storing your data back
1614
01:14:57,180 --> 01:15:00,630
to back to back in the computer's memory in such a way
1615
01:15:00,630 --> 01:15:03,960
that you can access each individual member easily.
1616
01:15:03,960 --> 01:15:08,530
Put another way, with an array, you can instead do something like this.
1617
01:15:08,530 --> 01:15:12,300
Instead of saying int score1, int score2, int score3,
1618
01:15:12,300 --> 01:15:15,790
giving each a value, you can first tell the computer,
1619
01:15:15,790 --> 01:15:18,330
please give me a variable called scores--
1620
01:15:18,330 --> 01:15:20,700
plural, though you can call it anything you want--
1621
01:15:20,700 --> 01:15:24,090
of size three, each of which will be an integer.
1622
01:15:24,090 --> 01:15:28,680
That is to say, this is how you declare an array in C that will have
1623
01:15:28,680 --> 01:15:30,930
enough room to store three integers.
1624
01:15:30,930 --> 01:15:34,540
Put another way, this is the technical way of telling the computer,
1625
01:15:34,540 --> 01:15:38,880
please give me 12 bytes in total--
1626
01:15:38,880 --> 01:15:42,660
3 times 4 each for an int, so give me 12 bytes in total.
1627
01:15:42,660 --> 01:15:44,640
And what the computer will do is guarantee
1628
01:15:44,640 --> 01:15:47,350
that they're back to back to back in the computer's memory.
1629
01:15:47,350 --> 01:15:49,360
And that'll be useful in just a moment.
1630
01:15:49,360 --> 01:15:51,820
So let me go ahead and do something useful with this.
1631
01:15:51,820 --> 01:15:53,640
Let me store three actual scores.
1632
01:15:53,640 --> 01:15:58,500
Here's how I could now store those same numeric scores in this array.
1633
01:15:58,500 --> 01:16:03,040
Syntax is a little different, but there's one variable called scores.
1634
01:16:03,040 --> 01:16:05,010
But if you want to go to its first location,
1635
01:16:05,010 --> 01:16:08,520
starting today, you use square brackets and go to location 0
1636
01:16:08,520 --> 01:16:13,080
first, which because things in C are 0 indexed, so to speak,
1637
01:16:13,080 --> 01:16:14,280
you start counting at 0.
1638
01:16:14,280 --> 01:16:16,410
The first int is at [0].
1639
01:16:16,410 --> 01:16:18,030
Second int is at [1].
1640
01:16:18,030 --> 01:16:19,530
Third int is at [2].
1641
01:16:19,530 --> 01:16:20,730
So it's not one, two, three.
1642
01:16:20,730 --> 01:16:22,090
It's literally 0, 1, 2.
1643
01:16:22,090 --> 01:16:24,090
And this is not something you have control over.
1644
01:16:24,090 --> 01:16:26,250
You must start at 0.
1645
01:16:26,250 --> 01:16:29,940
So these lines now create an array of size three,
1646
01:16:29,940 --> 01:16:33,510
and then insert one, two, three values into that array.
1647
01:16:33,510 --> 01:16:37,770
But the upside now is that you only have one name of the variable to remember.
1648
01:16:37,770 --> 01:16:39,240
It's just called scores.
1649
01:16:39,240 --> 01:16:43,380
Yes, you need to go into the array to get individual values.
1650
01:16:43,380 --> 01:16:46,618
You need to index into it using those square brackets.
1651
01:16:46,618 --> 01:16:48,660
But at least you don't have this hackish approach
1652
01:16:48,660 --> 01:16:53,050
of declaring a separate variable for each and every one of these values.
1653
01:16:53,050 --> 01:16:56,070
So let me go back to scores.c here.
1654
01:16:56,070 --> 01:16:57,580
And let me propose that I do this.
1655
01:16:57,580 --> 01:17:00,580
Let me just use that same idea to do the following.
1656
01:17:00,580 --> 01:17:02,580
Let me get rid of these three separate integers.
1657
01:17:02,580 --> 01:17:06,210
Let me give myself an int scores array of size 3.
1658
01:17:06,210 --> 01:17:10,470
And then scores[0] will, as before, be 72.
1659
01:17:10,470 --> 01:17:14,070
Scores[1] will be 73.
1660
01:17:14,070 --> 01:17:16,830
And scores[2] will be 33.
1661
01:17:16,830 --> 01:17:18,780
And let me get rid of the little dot there.
1662
01:17:18,780 --> 01:17:23,490
All right, so now, if I go ahead and run this again with make scores--
1663
01:17:23,490 --> 01:17:24,642
Enter.
1664
01:17:24,642 --> 01:17:29,060
Huh, what did I do wrong here?
1665
01:17:29,060 --> 01:17:31,680
I think I got a little too ahead of myself.
1666
01:17:31,680 --> 01:17:36,100
Let me increase my terminal window.
1667
01:17:36,100 --> 01:17:38,830
Let's focus on line 10 here, first.
1668
01:17:38,830 --> 01:17:42,310
Error, use of undeclared identifier, score1.
1669
01:17:42,310 --> 01:17:44,170
What did I do here that was dumb?
1670
01:17:44,170 --> 01:17:45,430
Yeah?
1671
01:17:45,430 --> 01:17:47,440
AUDIENCE: You didn't declare it a variable.
1672
01:17:47,440 --> 01:17:49,420
DAVID MALAN: Right, so I didn't declare score1.
1673
01:17:49,420 --> 01:17:50,530
I've got old code.
1674
01:17:50,530 --> 01:17:53,798
So I just kind of, honestly, got ahead of myself here, not even intentionally.
1675
01:17:53,798 --> 01:17:56,090
So let me go ahead and shrink my terminal window again.
1676
01:17:56,090 --> 01:17:57,740
I need to finish my thought here.
1677
01:17:57,740 --> 01:17:58,960
So let me clear my terminal.
1678
01:17:58,960 --> 01:18:04,960
And let me change this now to be scores[0] plus scores[1] plus
1679
01:18:04,960 --> 01:18:05,610
scores[2].
1680
01:18:05,610 --> 01:18:07,360
So it's a little more verbose because I've
1681
01:18:07,360 --> 01:18:10,040
got these square brackets, so to speak.
1682
01:18:10,040 --> 01:18:12,220
But I think now my code is consistent.
1683
01:18:12,220 --> 01:18:13,870
So let me make scores now.
1684
01:18:13,870 --> 01:18:14,950
It now compiles.
1685
01:18:14,950 --> 01:18:19,870
./scores gives me, indeed, the same rough average with those same values.
1686
01:18:19,870 --> 01:18:24,280
All right, so let me go ahead and maybe enhance this a little bit.
1687
01:18:24,280 --> 01:18:26,920
It's a little silly to have to write a special program just
1688
01:18:26,920 --> 01:18:31,610
to check your average of three test scores like 72, 73, 33.
1689
01:18:31,610 --> 01:18:33,550
Why don't I actually make the program dynamic
1690
01:18:33,550 --> 01:18:37,250
and ask the human for those scores?
1691
01:18:37,250 --> 01:18:39,140
So instead, let me do this.
1692
01:18:39,140 --> 01:18:43,480
How about we get rid of the 72, and change this to get_int.
1693
01:18:43,480 --> 01:18:46,300
And I'll just prompt the user for a score.
1694
01:18:46,300 --> 01:18:52,510
Let me get rid of the 73 and get this to be get_int score, quote unquote.
1695
01:18:52,510 --> 01:18:56,560
And then lastly, get rid of the 33, and replace it with get_int, quote unquote,
1696
01:18:56,560 --> 01:18:57,670
score.
1697
01:18:57,670 --> 01:19:03,680
get_int is a CS50 thing for now, so I need to include cs50.h, as always.
1698
01:19:03,680 --> 01:19:05,650
But I think now, it's sort of a better program
1699
01:19:05,650 --> 01:19:08,680
because now I can compile it once, I can even share it with my friends.
1700
01:19:08,680 --> 01:19:12,490
And now any of us can average three scores on some class's test.
1701
01:19:12,490 --> 01:19:15,190
They don't need to know the code or rewrite the code just
1702
01:19:15,190 --> 01:19:16,910
to type in their scores.
1703
01:19:16,910 --> 01:19:19,150
So make scores worked.
1704
01:19:19,150 --> 01:19:25,120
./scores, now I can type anything I want-- maybe it's a 72, 73, 33,
1705
01:19:25,120 --> 01:19:26,320
still get the same answer.
1706
01:19:26,320 --> 01:19:31,210
Or maybe I'm having a better semester, 100, 100, maybe 99,
1707
01:19:31,210 --> 01:19:33,520
and now we get still a pretty high score there.
1708
01:19:33,520 --> 01:19:34,600
But now it's dynamic.
1709
01:19:34,600 --> 01:19:36,080
Now you don't need the source code.
1710
01:19:36,080 --> 01:19:37,747
You don't need to recompile the program.
1711
01:19:37,747 --> 01:19:39,670
It's just going to work again and again.
1712
01:19:39,670 --> 01:19:41,090
But this, too.
1713
01:19:41,090 --> 01:19:43,660
Let me propose that this code is correct if I
1714
01:19:43,660 --> 01:19:45,910
want to get three scores from the user.
1715
01:19:45,910 --> 01:19:50,950
But these highlighted lines now, 6 through 9, are they well-designed,
1716
01:19:50,950 --> 01:19:53,170
would you say?
1717
01:19:53,170 --> 01:19:53,680
Yeah?
1718
01:19:53,680 --> 01:19:54,898
AUDIENCE: Can you loop?
1719
01:19:54,898 --> 01:19:55,940
DAVID MALAN: Yeah, right?
1720
01:19:55,940 --> 01:19:58,220
This is-- we can use a loop, is the spoiler here.
1721
01:19:58,220 --> 01:19:58,820
Why?
1722
01:19:58,820 --> 01:20:01,590
I mean, my God, it's like the same code again and again and again.
1723
01:20:01,590 --> 01:20:03,465
The only thing that's changing is the number.
1724
01:20:03,465 --> 01:20:06,170
And this should have kind of had some code smell again,
1725
01:20:06,170 --> 01:20:09,080
because if I keep typing the same thing again and again,
1726
01:20:09,080 --> 01:20:11,810
that's clearly an opportunity to better design something.
1727
01:20:11,810 --> 01:20:13,650
So let me do this.
1728
01:20:13,650 --> 01:20:18,590
Let me go ahead and still create my array of size three.
1729
01:20:18,590 --> 01:20:23,270
But let me use our old friend, the for loop, for int i equals 0,
1730
01:20:23,270 --> 01:20:26,610
i less than 3, i++.
1731
01:20:26,610 --> 01:20:29,510
And then in here, let me do scores bracket--
1732
01:20:29,510 --> 01:20:32,920
we haven't seen this before, but any intuition?
1733
01:20:32,920 --> 01:20:34,220
Scores bracket--
1734
01:20:34,220 --> 01:20:34,720
AUDIENCE: i.
1735
01:20:34,720 --> 01:20:39,730
DAVID MALAN: i, because that will use whatever i is, be it 0 or 1 or 2
1736
01:20:39,730 --> 01:20:40,720
in iteration.
1737
01:20:40,720 --> 01:20:43,780
And then I can get an int, asking the user for score,
1738
01:20:43,780 --> 01:20:47,000
without having to repeat myself again and again.
1739
01:20:47,000 --> 01:20:50,560
So hopefully, if I didn't make any typos, make scores, all good.
1740
01:20:50,560 --> 01:20:54,665
./scores, 72, 73, 33, and we're back in business.
1741
01:20:54,665 --> 01:20:56,540
But the code is arguably now better designed,
1742
01:20:56,540 --> 01:21:01,240
because now, I haven't actually hardcoded the scores,
1743
01:21:01,240 --> 01:21:04,940
and I haven't actually copied and pasted any of that code.
1744
01:21:04,940 --> 01:21:08,230
Well, if we consider now what's going on inside of the computer's memory,
1745
01:21:08,230 --> 01:21:10,510
it's pretty much the same in terms of the values.
1746
01:21:10,510 --> 01:21:15,490
But instead of the variables being, literally, score1, score2, score3,
1747
01:21:15,490 --> 01:21:17,210
there's just one variable.
1748
01:21:17,210 --> 01:21:19,030
It's an array called scores.
1749
01:21:19,030 --> 01:21:24,550
But you can index into its three locations by using scores[0] to get
1750
01:21:24,550 --> 01:21:28,810
the first, scores[1] to get the second, scores[2] to get the third.
1751
01:21:28,810 --> 01:21:29,990
But this is key.
1752
01:21:29,990 --> 01:21:33,040
The memory is contiguous.
1753
01:21:33,040 --> 01:21:35,380
The screen is only so large, so it wraps around.
1754
01:21:35,380 --> 01:21:38,950
But physically, digitally, the memory is contiguous-- top
1755
01:21:38,950 --> 01:21:40,270
to bottom, left to right.
1756
01:21:40,270 --> 01:21:41,530
And that's important, why?
1757
01:21:41,530 --> 01:21:46,060
Because the brackets indicate 0, 1, 2, that each of these integers
1758
01:21:46,060 --> 01:21:48,790
is just one integer away from the next.
1759
01:21:48,790 --> 01:21:51,220
It can't be randomly down here all of a sudden.
1760
01:21:51,220 --> 01:21:54,070
It's got to be back to back to back.
1761
01:21:54,070 --> 01:21:57,130
All right, now equipped with that paradigm,
1762
01:21:57,130 --> 01:22:00,710
what more could we actually do here?
1763
01:22:00,710 --> 01:22:04,270
Well, it turns out, it's worth knowing that it's possible in code
1764
01:22:04,270 --> 01:22:06,850
to even pass arrays around as arguments.
1765
01:22:06,850 --> 01:22:09,100
And let me just whip this program up somewhat quickly,
1766
01:22:09,100 --> 01:22:11,320
just so you've seen it before long.
1767
01:22:11,320 --> 01:22:13,190
But let me go ahead and do this.
1768
01:22:13,190 --> 01:22:18,130
Let me propose that I create a function that does this averaging for me.
1769
01:22:18,130 --> 01:22:22,510
So I'm going to create a function called average that returns a float.
1770
01:22:22,510 --> 01:22:26,860
And the arguments this thing is going to take--
1771
01:22:26,860 --> 01:22:28,640
let's see, it's going to be the array.
1772
01:22:28,640 --> 01:22:31,480
So it turns out, if you want to take in an array of numbers--
1773
01:22:31,480 --> 01:22:33,050
you can call it anything you want.
1774
01:22:33,050 --> 01:22:36,970
This is how you tell C that a function takes, not
1775
01:22:36,970 --> 01:22:39,790
an integer, but an array of integers.
1776
01:22:39,790 --> 01:22:41,290
And you don't have to call it array.
1777
01:22:41,290 --> 01:22:42,790
I'm doing that just for the sake of discussion.
1778
01:22:42,790 --> 01:22:43,660
It can be called x.
1779
01:22:43,660 --> 01:22:44,490
It can be numbers.
1780
01:22:44,490 --> 01:22:45,490
It can be anything else.
1781
01:22:45,490 --> 01:22:49,060
I'm just calling it array to be super explicit as to what it is there.
1782
01:22:49,060 --> 01:22:51,730
Now, how do I change my code down here?
1783
01:22:51,730 --> 01:22:55,130
What I think I'm going to do for the moment is just this.
1784
01:22:55,130 --> 01:22:59,110
I'm going to get rid of this code here, where I manually computed the average.
1785
01:22:59,110 --> 01:23:01,480
And let me just call the average function here
1786
01:23:01,480 --> 01:23:05,000
by passing in the whole array of scores.
1787
01:23:05,000 --> 01:23:07,030
So this is just an example of abstraction,
1788
01:23:07,030 --> 01:23:08,890
like now I have a function called average.
1789
01:23:08,890 --> 01:23:09,670
I don't care.
1790
01:23:09,670 --> 01:23:12,490
I don't have to remember how it works once I implement it.
1791
01:23:12,490 --> 01:23:15,010
It just kind of tightens up my main code a little bit.
1792
01:23:15,010 --> 01:23:17,030
But I do still have to implement this.
1793
01:23:17,030 --> 01:23:19,360
So later in my file-- let me repeat myself as before,
1794
01:23:19,360 --> 01:23:22,270
the only time it's OK in C to repeat yourself again and again,
1795
01:23:22,270 --> 01:23:27,010
by typing out again, average, and then int array open bracket--
1796
01:23:27,010 --> 01:23:28,580
but now not a semicolon.
1797
01:23:28,580 --> 01:23:30,250
Now I have to implement this thing.
1798
01:23:30,250 --> 01:23:33,400
And I can implement this in a bunch of different ways,
1799
01:23:33,400 --> 01:23:37,630
but I don't know in advance--
1800
01:23:37,630 --> 01:23:39,040
I can't just do this.
1801
01:23:39,040 --> 01:23:48,400
I can't just do array[0] plus array[1] plus array[2],
1802
01:23:48,400 --> 01:23:52,130
unless this program's only ever going to work on three numbers.
1803
01:23:52,130 --> 01:23:55,460
So let me go ahead and do this.
1804
01:23:55,460 --> 01:23:58,570
Let me first propose that there's a poor design here.
1805
01:23:58,570 --> 01:24:01,930
In my main function, what value have I repeated twice?
1806
01:24:01,930 --> 01:24:05,050
1807
01:24:05,050 --> 01:24:07,550
Among the highlighted lines, what jumps out at you as twice?
1808
01:24:07,550 --> 01:24:09,020
AUDIENCE: The length of the array?
1809
01:24:09,020 --> 01:24:11,520
DAVID MALAN: Yeah, the length of the array, it's just three.
1810
01:24:11,520 --> 01:24:14,720
Now it's not a huge deal that I typed the number three on line 8 and line 9,
1811
01:24:14,720 --> 01:24:17,120
but this is exactly the kind of like shortcut
1812
01:24:17,120 --> 01:24:18,440
that's going to get you in trouble eventually.
1813
01:24:18,440 --> 01:24:18,860
Why?
1814
01:24:18,860 --> 01:24:20,240
Because, eventually, you or someone else is
1815
01:24:20,240 --> 01:24:22,407
going to go in and make the array bigger or smaller,
1816
01:24:22,407 --> 01:24:24,410
and you're not going to realize that magically,
1817
01:24:24,410 --> 01:24:26,270
that same number is in two places.
1818
01:24:26,270 --> 01:24:29,270
And indeed, this is what a programmer would often call a magic number.
1819
01:24:29,270 --> 01:24:31,940
A magic number is one that just kind of appears magically.
1820
01:24:31,940 --> 01:24:35,210
And you're on the honor system to change it here, if you change it here,
1821
01:24:35,210 --> 01:24:36,688
and then you change it over here.
1822
01:24:36,688 --> 01:24:39,230
That's not going to end well if the onus is on the programmer
1823
01:24:39,230 --> 01:24:43,190
to remember where they hardcoded-- that is, wrote out three explicitly.
1824
01:24:43,190 --> 01:24:46,250
So any time you reuse a value like this, you know what?
1825
01:24:46,250 --> 01:24:50,690
We should probably do what we did last week, which was to declare a variable,
1826
01:24:50,690 --> 01:24:53,510
perhaps at the very top of my program, so it's super obvious
1827
01:24:53,510 --> 01:24:56,990
what it is, called, maybe n, and set that equal to 3.
1828
01:24:56,990 --> 01:24:59,030
Better yet, what did I do last week to make sure
1829
01:24:59,030 --> 01:25:02,390
that I can't screw up and accidentally change that value?
1830
01:25:02,390 --> 01:25:03,440
Yeah, constant.
1831
01:25:03,440 --> 01:25:05,810
And the keyword there was just const for short.
1832
01:25:05,810 --> 01:25:09,110
And now I have a global variable-- global in the sense that I can
1833
01:25:09,110 --> 01:25:11,870
access it anywhere-- that is called n.
1834
01:25:11,870 --> 01:25:12,680
It's an int.
1835
01:25:12,680 --> 01:25:14,450
And it's always going to be 3.
1836
01:25:14,450 --> 01:25:18,500
And now I can improve my main function a little bit by just changing
1837
01:25:18,500 --> 01:25:22,662
the 3's to n, so now if I, if a colleague realized, oh, wait a minute,
1838
01:25:22,662 --> 01:25:23,870
there's four tests this year.
1839
01:25:23,870 --> 01:25:25,610
You change n to four, recompile the code,
1840
01:25:25,610 --> 01:25:31,190
and it just works everywhere else, except in my average function.
1841
01:25:31,190 --> 01:25:33,830
Let me change it back to 3, just for consistency.
1842
01:25:33,830 --> 01:25:39,770
This is not going to fly now, to just sum up things like this, for instance,
1843
01:25:39,770 --> 01:25:43,610
and then return this divided by 3.
1844
01:25:43,610 --> 01:25:51,130
Why will this not work now as I've defined it?
1845
01:25:51,130 --> 01:25:52,159
Yeah?
1846
01:25:52,159 --> 01:25:58,030
AUDIENCE: [INAUDIBLE]
1847
01:25:58,030 --> 01:26:00,980
DAVID MALAN: OK, I might be returning an integer value when
1848
01:26:00,980 --> 01:26:02,870
I intend to return a float per this.
1849
01:26:02,870 --> 01:26:05,870
But I think I'm OK because I used that little trick where I made sure
1850
01:26:05,870 --> 01:26:08,810
that at least one of the numbers in my arithmetic expression
1851
01:26:08,810 --> 01:26:11,010
is, in fact, a floating point value.
1852
01:26:11,010 --> 01:26:14,180
And just by adding the point 0, make sure that everything
1853
01:26:14,180 --> 01:26:15,650
gets treated as a float.
1854
01:26:15,650 --> 01:26:17,864
So I think that's OK.
1855
01:26:17,864 --> 01:26:19,034
AUDIENCE: [INAUDIBLE]
1856
01:26:19,034 --> 01:26:20,701
DAVID MALAN: I'm sorry, a little louder.
1857
01:26:20,701 --> 01:26:24,385
AUDIENCE: It just seems like you're [INAUDIBLE].
1858
01:26:24,385 --> 01:26:25,260
DAVID MALAN: Exactly.
1859
01:26:25,260 --> 01:26:27,093
So left hand's not talking to the right hand
1860
01:26:27,093 --> 01:26:30,210
here, in that my current implementation of average
1861
01:26:30,210 --> 01:26:33,510
is still assuming that there's only going to be three tests or whatever.
1862
01:26:33,510 --> 01:26:35,670
But wait a minute, I just went through the trouble
1863
01:26:35,670 --> 01:26:39,480
of modifying this to be n, generically.
1864
01:26:39,480 --> 01:26:43,205
And if I change this to 4, I'm not going to be happy, perhaps,
1865
01:26:43,205 --> 01:26:46,080
with my average because now I'm going to ignore one of my test scores
1866
01:26:46,080 --> 01:26:46,690
altogether.
1867
01:26:46,690 --> 01:26:48,450
So let me change this back to 3.
1868
01:26:48,450 --> 01:26:51,180
And unfortunately, if it's a variable now,
1869
01:26:51,180 --> 01:26:55,500
n, and therefore, I have literally a variable number of scores,
1870
01:26:55,500 --> 01:27:00,920
how do I take the average of a variable number of things?
1871
01:27:00,920 --> 01:27:02,630
I mean, what's my building block there?
1872
01:27:02,630 --> 01:27:03,170
Yeah?
1873
01:27:03,170 --> 01:27:10,100
AUDIENCE: [INAUDIBLE]
1874
01:27:10,100 --> 01:27:10,850
DAVID MALAN: Yeah.
1875
01:27:10,850 --> 01:27:14,880
Why don't I use a loop that goes through the array and adds things up as you go?
1876
01:27:14,880 --> 01:27:17,360
I mean, kind of like grade school, as you take the average on your calculator
1877
01:27:17,360 --> 01:27:19,730
or paper and pencil, you just keep adding the numbers together,
1878
01:27:19,730 --> 01:27:22,380
and then you divide at the end by the total number of things.
1879
01:27:22,380 --> 01:27:23,520
So how can I do this?
1880
01:27:23,520 --> 01:27:25,730
Well, let me change my implementation of average
1881
01:27:25,730 --> 01:27:30,515
to first declare a variable called sum, or whatever, set it equal to 0.
1882
01:27:30,515 --> 01:27:33,140
So this is like me on my piece of paper getting ready to count,
1883
01:27:33,140 --> 01:27:36,590
or my calculator, of course, when you turn it on, typically defaults to zero.
1884
01:27:36,590 --> 01:27:41,570
And now, let me do for, int i equals 0. i is less than a--
1885
01:27:41,570 --> 01:27:43,700
well, no, I didn't do that.
1886
01:27:43,700 --> 01:27:46,730
i is less than n, i++.
1887
01:27:46,730 --> 01:27:52,640
And now in here, let me go ahead and add to the current sum, whatever
1888
01:27:52,640 --> 01:27:55,910
is in the array's location, i.
1889
01:27:55,910 --> 01:28:00,740
And then down here, I think I can just return sum divided by 3.0--
1890
01:28:00,740 --> 01:28:04,560
not 3.0, n, perhaps here.
1891
01:28:04,560 --> 01:28:08,492
And actually, I think I'm going to get-- let's make sure it's a float.
1892
01:28:08,492 --> 01:28:11,450
Let's use the type casting trick just to make sure I don't accidentally
1893
01:28:11,450 --> 01:28:15,540
shortchange someone and throw away everything after the decimal point.
1894
01:28:15,540 --> 01:28:17,300
So it just escalated quickly, right?
1895
01:28:17,300 --> 01:28:18,990
Average just got a lot more involved.
1896
01:28:18,990 --> 01:28:22,130
It's not just a single one line of code, but now it's dynamic.
1897
01:28:22,130 --> 01:28:25,070
I initialize a variable called sum to 0.
1898
01:28:25,070 --> 01:28:30,920
In this loop, I go through and just keep adding to sum, which is initially 0,
1899
01:28:30,920 --> 01:28:33,200
whatever's in array[i]--
1900
01:28:33,200 --> 01:28:36,740
or specifically array[0], array[1], array[2].
1901
01:28:36,740 --> 01:28:40,970
That gives me a total sum that I return, divided by the total number of things.
1902
01:28:40,970 --> 01:28:42,560
Now, this I can tighten slightly.
1903
01:28:42,560 --> 01:28:45,650
Recall that this is syntactic sugar for just adding things.
1904
01:28:45,650 --> 01:28:48,620
I can't use plus plus because that only literally adds one.
1905
01:28:48,620 --> 01:28:52,630
But I can use here, plus equals.
1906
01:28:52,630 --> 01:28:54,880
Questions on this implementation here?
1907
01:28:54,880 --> 01:28:58,000
Really the only takeaway-- or the most important takeaway
1908
01:28:58,000 --> 01:29:00,730
is that this is the syntax for how you tell
1909
01:29:00,730 --> 01:29:04,210
a function that it expects a whole array, not
1910
01:29:04,210 --> 01:29:06,450
a single variable like an int or the like.
1911
01:29:06,450 --> 01:29:08,200
You literally use square brackets, but you
1912
01:29:08,200 --> 01:29:11,530
don't specify the length inside there.
1913
01:29:11,530 --> 01:29:12,748
Yeah?
1914
01:29:12,748 --> 01:29:16,410
AUDIENCE: What variable [INAUDIBLE] at the top?
1915
01:29:16,410 --> 01:29:18,410
DAVID MALAN: What about the variable at the top?
1916
01:29:18,410 --> 01:29:22,205
AUDIENCE: [INAUDIBLE]
1917
01:29:22,205 --> 01:29:23,330
DAVID MALAN: Good question.
1918
01:29:23,330 --> 01:29:25,220
What do I have it defined as at the top?
1919
01:29:25,220 --> 01:29:31,280
This variable, N, it must be an integer if you're going to use it inside
1920
01:29:31,280 --> 01:29:33,840
of an array's square brackets here.
1921
01:29:33,840 --> 01:29:38,360
So this line 10, notice, no longer says 3, it says N.
1922
01:29:38,360 --> 01:29:42,350
And so whatever N is, 3 or 4 or something else, that's how many
1923
01:29:42,350 --> 01:29:43,970
integers I will get in that array.
1924
01:29:43,970 --> 01:29:47,070
And it must be, by definition of an array, an integer that
1925
01:29:47,070 --> 01:29:48,320
goes in those square brackets.
1926
01:29:48,320 --> 01:29:50,000
And here's a common source of confusion.
1927
01:29:50,000 --> 01:29:52,350
When you create the array, that is declare it,
1928
01:29:52,350 --> 01:29:54,350
you use square brackets like this, where you put
1929
01:29:54,350 --> 01:29:56,210
the total number of elements you want.
1930
01:29:56,210 --> 01:29:59,820
When you subsequently use the array, like I'm doing here,
1931
01:29:59,820 --> 01:30:02,690
you don't mention int again-- just like you don't mention int
1932
01:30:02,690 --> 01:30:04,610
again and again once a variable exists.
1933
01:30:04,610 --> 01:30:10,220
You use the square brackets still, but you don't use N. You use 0 or 1 or 2
1934
01:30:10,220 --> 01:30:11,990
or, generically here, i.
1935
01:30:11,990 --> 01:30:14,810
So when C was designed, they sometimes used the same syntax
1936
01:30:14,810 --> 01:30:17,060
for two different ideas or contexts.
1937
01:30:17,060 --> 01:30:17,984
Yeah?
1938
01:30:17,984 --> 01:30:22,645
AUDIENCE: Do you have to include line 6 [INAUDIBLE]?
1939
01:30:22,645 --> 01:30:23,770
DAVID MALAN: Good question.
1940
01:30:23,770 --> 01:30:25,900
Do I have to include line 6?
1941
01:30:25,900 --> 01:30:29,290
Short answer, yes, because of the reason we ran into last week.
1942
01:30:29,290 --> 01:30:32,750
C, or clang really, reads your code top to bottom, left to right.
1943
01:30:32,750 --> 01:30:38,890
And so if the compiler sees some mention of this function average on line 16,
1944
01:30:38,890 --> 01:30:41,800
but you haven't told the compiler that average exists,
1945
01:30:41,800 --> 01:30:43,610
you're going to get an error on the screen.
1946
01:30:43,610 --> 01:30:45,490
So the conventional way to do that is you
1947
01:30:45,490 --> 01:30:48,670
just copy paste the first line of code from the function,
1948
01:30:48,670 --> 01:30:51,260
its so-called prototype or declaration.
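As a rough sketch of that convention, assuming the version of the program in which average still relies on the global N: the caller demo_average is hypothetical and stands in for main, and the score values are the ones used in the demo.

```c
// Global constant, as in the scores program.
const int N = 3;

// Prototype: the function's first line, copied to the top and ended
// with a semicolon, so the compiler knows average exists.
float average(int array[]);

// Hypothetical caller standing in for main. It uses average before the
// compiler has seen the function's body, which the prototype allows.
float demo_average(void)
{
    int scores[] = {72, 73, 33};
    return average(scores);
}

// The full definition can now live below the code that calls it.
float average(int array[])
{
    int sum = 0;
    for (int i = 0; i < N; i++)
    {
        sum += array[i];
    }
    // Cast so the division is done in floating point, not integer math.
    return sum / (float) N;
}
```

Deleting the prototype line would leave the call inside demo_average referring to a function the compiler has not yet been told about.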
1949
01:30:51,260 --> 01:30:51,760
Yeah?
1950
01:30:51,760 --> 01:30:55,662
AUDIENCE: Is there a library if you don't know the size of the array?
1951
01:30:55,662 --> 01:30:58,120
DAVID MALAN: Really good question, and a perfect segue.
1952
01:30:58,120 --> 01:31:01,078
Is there a library you can use if you don't know the size of the array?
1953
01:31:01,078 --> 01:31:01,720
No.
1954
01:31:01,720 --> 01:31:07,660
And so if any of you have programmed in Java or Python or other languages,
1955
01:31:07,660 --> 01:31:11,020
you can actually just ask the array, how big is it?
1956
01:31:11,020 --> 01:31:13,778
In C, you and I, the programmers, have to remember it.
1957
01:31:13,778 --> 01:31:15,820
And so short answer, no, there's no function that
1958
01:31:15,820 --> 01:31:17,445
will just automatically do this for us.
1959
01:31:17,445 --> 01:31:20,230
And in fact, let me make a more subtle claim
1960
01:31:20,230 --> 01:31:23,950
that it's fine to use global variables like this if they're really
1961
01:31:23,950 --> 01:31:25,160
for configuration options.
1962
01:31:25,160 --> 01:31:25,660
Why?
1963
01:31:25,660 --> 01:31:28,160
It's just convenient to put them at the very top of the file
1964
01:31:28,160 --> 01:31:30,565
because everyone, you, your colleagues, your TAs
1965
01:31:30,565 --> 01:31:32,440
are going to see them at the top of the code.
1966
01:31:32,440 --> 01:31:36,130
But you really shouldn't be using them everywhere throughout your code.
1967
01:31:36,130 --> 01:31:38,380
It'd be better if the average function, itself, were
1968
01:31:38,380 --> 01:31:40,610
independent of that special variable.
1969
01:31:40,610 --> 01:31:42,025
So by that, I mean this.
1970
01:31:42,025 --> 01:31:46,240
You know what I should really do, if I really want to be well-designed?
1971
01:31:46,240 --> 01:31:51,400
I should pass in the length of the array to the average function.
1972
01:31:51,400 --> 01:31:54,310
I should give the average function a second argument--
1973
01:31:54,310 --> 01:31:57,800
I'll call it length, for instance, but I could call it anything I want.
1974
01:31:57,800 --> 01:32:02,500
And so rather than putting N all the way down here at the bottom of my file,
1975
01:32:02,500 --> 01:32:05,745
let me just dynamically say length instead.
1976
01:32:05,745 --> 01:32:08,620
And this is a subtlety-- and no need to get too tripped up over this.
1977
01:32:08,620 --> 01:32:11,830
But this, now, is just an example of how the same function can
1978
01:32:11,830 --> 01:32:13,690
take not one, but two arguments.
1979
01:32:13,690 --> 01:32:19,400
But indeed, in C, you must remember, yourself, what the length of an array
1980
01:32:19,400 --> 01:32:19,900
is.
1981
01:32:19,900 --> 01:32:22,810
You can't just ask the array via some syntax
1982
01:32:22,810 --> 01:32:26,560
like you can, those of you who've programmed before in Java or Python.
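A minimal sketch of that redesign, assuming the length is listed before the array (the order is a free choice, and the parameter name length is just the one suggested above):

```c
// The array's length travels alongside the array as an argument,
// because the function has no way to discover it on its own.
float average(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }
    // Cast so the division is done in floating point, not integer math.
    return sum / (float) length;
}
```

A caller that holds three scores would write something like average(3, scores), or average(N, scores) if it keeps the count in a constant. The function itself no longer depends on any global.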
1983
01:32:26,560 --> 01:32:27,070
Yeah?
1984
01:32:27,070 --> 01:32:35,115
AUDIENCE: [INAUDIBLE]
1985
01:32:35,115 --> 01:32:36,240
DAVID MALAN: Good question.
1986
01:32:36,240 --> 01:32:39,198
Would it be better designed to write a function that computes the size?
1987
01:32:39,198 --> 01:32:42,570
Short answer, can't do that in C. As soon as you pass an array
1988
01:32:42,570 --> 01:32:47,263
into a function in C, you cannot figure out its size if it's a generic array
1989
01:32:47,263 --> 01:32:48,180
like that of integers.
1990
01:32:48,180 --> 01:32:51,040
There are special cases where you can do that.
1991
01:32:51,040 --> 01:32:53,283
But in general, no, it's just not possible in C.
1992
01:32:53,283 --> 01:32:55,200
And if that's some frustration, honestly, this
1993
01:32:55,200 --> 01:32:57,180
is why more modern languages add that feature.
1994
01:32:57,180 --> 01:32:57,680
Why?
1995
01:32:57,680 --> 01:32:59,910
Because it was really annoying, as I'm alluding to here,
1996
01:32:59,910 --> 01:33:01,560
not having that information.
1997
01:33:01,560 --> 01:33:03,643
Now, just to make sure I didn't screw up anywhere,
1998
01:33:03,643 --> 01:33:07,540
let me compile this final version of scores.
1999
01:33:07,540 --> 01:33:08,620
Suspense.
2000
01:33:08,620 --> 01:33:14,030
All good. ./scores, 72, 73, 33, and we're still back in business.
2001
01:33:14,030 --> 01:33:15,530
So this version is more complicated.
2002
01:33:15,530 --> 01:33:18,738
And as always, we'll have this version on the course's website for reference.
2003
01:33:18,738 --> 01:33:20,740
But the point, really, is that arrays, not only
2004
01:33:20,740 --> 01:33:23,290
can be used as containers to store multiple values--
2005
01:33:23,290 --> 01:33:25,490
three or more in this case--
2006
01:33:25,490 --> 01:33:30,440
you can also even pass them around as arguments, as such.
2007
01:33:30,440 --> 01:33:34,300
All right, now besides that, let's simplify for just a moment,
2008
01:33:34,300 --> 01:33:36,100
and consider now the world of chars.
2009
01:33:36,100 --> 01:33:39,200
If we've just got single bytes, where does this lead us?
2010
01:33:39,200 --> 01:33:41,200
And how does this get us, ultimately, to strings
2011
01:33:41,200 --> 01:33:44,170
to solve problems like readability and cryptography and the like?
2012
01:33:44,170 --> 01:33:46,390
Well here, for instance, are three lines of code,
2013
01:33:46,390 --> 01:33:48,967
out of context, that simply store three chars.
2014
01:33:48,967 --> 01:33:50,800
And you can already see where this is going.
2015
01:33:50,800 --> 01:33:53,920
Having three variables called c1, c2, c3 is clearly
2016
01:33:53,920 --> 01:33:57,470
going to end up being bad design because of all the silly redundancy here.
2017
01:33:57,470 --> 01:33:59,650
But notice, I'm using single quotes like last week
2018
01:33:59,650 --> 01:34:01,330
because these are single chars.
2019
01:34:01,330 --> 01:34:03,647
What does this look like in the computer's memory?
2020
01:34:03,647 --> 01:34:05,480
Well, it looks a little something like this.
2021
01:34:05,480 --> 01:34:09,730
If we clear out the old memory, c1, c2, c3 probably
2022
01:34:09,730 --> 01:34:12,562
will end up here, maybe not literally in the top left-hand corner.
2023
01:34:12,562 --> 01:34:14,020
This is just an artist's rendition.
2024
01:34:14,020 --> 01:34:18,440
But c1, c2, c3 will probably end up like that.
2025
01:34:18,440 --> 01:34:20,020
Now, what's really there?
2026
01:34:20,020 --> 01:34:21,730
It's really those same three numbers--
2027
01:34:21,730 --> 01:34:23,350
72, 73, 33.
2028
01:34:23,350 --> 01:34:27,920
But how many bits does a byte have?
2029
01:34:27,920 --> 01:34:28,880
Just eight.
2030
01:34:28,880 --> 01:34:33,830
So if we were to look at the binary representation of these characters,
2031
01:34:33,830 --> 01:34:35,330
it would only be eight bits each.
2032
01:34:35,330 --> 01:34:39,140
That's enough to store small numbers like 72, 73, 33.
2033
01:34:39,140 --> 01:34:41,580
We're not dealing with Unicode and emoji and the like.
2034
01:34:41,580 --> 01:34:42,837
But the point is the same.
2035
01:34:42,837 --> 01:34:45,170
You don't have to use four bytes to store these numbers.
2036
01:34:45,170 --> 01:34:48,087
You can use a different data type like chars, and underneath the hood,
2037
01:34:48,087 --> 01:34:51,420
it's, indeed, going to use just single bytes for each.
2038
01:34:51,420 --> 01:34:55,850
But this is sort of like a-- this isn't really how we implement strings, right?
2039
01:34:55,850 --> 01:34:59,270
When you wanted to say, hi, last week, or this, we used double quotes.
2040
01:34:59,270 --> 01:35:02,400
And we wrote all of the things together and used one variable, not three,
2041
01:35:02,400 --> 01:35:02,900
right?
2042
01:35:02,900 --> 01:35:06,260
When I typed in David, I didn't have a variable for D-A-V-I-D.
2043
01:35:06,260 --> 01:35:09,750
I had one variable called name that stored the whole thing.
2044
01:35:09,750 --> 01:35:13,310
So in C, we keep talking about these things called strings.
2045
01:35:13,310 --> 01:35:17,427
We'll see, eventually, that strings are not necessarily what they seem to be.
2046
01:35:17,427 --> 01:35:19,760
But for now, the key thing about strings is that they're
2047
01:35:19,760 --> 01:35:22,070
variable length, so to speak, right?
2048
01:35:22,070 --> 01:35:25,250
They might be three characters, Hi, or five characters, David,
2049
01:35:25,250 --> 01:35:28,250
or anything smaller or larger.
2050
01:35:28,250 --> 01:35:30,980
So how do we go about implementing strings,
2051
01:35:30,980 --> 01:35:33,110
if all we have at the end of the day is my memory?
2052
01:35:33,110 --> 01:35:36,290
Well, here is an example of just creating, declaring,
2053
01:35:36,290 --> 01:35:39,650
and defining a string called s. s because it's just a simple string,
2054
01:35:39,650 --> 01:35:41,900
and quote unquote, HI!, in double quotes.
2055
01:35:41,900 --> 01:35:44,090
What does this look like in the computer's memory?
2056
01:35:44,090 --> 01:35:45,230
Well, let's clear it again.
2057
01:35:45,230 --> 01:35:48,110
And here, now, because it's technically stored in one variable,
2058
01:35:48,110 --> 01:35:50,960
s, here is how I might draw it as an artist.
2059
01:35:50,960 --> 01:35:52,520
It's three bytes in total--
2060
01:35:52,520 --> 01:35:53,990
H-I exclamation point.
2061
01:35:53,990 --> 01:35:59,630
But there's no c1, c2, c3, it's just, the whole thing is s.
2062
01:35:59,630 --> 01:36:03,800
But it turns out that a string, fun fact,
2063
01:36:03,800 --> 01:36:06,990
is really just what underneath the hood?
2064
01:36:06,990 --> 01:36:09,610
Kind of leading up to this--
2065
01:36:09,610 --> 01:36:12,090
what is a string, if this is how it's laid out in memory?
2066
01:36:12,090 --> 01:36:13,190
AUDIENCE: An array.
2067
01:36:13,190 --> 01:36:15,830
DAVID MALAN: Literally, it's just an array of characters.
2068
01:36:15,830 --> 01:36:18,590
And we didn't have to know about arrays last week to use strings.
2069
01:36:18,590 --> 01:36:21,382
This is where, again, the training wheels are starting to come off.
2070
01:36:21,382 --> 01:36:23,730
But a string is just an array of characters.
2071
01:36:23,730 --> 01:36:26,040
H-I exclamation point, for instance.
2072
01:36:26,040 --> 01:36:28,370
So technically, an array--
2073
01:36:28,370 --> 01:36:33,890
or a string called s is really a variable called s that allows you
2074
01:36:33,890 --> 01:36:38,150
to get at the first character with s[0], if you want-- s[1], s[2].
2075
01:36:38,150 --> 01:36:40,340
You can literally get individual characters
2076
01:36:40,340 --> 01:36:43,820
just by treating s as though it's an array, which it really
2077
01:36:43,820 --> 01:36:47,000
is underneath the hood, in this case.
2078
01:36:47,000 --> 01:36:48,560
But there's a catch.
2079
01:36:48,560 --> 01:36:51,500
How do you know where strings end?
2080
01:36:51,500 --> 01:36:54,560
In the past, when I drew some integers on the screen,
2081
01:36:54,560 --> 01:36:57,080
I know, I claim they always take up 4 bytes.
2082
01:36:57,080 --> 01:37:00,200
If I had drawn a long, it always takes up 8 bytes.
2083
01:37:00,200 --> 01:37:03,530
If I had drawn a character, it always takes up 1 byte.
2084
01:37:03,530 --> 01:37:06,533
But how many bytes does a string take up?
2085
01:37:06,533 --> 01:37:08,450
Yeah, I mean, that's kind of the right answer.
2086
01:37:08,450 --> 01:37:10,490
In this case, three, it would seem.
2087
01:37:10,490 --> 01:37:13,490
But if it's David, that's a good five characters.
2088
01:37:13,490 --> 01:37:16,173
But where do we put the number three?
2089
01:37:16,173 --> 01:37:17,840
Where do you put the number five, right?
2090
01:37:17,840 --> 01:37:20,190
This is literally all that's inside your computer.
2091
01:37:20,190 --> 01:37:23,430
This is all our building blocks in front of us.
2092
01:37:23,430 --> 01:37:25,490
So how can we-- where does the three go?
2093
01:37:25,490 --> 01:37:26,540
Where does the five go?
2094
01:37:26,540 --> 01:37:29,420
Well, it turns out you can solve this in a couple of different ways.
2095
01:37:29,420 --> 01:37:34,160
But the way humans decided to implement strings years ago is, indeed, an array,
2096
01:37:34,160 --> 01:37:38,960
but they added one extra byte at the end of every such string array,
2097
01:37:38,960 --> 01:37:41,840
just to make clear, with a so-called sentinel value,
2098
01:37:41,840 --> 01:37:44,480
that the string ends here.
2099
01:37:44,480 --> 01:37:45,050
Why?
2100
01:37:45,050 --> 01:37:47,930
So that if you have two strings in the computer's memory like, HI!
2101
01:37:47,930 --> 01:37:52,760
and bye, you know where the barrier is between the exclamation point of one
2102
01:37:52,760 --> 01:37:54,590
and the letter B in the next, right?
2103
01:37:54,590 --> 01:37:56,000
You need some kind of delimiter.
2104
01:37:56,000 --> 01:38:00,110
And so what really is underneath the hood is this.
2105
01:38:00,110 --> 01:38:04,460
When you store a string in memory, when you type in a string-- as the user,
2106
01:38:04,460 --> 01:38:07,040
if you type in 3 characters, it's going to use
2107
01:38:07,040 --> 01:38:10,280
3 plus 1 equals 4 bytes in total.
2108
01:38:10,280 --> 01:38:14,130
If you type in David, it's going to use 5 plus 1 equals 6 bytes in total.
2109
01:38:14,130 --> 01:38:14,630
Why?
2110
01:38:14,630 --> 01:38:20,210
Because C automatically adds this special 0 at the end of the string.
2111
01:38:20,210 --> 01:38:24,710
I've drawn it with backslash 0 because this is how you represent 0 as a char,
2112
01:38:24,710 --> 01:38:25,710
as a character.
2113
01:38:25,710 --> 01:38:28,230
But this is literally just 0, as we'll soon see.
2114
01:38:28,230 --> 01:38:31,100
So any time there's a string in memory, it always takes up
2115
01:38:31,100 --> 01:38:36,197
one more byte than you, yourself, as the programmer or human typed in.
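One way to check the "plus 1" arithmetic, sketched under an assumption: instead of the CS50 string type, the text is stored in plain char arrays, so that sizeof reports the bytes each array occupies. The helper names are hypothetical.

```c
// Char arrays initialized from double-quoted text. With no size written
// in the brackets, the compiler picks one that leaves room for the '\0'.
char hi[] = "HI!";
char name[] = "David";

// Hypothetical helper: bytes occupied by hi (3 characters plus the terminator).
int bytes_in_hi(void)
{
    return (int) sizeof(hi);
}

// Hypothetical helper: bytes occupied by name (5 characters plus the terminator).
int bytes_in_name(void)
{
    return (int) sizeof(name);
}
```

Under these assumptions the two helpers report 4 and 6, matching 3 plus 1 and 5 plus 1.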
2116
01:38:36,197 --> 01:38:38,780
In fact, if we convert this again, just for discussion's sake,
2117
01:38:38,780 --> 01:38:41,572
to those integers, what's literally stored in the computer's memory
2118
01:38:41,572 --> 01:38:45,170
is going to be 72, 73, 33, and now a 0.
2119
01:38:45,170 --> 01:38:48,240
And the computer, because of C and how it was invented,
2120
01:38:48,240 --> 01:38:51,350
it's just smart enough to know that when you print out a string,
2121
01:38:51,350 --> 01:38:54,530
it prints out every character until it sees a 0,
2122
01:38:54,530 --> 01:38:56,150
and then it just stops printing.
2123
01:38:56,150 --> 01:38:58,470
In particular, printf knows how this works.
2124
01:38:58,470 --> 01:39:02,050
And this is why printf knows when to stop printing.
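The stopping rule can be sketched as a loop. This is not how printf is actually implemented; print_until_nul is a hypothetical function that only illustrates the idea of walking an array of chars until the sentinel appears.

```c
#include <stdio.h>

// Print s one char at a time, stopping at the '\0' terminator.
// Returns how many chars were printed (the terminator is not counted).
int print_until_nul(const char s[])
{
    int i = 0;
    while (s[i] != '\0')
    {
        putchar(s[i]);
        i++;
    }
    return i;
}
```

Nothing in the loop knows the length in advance. The 0 byte at the end is the only signal that the string is over.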
2125
01:39:02,050 --> 01:39:03,800
Decimal numbers are not that enlightening.
2126
01:39:03,800 --> 01:39:05,940
We'll generally write the characters like this.
2127
01:39:05,940 --> 01:39:09,350
And again, backslash 0 is just special symbology.
2128
01:39:09,350 --> 01:39:13,190
It's what the programmer types to make clear that you're not saying, HI!, 0.
2129
01:39:13,190 --> 01:39:15,980
You're saying HI!, and then it's a special 0.
2130
01:39:15,980 --> 01:39:20,887
Specifically, it is eight 0 bits that indicate
2131
01:39:20,887 --> 01:39:22,220
that it's the end of the string.
2132
01:39:22,220 --> 01:39:26,330
Technically, that backslash zero, if you want to be fancy, it's called NUL,
2133
01:39:26,330 --> 01:39:27,320
N-U-L.
2134
01:39:27,320 --> 01:39:30,320
And it turns out, you've seen this before, though we didn't call it out.
2135
01:39:30,320 --> 01:39:33,230
Here's that same ASCII chart from the past couple of weeks.
2136
01:39:33,230 --> 01:39:39,080
If I highlight this, what is decimal number 0 mapping to?
2137
01:39:39,080 --> 01:39:42,830
NUL, which is just programmer speak for the special null character.
2138
01:39:42,830 --> 01:39:46,550
All 0 bits, which means the string ends here.
2139
01:39:46,550 --> 01:39:48,510
This all happens automatically for you.
2140
01:39:48,510 --> 01:39:53,420
You do not need to create these null characters or these zeros.
2141
01:39:53,420 --> 01:40:00,030
Any questions then, on this implementation thus far?
2142
01:40:00,030 --> 01:40:01,820
Any questions here?
2143
01:40:01,820 --> 01:40:02,320
No?
2144
01:40:02,320 --> 01:40:03,195
Well, let me do this.
2145
01:40:03,195 --> 01:40:05,310
Let me go back to VS Code in a second.
2146
01:40:05,310 --> 01:40:07,770
And let's actually corroborate this with some code.
2147
01:40:07,770 --> 01:40:10,830
Let me go ahead and create a small program called hi.c.
2148
01:40:10,830 --> 01:40:12,070
And how about we do this?
2149
01:40:12,070 --> 01:40:14,550
Let me include stdio.h.
2150
01:40:14,550 --> 01:40:18,670
Let me include-- let me type out int main void, as always.
2151
01:40:18,670 --> 01:40:20,910
And now let me do something simple and kind of bad,
2152
01:40:20,910 --> 01:40:24,960
but char c1 equals quote unquote, H, in single quotes.
2153
01:40:24,960 --> 01:40:28,590
Char c2 equals quote unquote, I, in single quotes.
2154
01:40:28,590 --> 01:40:32,830
And lastly, char c3 equals exclamation point, in single quotes.
2155
01:40:32,830 --> 01:40:34,500
And now, let me just print this out.
2156
01:40:34,500 --> 01:40:36,960
I can't use %s because that is not a string.
2157
01:40:36,960 --> 01:40:40,290
That's literally three chars, because that's the design decision I made.
2158
01:40:40,290 --> 01:40:41,430
But I could do this--
2159
01:40:41,430 --> 01:40:48,600
%c, %c, %c, which we haven't seen before, but %s is string, %i is int,
2160
01:40:48,600 --> 01:40:51,060
%c is, indeed, char.
2161
01:40:51,060 --> 01:40:54,150
So let me put a backslash n at the end for cleanliness,
2162
01:40:54,150 --> 01:40:56,280
and now do, c1, c2, c3.
2163
01:40:56,280 --> 01:41:00,430
So this is like a char-based version of printing string.
2164
01:41:00,430 --> 01:41:01,650
So let me make hi.
2165
01:41:01,650 --> 01:41:05,880
And then let me do ./hi, and it looks like I used printf with %s.
2166
01:41:05,880 --> 01:41:09,750
But I did things very manually by printing out each individual character.
2167
01:41:09,750 --> 01:41:11,700
What's cool now, though, is that once you
2168
01:41:11,700 --> 01:41:15,270
know that characters are just numbers and strings are just characters,
2169
01:41:15,270 --> 01:41:16,560
you can kind of poke around.
2170
01:41:16,560 --> 01:41:21,970
Let me change all three placeholders to %i instead.
2171
01:41:21,970 --> 01:41:23,860
And this is totally fine, too.
2172
01:41:23,860 --> 01:41:26,310
Let me rerun this, make hi.
2173
01:41:26,310 --> 01:41:31,570
Actually, let me make one change, just so we can see this.
2174
01:41:31,570 --> 01:41:37,710
Let me add spaces, just for aesthetics' sake, let me do make hi, ./hi, Enter,
2175
01:41:37,710 --> 01:41:40,350
and voila, like now, you can actually see the numbers,
2176
01:41:40,350 --> 01:41:44,085
that I claimed back in week zero, were in fact happening underneath the hood.
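Here is a small sketch of the same experiment that puts both views side by side. It assumes a hypothetical helper, describe, that writes into a buffer with snprintf rather than printing directly; the format string is an illustration, not the one typed in the demo.

```c
#include <stdio.h>

// Hypothetical helper: writes c1, c2, c3 into out twice over,
// first as characters (%c), then as the numbers behind them (%i).
void describe(char c1, char c2, char c3, char out[], size_t size)
{
    snprintf(out, size, "%c%c%c = %i %i %i", c1, c2, c3, c1, c2, c3);
}
```

Passing 'H', 'I', and '!' fills the buffer with the text HI! followed by the three decimal codes, since each char is just a small number that %c or %i chooses how to display.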
2177
01:41:44,085 --> 01:41:45,960
Well, this is not how you would make strings.
2178
01:41:45,960 --> 01:41:49,457
It'd be incredibly tedious to have three variables for three letter words, five
2179
01:41:49,457 --> 01:41:50,790
variables for five letter words.
2180
01:41:50,790 --> 01:41:52,998
We've been using, of course, strings since last week,
2181
01:41:52,998 --> 01:41:54,450
so let's do that instead.
2182
01:41:54,450 --> 01:41:59,370
String s equals quote unquote, double quotes "HI!"
2183
01:41:59,370 --> 01:42:02,520
For this, now, because of these training wheels,
2184
01:42:02,520 --> 01:42:04,560
I need to include the CS50 library.
2185
01:42:04,560 --> 01:42:06,580
But we'll come back to that in the coming weeks.
2186
01:42:06,580 --> 01:42:10,530
But for now, I'm going to go ahead and create a string s called quote unquote,
2187
01:42:10,530 --> 01:42:11,580
"HI!"
2188
01:42:11,580 --> 01:42:14,760
And now I'm going to change this to be my familiar %s,
2189
01:42:14,760 --> 01:42:17,610
and now just print out s itself.
2190
01:42:17,610 --> 01:42:20,430
This, of course, is the same thing as last week, ./hi,
2191
01:42:20,430 --> 01:42:24,750
gives me the exact same thing, but now, we're dealing, of course, with strings.
2192
01:42:24,750 --> 01:42:27,610
But how can we see a little beyond that?
2193
01:42:27,610 --> 01:42:28,810
Well, how about this?
2194
01:42:28,810 --> 01:42:31,530
Let's poke around further with today's primitives.
2195
01:42:31,530 --> 01:42:35,580
Even though s is a string, I could technically print out its first
2196
01:42:35,580 --> 01:42:39,000
character with %c by doing s[0].
2197
01:42:39,000 --> 01:42:43,110
I could technically print out its second character with %c by doing s[1].
2198
01:42:43,110 --> 01:42:47,820
I could print out its third character with %c and printing out s[2].
2199
01:42:47,820 --> 01:42:50,430
So again, this just derives logically from my understanding
2200
01:42:50,430 --> 01:42:52,770
now that strings are arrays, as you note.
2201
01:42:52,770 --> 01:42:54,540
Let me do make--
2202
01:42:54,540 --> 01:42:57,300
let me do make hi, ./hi.
2203
01:42:57,300 --> 01:43:00,760
And no visual change, but I'm just kind of now tinkering around.
2204
01:43:00,760 --> 01:43:03,400
And in fact, if you're really curious, let me do this.
2205
01:43:03,400 --> 01:43:06,870
Let me change these back to i, back to i--
2206
01:43:06,870 --> 01:43:08,250
oops, back to i.
2207
01:43:08,250 --> 01:43:11,310
And let me add a fourth one because if I'm really curious now,
2208
01:43:11,310 --> 01:43:14,490
let's see what's in s[3].
2209
01:43:14,490 --> 01:43:16,020
This is the fourth byte.
2210
01:43:16,020 --> 01:43:18,990
And even though the string itself is H-I,
2211
01:43:18,990 --> 01:43:21,840
I think we can corroborate this whole null thing.
2212
01:43:21,840 --> 01:43:26,248
Make hi, ./hi, Enter, and there it is.
2213
01:43:26,248 --> 01:43:28,290
You could have done this last week, if you really
2214
01:43:28,290 --> 01:43:29,580
wanted to geek out on strings.
2215
01:43:29,580 --> 01:43:33,060
But for now, it's just revealing what's going on underneath the hood.
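The same peek at the fourth byte can be sketched as a function. codes_of is a hypothetical helper, and const char s[] stands in for the CS50 string type so the example needs no library.

```c
// Hypothetical helper: copies the numeric value of s[0] through
// s[count - 1] into codes. The caller must keep count within the
// string's bytes, terminator included; reading past that is
// undefined behavior in C.
void codes_of(const char s[], int count, int codes[])
{
    for (int i = 0; i < count; i++)
    {
        codes[i] = s[i];
    }
}
```

Asking for 4 values from HI! yields the three character codes followed by a 0, the terminator that was never typed.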
2216
01:43:33,060 --> 01:43:36,480
Questions then, on what these strings are?
2217
01:43:36,480 --> 01:43:37,498
Yeah?
2218
01:43:37,498 --> 01:43:41,293
AUDIENCE: [INAUDIBLE]
2219
01:43:41,293 --> 01:43:42,960
DAVID MALAN: Why do we need the bracket?
2220
01:43:42,960 --> 01:43:45,430
AUDIENCE: [INAUDIBLE]
2221
01:43:45,430 --> 01:43:47,180
DAVID MALAN: Why do you not need brackets?
2222
01:43:47,180 --> 01:43:47,780
Good question.
2223
01:43:47,780 --> 01:43:51,620
Why do I not need brackets on line 6?
2224
01:43:51,620 --> 01:43:53,300
Because s is a string.
2225
01:43:53,300 --> 01:43:56,930
We'll see in a couple of weeks that s is, essentially,
2226
01:43:56,930 --> 01:44:00,200
implemented underneath the hood, indeed, as an array,
2227
01:44:00,200 --> 01:44:02,240
but that happens automatically for you.
2228
01:44:02,240 --> 01:44:06,800
You can treat s as just a variable name without square brackets.
2229
01:44:06,800 --> 01:44:09,500
You will use square brackets when you have arrays of ints
2230
01:44:09,500 --> 01:44:13,730
or you manually create arrays of chars or doubles or floats or anything else.
2231
01:44:13,730 --> 01:44:14,900
But strings are special.
2232
01:44:14,900 --> 01:44:15,440
Why?
2233
01:44:15,440 --> 01:44:19,190
I mean, every program you write seems to use strings, text in some form.
2234
01:44:19,190 --> 01:44:21,930
We're humans; we like text, not just numbers and such.
2235
01:44:21,930 --> 01:44:25,910
So this is just treated a little specially in C and many other languages
2236
01:44:25,910 --> 01:44:28,580
as well.
2237
01:44:28,580 --> 01:44:31,170
Other questions on this here?
2238
01:44:31,170 --> 01:44:31,670
No?
2239
01:44:31,670 --> 01:44:33,530
Let's add then, one other string to the mix.
2240
01:44:33,530 --> 01:44:36,290
So instead of just saying, HI!, why don't we consider a version
2241
01:44:36,290 --> 01:44:38,660
of the program that says both, HI! and BYE!.
2242
01:44:38,660 --> 01:44:41,420
And I claim now that that backslash zero,
2243
01:44:41,420 --> 01:44:44,270
that null character is going to be ever more important now
2244
01:44:44,270 --> 01:44:46,820
if we've got two strings in memory, so that C knows
2245
01:44:46,820 --> 01:44:48,570
how to distinguish one from the other.
2246
01:44:48,570 --> 01:44:51,487
So let me go ahead and just get rid of these two lines for the moment.
2247
01:44:51,487 --> 01:44:55,430
Let me recreate string s equals, quote unquote double quotes, "HI!"
2248
01:44:55,430 --> 01:44:56,780
Let me give myself another one.
2249
01:44:56,780 --> 01:44:59,905
And because I'm just playing around, I'll choose very short variable names.
2250
01:44:59,905 --> 01:45:04,410
String t equals quote unquote, "BYE!"
2251
01:45:04,410 --> 01:45:06,470
And then let me just print them both out.
2252
01:45:06,470 --> 01:45:11,300
Let me go ahead and print out %s, backslash n, comma s,
2253
01:45:11,300 --> 01:45:16,910
and then printf %s backslash n, and then t.
2254
01:45:16,910 --> 01:45:19,970
So very simple demonstration of just these two variables.
2255
01:45:19,970 --> 01:45:26,090
Make hi, ./hi, and of course, it prints out two lines, one after the other.
2256
01:45:26,090 --> 01:45:27,980
What's actually going on underneath the hood?
2257
01:45:27,980 --> 01:45:29,510
Well, let's go back to the computer's memory.
2258
01:45:29,510 --> 01:45:32,160
HI!, I think, is going to be, I claim, pretty much the same.
2259
01:45:32,160 --> 01:45:36,170
So s, I'll claim, is in the top left, followed by the backslash zero.
2260
01:45:36,170 --> 01:45:40,035
And that's important now because BYE! probably is going to end up there.
2261
01:45:40,035 --> 01:45:43,160
And visually, it wraps just by nature of how I've drawn this grid of bytes,
2262
01:45:43,160 --> 01:45:44,330
but it's contiguous.
2263
01:45:44,330 --> 01:45:46,340
B-Y-E-!
2264
01:45:46,340 --> 01:45:51,470
null, A.K.A. backslash zero, this is now helpful to printf
2265
01:45:51,470 --> 01:45:55,550
because now printf knows where one begins and ends
2266
01:45:55,550 --> 01:45:58,580
by way of that special null character.
2267
01:45:58,580 --> 01:46:00,230
But we can poke around now, too.
2268
01:46:00,230 --> 01:46:01,620
What else can I do here?
2269
01:46:01,620 --> 01:46:02,840
How about this?
2270
01:46:02,840 --> 01:46:08,870
How about I go into my code here, back to VS Code, and let me go ahead
2271
01:46:08,870 --> 01:46:13,790
and say something like, well, if I've got two of these strings,
2272
01:46:13,790 --> 01:46:15,410
you know, let's put them in an array.
2273
01:46:15,410 --> 01:46:20,520
Let's kind of do this sort of arrays in arrays, sort of inception-style here.
2274
01:46:20,520 --> 01:46:23,060
So string words[2].
2275
01:46:23,060 --> 01:46:25,100
So give me an array of two strings is what
2276
01:46:25,100 --> 01:46:28,100
I'm saying here in code, even though we've not done it with strings yet.
2277
01:46:28,100 --> 01:46:29,270
We only did it with ints.
2278
01:46:29,270 --> 01:46:30,770
And now let me do this.
2279
01:46:30,770 --> 01:46:35,480
The first word A.K.A. words[0] will equal, as before, HI!
2280
01:46:35,480 --> 01:46:40,940
And now words[1] will equal quote unquote, "BYE!"
2281
01:46:40,940 --> 01:46:43,760
And now I've done the exact same thing, but again, I'm
2282
01:46:43,760 --> 01:46:48,650
just avoiding having s, t, q, r, and all these different variables in my code.
2283
01:46:48,650 --> 01:46:52,790
I just now am treating them as one single array of strings.
2284
01:46:52,790 --> 01:46:54,750
How do I change my code down here?
2285
01:46:54,750 --> 01:46:57,380
Well, if I want to print the first word, I do words[0].
2286
01:46:57,380 --> 01:46:59,900
And if I want to print the second word, I do words[1].
2287
01:46:59,900 --> 01:47:02,088
This is not a useful exercise at the moment
2288
01:47:02,088 --> 01:47:04,130
because I'm just making my code more complicated.
2289
01:47:04,130 --> 01:47:06,830
But again, it allows us to poke around and see what's
2290
01:47:06,830 --> 01:47:08,690
going on because there is that HI!
2291
01:47:08,690 --> 01:47:09,530
and BYE!.
2292
01:47:09,530 --> 01:47:10,700
But watch this.
2293
01:47:10,700 --> 01:47:14,670
If I really want to be cool, I can do this.
2294
01:47:14,670 --> 01:47:24,380
Let's print out %c, %c, %c, backslash n, and then here, %c, %c, %c, %c,
2295
01:47:24,380 --> 01:47:25,700
so four of those.
2296
01:47:25,700 --> 01:47:28,430
And now here's where things get interesting.
2297
01:47:28,430 --> 01:47:30,620
Words is an array of strings.
2298
01:47:30,620 --> 01:47:33,400
Again, if I may, what's a string?
2299
01:47:33,400 --> 01:47:35,060
An array of characters.
2300
01:47:35,060 --> 01:47:36,790
So just use the same logic.
2301
01:47:36,790 --> 01:47:41,110
If words is an array of strings, you get at the first string with words[0].
2302
01:47:41,110 --> 01:47:44,530
How do you get at the first character in the first string?
2303
01:47:44,530 --> 01:47:52,150
Bracket 0, words[0][1], and lastly, words[0][2].
2304
01:47:52,150 --> 01:47:57,460
And now down here, words[1], but the first character is there.
2305
01:47:57,460 --> 01:48:00,400
Words[1], the second character is here.
2306
01:48:00,400 --> 01:48:03,190
Words[1], the third character is here--
2307
01:48:03,190 --> 01:48:04,720
whoops-- third character's here.
2308
01:48:04,720 --> 01:48:07,898
And words[1], the fourth character is here.
2309
01:48:07,898 --> 01:48:09,190
This is not how people program.
2310
01:48:09,190 --> 01:48:10,840
This is only for demonstration's sake.
2311
01:48:10,840 --> 01:48:13,060
My God, it's so tedious and verbose already.
2312
01:48:13,060 --> 01:48:20,410
But if I make hi now, ./hi, now, I'm manually reinventing %s,
2313
01:48:20,410 --> 01:48:22,990
if I forgot it existed, using %c alone.
2314
01:48:22,990 --> 01:48:25,900
But you can indeed manipulate arrays in this way.
2315
01:48:25,900 --> 01:48:28,300
But because strings are arrays of characters,
2316
01:48:28,300 --> 01:48:32,200
you can manipulate strings in this way too.
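The double-bracket indexing can be sketched as follows. One assumption to flag: this version is written without the CS50 library, so const char * stands in for its string type, and char_at is a hypothetical helper.

```c
// An array of two strings. Without the CS50 library,
// "const char *" takes the place of its string type.
const char *words[2] = {"HI!", "BYE!"};

// Hypothetical helper: the first bracket picks a string,
// the second bracket picks a char within that string.
char char_at(int word, int index)
{
    return words[word][index];
}
```

So char_at(0, 0) is the H of the first word, char_at(1, 3) is the exclamation point of the second, and char_at(1, 4) lands on that word's terminating 0 byte.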
2317
01:48:32,200 --> 01:48:34,675
Any questions now on this syntax?
2318
01:48:34,675 --> 01:48:37,210
2319
01:48:37,210 --> 01:48:38,800
Any questions here?
2320
01:48:38,800 --> 01:48:39,460
No?
2321
01:48:39,460 --> 01:48:39,970
No?
2322
01:48:39,970 --> 01:48:42,070
All right, well, let's go ahead and propose
2323
01:48:42,070 --> 01:48:45,830
that we solve a couple of other problems we might not have as before.
2324
01:48:45,830 --> 01:48:49,150
But first, a quick visual of what's been going on underneath the hood here.
2325
01:48:49,150 --> 01:48:52,420
If here, again, is where we left off on the screen, HI! and BYE!
2326
01:48:52,420 --> 01:48:56,470
back to back, here is really how I just treated these things.
2327
01:48:56,470 --> 01:49:00,880
s bracket 0, 1, 2, 3 and then t 0, 1, 2, 3, 4.
2328
01:49:00,880 --> 01:49:04,840
But really, once I put them in an array, the picture becomes this.
2329
01:49:04,840 --> 01:49:07,030
Words[0] is the whole HI!.
2330
01:49:07,030 --> 01:49:08,680
Words[1] is the whole BYE!.
2331
01:49:08,680 --> 01:49:11,470
But if I really get into the weeds and start indexing
2332
01:49:11,470 --> 01:49:14,980
into individual characters in those strings, all I'm using
2333
01:49:14,980 --> 01:49:20,710
is new syntax in order to represent these same values here.
2334
01:49:20,710 --> 01:49:28,710
Questions then, on these representations before we forge ahead?
2335
01:49:28,710 --> 01:49:29,430
No?
2336
01:49:29,430 --> 01:49:30,030
Yeah?
2337
01:49:30,030 --> 01:49:33,390
AUDIENCE: Does the new line character not [INAUDIBLE]?
2338
01:49:33,390 --> 01:49:36,030
DAVID MALAN: Does the new line character-- say that once more?
2339
01:49:36,030 --> 01:49:38,597
AUDIENCE: Does the new line character take up any space?
2340
01:49:38,597 --> 01:49:40,180
DAVID MALAN: Ah, really good question.
2341
01:49:40,180 --> 01:49:42,730
Does the new line character take up any space?
2342
01:49:42,730 --> 01:49:45,340
It does, so far as printf is concerned.
2343
01:49:45,340 --> 01:49:48,790
But I'm not storing the backslash n in my strings,
2344
01:49:48,790 --> 01:49:53,460
printf is being manually handed that thing instead.
2345
01:49:53,460 --> 01:49:55,520
All right, so let's go ahead then and consider
2346
01:49:55,520 --> 01:49:58,970
how we might solve some problems that have arisen now with these strings,
2347
01:49:58,970 --> 01:50:00,680
as follows here.
2348
01:50:00,680 --> 01:50:02,760
Suppose I-- let's do this.
2349
01:50:02,760 --> 01:50:04,400
Let me go back to VS Code here.
2350
01:50:04,400 --> 01:50:09,980
And let me go ahead and open up a new file called, how about, length.c.
2351
01:50:09,980 --> 01:50:12,680
And let's consider for a moment how I might actually figure out
2352
01:50:12,680 --> 01:50:16,130
what the length of a string is, which is distinct from the length of an array.
2353
01:50:16,130 --> 01:50:19,680
I claimed earlier, you cannot figure out dynamically what the length of an array
2354
01:50:19,680 --> 01:50:20,180
is.
2355
01:50:20,180 --> 01:50:24,020
But I can figure out the length of a string, specifically, because
2356
01:50:24,020 --> 01:50:26,960
of this implementation detail of that null character.
2357
01:50:26,960 --> 01:50:28,500
So let me go ahead and do this.
2358
01:50:28,500 --> 01:50:31,940
Let me include cs50.h in this second program here.
2359
01:50:31,940 --> 01:50:35,090
Let me include stdio.h, as before.
2360
01:50:35,090 --> 01:50:38,120
And let me do this, int main void--
2361
01:50:38,120 --> 01:50:40,970
and the first thing I'll do is just get a string from the user.
2362
01:50:40,970 --> 01:50:43,250
I'll ask the user, as always, for their name.
2363
01:50:43,250 --> 01:50:48,170
So I'll call get_string, and say, what's your name, question mark, as always.
2364
01:50:48,170 --> 01:50:51,620
And then down here, if I want to figure out the length of this string
2365
01:50:51,620 --> 01:50:56,210
and print the length out on the screen, well, I
2366
01:50:56,210 --> 01:50:58,465
can kind of do this similar in spirit to the average,
2367
01:50:58,465 --> 01:50:59,840
where I'm accumulating something.
2368
01:50:59,840 --> 01:51:02,600
Let me go ahead and initialize n to 0.
2369
01:51:02,600 --> 01:51:05,120
Let me give myself--
2370
01:51:05,120 --> 01:51:07,035
it's not a for loop because I don't have a--
2371
01:51:07,035 --> 01:51:08,660
I don't know in advance how long it is.
2372
01:51:08,660 --> 01:51:09,980
But what if I do this?
2373
01:51:09,980 --> 01:51:20,600
While the value at name[n] does not equal '\0'--
2374
01:51:20,600 --> 01:51:23,390
crazy syntax at the moment, but it's just the culmination
2375
01:51:23,390 --> 01:51:25,590
of these various building blocks.
2376
01:51:25,590 --> 01:51:28,970
Let me just finish the thought here, n++.
2377
01:51:28,970 --> 01:51:33,656
And then down here, let's just print out, with printf and %i,
2378
01:51:33,656 --> 01:51:38,930
that value of n. So I claim this is going to show me the length of any
2379
01:51:38,930 --> 01:51:43,220
string I type in, whether it's hi or bye or David or anything else.
2380
01:51:43,220 --> 01:51:45,410
I initialize a variable to zero, and that's good
2381
01:51:45,410 --> 01:51:47,535
because that's where you start counting in general.
2382
01:51:47,535 --> 01:51:50,990
While name[0] does not equal backslash zero.
2383
01:51:50,990 --> 01:51:51,930
What is this saying?
2384
01:51:51,930 --> 01:51:55,580
Well, if name is the string the user typed in-- and name is just an array,
2385
01:51:55,580 --> 01:51:56,460
as you noted--
2386
01:51:56,460 --> 01:51:59,390
then name[0] is going to be the first character.
2387
01:51:59,390 --> 01:52:02,750
And I'm asking the question, well, does the first character not equal
2388
01:52:02,750 --> 01:52:03,680
backslash zero?
2389
01:52:03,680 --> 01:52:08,750
And if I type in David, D, it's not, so I keep going and I add 1 to n.
2390
01:52:08,750 --> 01:52:10,750
Then I'm going to check name[1].
2391
01:52:10,750 --> 01:52:13,895
Well, if I typed in David, name[1] is going to be A.
2392
01:52:13,895 --> 01:52:18,020
A does not equal backslash zero, and so it's going to go again and again
2393
01:52:18,020 --> 01:52:18,740
and again.
2394
01:52:18,740 --> 01:52:23,090
But five steps in total later, it's going to get to the byte after
2395
01:52:23,090 --> 01:52:26,480
D-A-V-I-D, realize, wait a minute, that is a backslash zero.
2396
01:52:26,480 --> 01:52:29,750
The loop finishes, and I print out the total length.
2397
01:52:29,750 --> 01:52:33,050
Arrays, in general, do not have this null character.
2398
01:52:33,050 --> 01:52:34,910
However, strings do.
2399
01:52:34,910 --> 01:52:38,150
Again, strings are special versus all of the other data types
2400
01:52:38,150 --> 01:52:39,590
we've talked about thus far.
2401
01:52:39,590 --> 01:52:43,220
But how could I, for instance, do this differently?
2402
01:52:43,220 --> 01:52:47,220
Well, let's actually factor this out as a function, as I've commonly done.
2403
01:52:47,220 --> 01:52:50,540
But rather than implement it myself, you know what?
2404
01:52:50,540 --> 01:52:54,140
It turns out what's nice about strings being so common,
2405
01:52:54,140 --> 01:52:57,260
there are many other people who have solved these problems before.
2406
01:52:57,260 --> 01:53:00,290
And in fact, there's a whole string library in C.
2407
01:53:00,290 --> 01:53:04,190
It is used by way of a header file called string.h.
2408
01:53:04,190 --> 01:53:08,400
And what string.h is, is a library of string-related functions.
2409
01:53:08,400 --> 01:53:10,760
In fact, you can see in CS50's manual pages
2410
01:53:10,760 --> 01:53:16,217
for C, the string.h functions, at least those that we recommend as most useful,
2411
01:53:16,217 --> 01:53:18,050
and in particular, if you poke around there,
2412
01:53:18,050 --> 01:53:20,290
you'll see that there's a function called strlen.
2413
01:53:20,290 --> 01:53:22,055
It means string length.
2414
01:53:22,055 --> 01:53:24,680
It was named very succinctly, just because it's a little easier
2415
01:53:24,680 --> 01:53:25,850
to type than string length.
2416
01:53:25,850 --> 01:53:28,800
But strlen tells you the length of a string.
2417
01:53:28,800 --> 01:53:30,990
So how might I use this in my code here?
2418
01:53:30,990 --> 01:53:34,020
Well, it turns out, I can simplify this quite a bit.
2419
01:53:34,020 --> 01:53:37,700
Let me get rid of my loop, get rid of my counting
2420
01:53:37,700 --> 01:53:40,880
manually, and do something like this-- int n
2421
01:53:40,880 --> 01:53:45,630
equals strlen of the human's name, name.
2422
01:53:45,630 --> 01:53:49,430
And now I'll just use printf, as before, with %i backslash n,
2423
01:53:49,430 --> 01:53:51,290
and output the value of n.
2424
01:53:51,290 --> 01:53:54,380
But there's a bug at the moment.
2425
01:53:54,380 --> 01:53:58,480
What have I forgotten to do?
2426
01:53:58,480 --> 01:54:01,670
Yeah, I have to include the header file at the top of the screen,
2427
01:54:01,670 --> 01:54:03,260
so let me-- at the top of the code.
2428
01:54:03,260 --> 01:54:07,640
So let me also include string.h at the top of my file,
2429
01:54:07,640 --> 01:54:10,970
so that C knows that, in fact, strlen exists.
2430
01:54:10,970 --> 01:54:14,170
Let me go ahead and make length, as before.
2431
01:54:14,170 --> 01:54:18,670
./length-- or actually, really for the first time, what's your name?
2432
01:54:18,670 --> 01:54:22,360
D-A-V-I-D. And hopefully, I'm going to see, in fact, 5.
2433
01:54:22,360 --> 01:54:26,950
By contrast, if I run it again and type in HI!, now I see three.
2434
01:54:26,950 --> 01:54:29,785
So strlen is just one of the functions in that library.
2435
01:54:29,785 --> 01:54:30,910
And there are so many more.
2436
01:54:30,910 --> 01:54:33,700
In fact, yet another library that might be useful moving forward
2437
01:54:33,700 --> 01:54:37,570
is this one, ctype, which relates to character
2438
01:54:37,570 --> 01:54:40,580
types and lots of functions therein that can be useful.
2439
01:54:40,580 --> 01:54:43,690
For instance, if you review its documentation in the manual pages
2440
01:54:43,690 --> 01:54:46,930
online, you'll see that there are functions via which
2441
01:54:46,930 --> 01:54:49,460
we can solve problems like this.
2442
01:54:49,460 --> 01:54:52,480
Let me go ahead and propose here--
2443
01:54:52,480 --> 01:54:53,680
let me see.
2444
01:54:53,680 --> 01:54:59,080
Let's do an example here involving--
2445
01:54:59,080 --> 01:55:03,250
how about checking if something is uppercase or lowercase,
2446
01:55:03,250 --> 01:55:06,700
and converting it to uppercase only.
2447
01:55:06,700 --> 01:55:10,810
Let me go back to VS Code, and code a program called uppercase.c.
2448
01:55:10,810 --> 01:55:15,220
In this file, I'm going to start by including now, as always, cs50.h.
2449
01:55:15,220 --> 01:55:17,710
I'm going to include stdio.h.
2450
01:55:17,710 --> 01:55:21,670
And I'm going to add one other to the mix, which
2451
01:55:21,670 --> 01:55:26,230
is string.h now too, so I can access the length of things as needed.
2452
01:55:26,230 --> 01:55:28,570
Int main void comes next.
2453
01:55:28,570 --> 01:55:30,460
And then within my main function, I'm going
2454
01:55:30,460 --> 01:55:32,230
to go ahead and declare a string called s.
2455
01:55:32,230 --> 01:55:34,240
I'm going to call get_string, as before.
2456
01:55:34,240 --> 01:55:38,170
And I'm going to go ahead and just ask the user for a string called before.
2457
01:55:38,170 --> 01:55:39,670
I want to do a before and after.
2458
01:55:39,670 --> 01:55:41,350
Whatever the user types in is before.
2459
01:55:41,350 --> 01:55:44,770
But I want to force everything to uppercase, thereafter.
2460
01:55:44,770 --> 01:55:48,740
Let me now, in this loop here, do this.
2461
01:55:48,740 --> 01:55:53,800
Let me printf quote unquote, "After," just so we can see this on the screen.
2462
01:55:53,800 --> 01:56:02,440
And let me do for int i gets 0, i is less than strlen of s, i++.
2463
01:56:02,440 --> 01:56:03,610
What am I about to do?
2464
01:56:03,610 --> 01:56:06,190
I'm about to iterate over every character in the string
2465
01:56:06,190 --> 01:56:11,230
from left to right, from 0 on up to, but not through, the length of s.
2466
01:56:11,230 --> 01:56:13,990
And how do I check if something is lowercase,
2467
01:56:13,990 --> 01:56:16,990
so that I can actually force it to uppercase?
2468
01:56:16,990 --> 01:56:19,630
Well, it turns out, I could do this literally.
2469
01:56:19,630 --> 01:56:27,436
If the character in s at location i is greater than or equal to little a,
2470
01:56:27,436 --> 01:56:31,780
ampersand, ampersand, which means and instead of or, which we saw
2471
01:56:31,780 --> 01:56:37,930
in the past, s[i] is less than or equal to little z, that means,
2472
01:56:37,930 --> 01:56:41,800
logically in English, that this is indeed lowercase.
2473
01:56:41,800 --> 01:56:44,830
How do I now convert it to uppercase, this character?
2474
01:56:44,830 --> 01:56:48,160
Well, I could just literally print out the same character.
2475
01:56:48,160 --> 01:56:52,280
But that would not be the answer here because that's not changing the value.
2476
01:56:52,280 --> 01:56:54,470
But what could I do instead?
2477
01:56:54,470 --> 01:56:59,890
Well, let me actually pull up here real fast the ASCII chart as before,
2478
01:56:59,890 --> 01:57:03,220
and let's see if we can't glean some insight.
2479
01:57:03,220 --> 01:57:05,710
If I pull up the same ASCII chart, and suppose
2480
01:57:05,710 --> 01:57:09,790
the human has typed in a lowercase a, that's 97.
2481
01:57:09,790 --> 01:57:13,240
What letter-- I want to convert it to uppercase
2482
01:57:13,240 --> 01:57:18,660
A, so what number do I want to convert the 97 to, per week zero?
2483
01:57:18,660 --> 01:57:21,000
So 65, we keep coming back to that one.
2484
01:57:21,000 --> 01:57:23,010
What if the user types in lowercase b?
2485
01:57:23,010 --> 01:57:27,550
I want to change the 98 value to 66, and so forth.
2486
01:57:27,550 --> 01:57:30,130
And any quick math, how far apart are those?
2487
01:57:30,130 --> 01:57:33,120
So it's always 32, like uppercase to lowercase
2488
01:57:33,120 --> 01:57:37,990
is always, wonderfully, good design, 32 away, one from the other.
2489
01:57:37,990 --> 01:57:39,100
So what does this mean?
2490
01:57:39,100 --> 01:57:41,350
Well, I think we saw earlier that underneath the hood,
2491
01:57:41,350 --> 01:57:42,600
a char is just a number.
2492
01:57:42,600 --> 01:57:44,340
You can certainly do arithmetic on it.
2493
01:57:44,340 --> 01:57:46,507
And here, again, if you understand these lower level
2494
01:57:46,507 --> 01:57:48,180
primitives, what if I do this?
2495
01:57:48,180 --> 01:57:53,940
Whatever s[i] is, if I know on line 13 that it's lowercase,
2496
01:57:53,940 --> 01:57:57,048
do I want to add or subtract 32?
2497
01:57:57,048 --> 01:57:57,840
AUDIENCE: Subtract.
2498
01:57:57,840 --> 01:58:01,910
DAVID MALAN: So I want to subtract because I want to go from like 97 to 65
2499
01:58:01,910 --> 01:58:06,560
or 98 to 66, so indeed, if you do some quick math, that gives you 32.
2500
01:58:06,560 --> 01:58:10,970
So it suffices to just treat chars as numbers, subtract the 32,
2501
01:58:10,970 --> 01:58:16,370
and printing it with %c, I think, will just convert lowercase to uppercase.
2502
01:58:16,370 --> 01:58:19,795
If you now fast forward to the real world, Microsoft Word or Google Docs,
2503
01:58:19,795 --> 01:58:22,670
if you've ever chosen the menu option that forces things to uppercase
2504
01:58:22,670 --> 01:58:24,980
or lowercase on occasion, literally, that's
2505
01:58:24,980 --> 01:58:26,480
what Microsoft and Google have done.
2506
01:58:26,480 --> 01:58:29,605
They iterate over every character in the document, check if it's lowercase,
2507
01:58:29,605 --> 01:58:33,810
and if so, they subtract 32 from it and show you the new value.
2508
01:58:33,810 --> 01:58:36,650
What if, though, it is not a lowercase letter?
2509
01:58:36,650 --> 01:58:40,520
I think I can keep it easy and just print out the current letter unchanged,
2510
01:58:40,520 --> 01:58:44,850
if my goal is to simply force things to all uppercase, and that letter,
2511
01:58:44,850 --> 01:58:46,490
then would be s[i].
2512
01:58:46,490 --> 01:58:50,750
So let me go ahead now and make uppercase, hopefully, no errors.
2513
01:58:50,750 --> 01:58:55,670
./uppercase, and I'll now type in David with an uppercase D,
2514
01:58:55,670 --> 01:58:57,120
but lowercase everything else.
2515
01:58:57,120 --> 01:59:00,020
But now the after version is DAVID--
2516
01:59:00,020 --> 01:59:01,190
an aesthetic bug.
2517
01:59:01,190 --> 01:59:04,400
Notice here, I forgot to include, just for prettiness' sake,
2518
01:59:04,400 --> 01:59:05,930
a backslash n at the end.
2519
01:59:05,930 --> 01:59:07,640
No problem, I'll add that.
2520
01:59:07,640 --> 01:59:08,870
Let me fix my mistake.
2521
01:59:08,870 --> 01:59:12,050
Make uppercase, ./uppercase, Enter.
2522
01:59:12,050 --> 01:59:14,240
D-A-V-I-D, Enter, and voila.
2523
01:59:14,240 --> 01:59:16,820
And I deliberately added another space after,
2524
01:59:16,820 --> 01:59:19,130
just so they would line up pretty, even though before
2525
01:59:19,130 --> 01:59:22,070
and after have different numbers of letters.
2526
01:59:22,070 --> 01:59:25,630
Questions then, on this implementation of forcing something
2527
01:59:25,630 --> 01:59:28,380
to uppercase, which in and of itself is not all that enlightening,
2528
01:59:28,380 --> 01:59:33,990
but is representative now of how you can leverage these low level primitives.
2529
01:59:33,990 --> 01:59:35,880
Question?
2530
01:59:35,880 --> 01:59:36,380
No?
2531
01:59:36,380 --> 01:59:38,633
All right, well, this honestly is tedious.
2532
01:59:38,633 --> 01:59:40,550
My God, like does Microsoft, Google, everyone,
2533
01:59:40,550 --> 01:59:43,550
have to literally write out this code just to do something simple?
2534
01:59:43,550 --> 01:59:46,310
Well, no, that's, again, why we have things like libraries.
2535
01:59:46,310 --> 01:59:49,220
And increasingly now, for problem sets, projects, and beyond,
2536
01:59:49,220 --> 01:59:52,040
well, you just use libraries more often off-the-shelf
2537
01:59:52,040 --> 01:59:55,940
so as to solve problems that, surely, other people have had before you.
2538
01:59:55,940 --> 01:59:59,570
So how can I now use this library, ctype.h?
2539
01:59:59,570 --> 02:00:01,320
Well, let me go back into my code.
2540
02:00:01,320 --> 02:00:05,090
Let me include this among my header files here.
2541
02:00:05,090 --> 02:00:08,030
Just so I can skim things easily, I tend to alphabetize my headers.
2542
02:00:08,030 --> 02:00:11,238
But that's not strictly necessary, but it allows me, at a glance, to realize,
2543
02:00:11,238 --> 02:00:13,400
did I or did I not include something I need?
2544
02:00:13,400 --> 02:00:15,570
Now, let me go ahead and do this.
2545
02:00:15,570 --> 02:00:20,390
It turns out if you read the documentation for the ctype library,
2546
02:00:20,390 --> 02:00:24,710
there's a function, wonderfully called, if islower,
2547
02:00:24,710 --> 02:00:28,910
that takes in a character as its argument, essentially, so s[i].
2548
02:00:28,910 --> 02:00:32,182
And if that returns true, a Boolean value, if you will,
2549
02:00:32,182 --> 02:00:33,890
well, I'm going to force it to uppercase.
2550
02:00:33,890 --> 02:00:36,560
But I don't have to do this math anymore.
2551
02:00:36,560 --> 02:00:40,610
Turns out, in the ctype library, there's also a function called toupper
2552
02:00:40,610 --> 02:00:43,130
that takes a character as input, like s[i],
2553
02:00:43,130 --> 02:00:45,060
and it just does the math for you.
2554
02:00:45,060 --> 02:00:47,270
So that you can abstract away the 32 thing,
2555
02:00:47,270 --> 02:00:50,400
and just know that someone else has solved that problem for you.
2556
02:00:50,400 --> 02:00:53,030
Otherwise, I can leave my code unchanged down below
2557
02:00:53,030 --> 02:00:55,200
because I'm not changing anything else.
2558
02:00:55,200 --> 02:01:00,410
So if I do make uppercase now, and then ./uppercase, D-a-v-i-d,
2559
02:01:00,410 --> 02:01:03,710
with just a capital D, and now it still works.
2560
02:01:03,710 --> 02:01:06,890
But if you read the documentation further, it turns out that toupper
2561
02:01:06,890 --> 02:01:07,520
is smart.
2562
02:01:07,520 --> 02:01:10,220
If you pass in a character to toupper that's lowercase,
2563
02:01:10,220 --> 02:01:13,040
it obviously converts it to uppercase by doing that math.
2564
02:01:13,040 --> 02:01:17,240
But if you pass in a character to toupper that's already uppercase,
2565
02:01:17,240 --> 02:01:21,540
the documentation you would see tells you that it leaves it unchanged.
2566
02:01:21,540 --> 02:01:23,910
So I can tighten all of this up.
2567
02:01:23,910 --> 02:01:25,880
I can get rid of the whole else.
2568
02:01:25,880 --> 02:01:29,150
I can get rid of the whole if, and arguably now,
2569
02:01:29,150 --> 02:01:33,620
implement a program that's just as correct, but better designed.
2570
02:01:33,620 --> 02:01:34,250
Why?
2571
02:01:34,250 --> 02:01:38,000
Fewer lines of code, easier to read, lower probability of mistakes,
2572
02:01:38,000 --> 02:01:39,740
assuming the library is correct.
2573
02:01:39,740 --> 02:01:43,160
It just makes it easier and faster for me, now, to write code.
2574
02:01:43,160 --> 02:01:47,960
So if I now do, one last time, make uppercase, Enter, ./uppercase,
2575
02:01:47,960 --> 02:01:50,190
and type in my name, still working.
2576
02:01:50,190 --> 02:01:53,810
But now notice, we've whittled this down to far fewer lines of code,
2577
02:01:53,810 --> 02:01:57,740
albeit, using now this additional library.
2578
02:01:57,740 --> 02:02:00,140
Questions then on how we did this?
2579
02:02:00,140 --> 02:02:03,930
2580
02:02:03,930 --> 02:02:06,230
Well, even though this code, I daresay, is correct,
2581
02:02:06,230 --> 02:02:09,120
it's not necessarily well-designed just yet.
2582
02:02:09,120 --> 02:02:12,590
In fact, there's one line of code, one function
2583
02:02:12,590 --> 02:02:14,690
call in this current implementation that's
2584
02:02:14,690 --> 02:02:17,900
more inefficient than it needs to be.
2585
02:02:17,900 --> 02:02:20,630
And allow me to draw your attention to this here,
2586
02:02:20,630 --> 02:02:24,320
line 10, wherein we're calling strlen.
2587
02:02:24,320 --> 02:02:27,350
But we're calling it inside of this for loop, specifically,
2588
02:02:27,350 --> 02:02:29,000
inside of the condition.
2589
02:02:29,000 --> 02:02:33,720
And why might that not necessarily be the best idea?
2590
02:02:33,720 --> 02:02:36,810
Well, is the length of the string s changing, ever?
2591
02:02:36,810 --> 02:02:38,950
I mean, certainly not within the span of this loop.
2592
02:02:38,950 --> 02:02:42,840
And so here we are within our for loop on line 10, 11, 12, and 13,
2593
02:02:42,840 --> 02:02:45,242
asking on every iteration that same question.
2594
02:02:45,242 --> 02:02:46,200
What's the length of s?
2595
02:02:46,200 --> 02:02:47,190
What's the length of s?
2596
02:02:47,190 --> 02:02:48,330
What's the length of s?
2597
02:02:48,330 --> 02:02:50,702
And in turn, we're calling strlen every time,
2598
02:02:50,702 --> 02:02:52,660
even though we're getting back the same answer.
2599
02:02:52,660 --> 02:02:54,960
So I daresay a better solution here would
2600
02:02:54,960 --> 02:02:58,230
be to maybe figure out the length of s earlier on in my code,
2601
02:02:58,230 --> 02:02:59,490
and maybe declare a variable.
2602
02:02:59,490 --> 02:03:02,580
Or perhaps do something that's syntactically a little more elegant,
2603
02:03:02,580 --> 02:03:05,070
and in fact, a very common design in a loop like this,
2604
02:03:05,070 --> 02:03:07,860
would be to declare not just one variable like i,
2605
02:03:07,860 --> 02:03:12,060
but to actually declare a second variable called n, for instance, where
2606
02:03:12,060 --> 02:03:16,530
n is just some number, set n equal to the length of s.
2607
02:03:16,530 --> 02:03:18,900
But thereafter, inside of this condition,
2608
02:03:18,900 --> 02:03:24,540
instead of calling strlen of s again and again and again, what might I now do?
2609
02:03:24,540 --> 02:03:28,110
I could instead just compare i against n itself,
2610
02:03:28,110 --> 02:03:31,080
because n now will only be calculated once when it's initialized,
2611
02:03:31,080 --> 02:03:32,730
just as i is initialized to zero.
2612
02:03:32,730 --> 02:03:36,000
And thereafter, we're going to be comparing i, which is changing,
2613
02:03:36,000 --> 02:03:37,350
against n, which will not be.
2614
02:03:37,350 --> 02:03:40,330
So it's going to be marginally more efficient by design.
2615
02:03:40,330 --> 02:03:42,900
Now with that said, a good compiler could also
2616
02:03:42,900 --> 02:03:46,080
recognize that there is this optimization possibility,
2617
02:03:46,080 --> 02:03:47,100
and maybe do it for us.
2618
02:03:47,100 --> 02:03:49,080
But for now, best to get into the habit, best
2619
02:03:49,080 --> 02:03:52,260
to develop the muscle memory for making those better design decisions
2620
02:03:52,260 --> 02:03:54,010
yourselves.
2621
02:03:54,010 --> 02:03:56,380
Questions, then, on how we did this?
2622
02:03:56,380 --> 02:03:58,900
2623
02:03:58,900 --> 02:03:59,650
No?
2624
02:03:59,650 --> 02:04:03,050
All right, a few final building blocks for the day.
2625
02:04:03,050 --> 02:04:07,870
So we started by talking about those command line arguments that clang uses,
2626
02:04:07,870 --> 02:04:13,090
whereby, anything after the command that you type at a prompt, be it make
2627
02:04:13,090 --> 02:04:18,160
or clang or even cd in Linux, any word thereafter, or something
2628
02:04:18,160 --> 02:04:21,350
cryptic like -o is a command line argument.
2629
02:04:21,350 --> 02:04:22,840
It's an input to the command.
2630
02:04:22,840 --> 02:04:26,132
It's different from a function argument because a function argument, of course,
2631
02:04:26,132 --> 02:04:27,280
is an input to a function.
2632
02:04:27,280 --> 02:04:28,345
But it's the same idea.
2633
02:04:28,345 --> 02:04:30,970
It's just different syntax after the dollar sign at the prompt.
2634
02:04:30,970 --> 02:04:33,880
Well, it turns out that command line arguments
2635
02:04:33,880 --> 02:04:37,660
are something you can now use in your own programs
2636
02:04:37,660 --> 02:04:41,800
by accessing words after the prompt.
2637
02:04:41,800 --> 02:04:45,410
And let me propose that we invent this as follows.
2638
02:04:45,410 --> 02:04:49,540
Let me propose that we switch back to VS Code here,
2639
02:04:49,540 --> 02:04:53,560
and I'll open a new file here called greet.c.
2640
02:04:53,560 --> 02:04:56,410
So in greet.c, it's going to be a program that very simply greets
2641
02:04:56,410 --> 02:04:57,070
the user.
2642
02:04:57,070 --> 02:04:59,440
Had we written this last week, we would have done this.
2643
02:04:59,440 --> 02:05:08,200
Include cs50.h, and then include stdio.h, and then int main void,
2644
02:05:08,200 --> 02:05:13,060
and then we might do something simple like string name equals get_string,
2645
02:05:13,060 --> 02:05:15,980
quote unquote, "What's your name?"
2646
02:05:15,980 --> 02:05:20,020
And then we would have printed out, as always, Hello, %s,
2647
02:05:20,020 --> 02:05:21,490
and then plugging in that name.
2648
02:05:21,490 --> 02:05:25,300
So this is the same program we've implemented many times, just
2649
02:05:25,300 --> 02:05:26,590
to make sure it works--
2650
02:05:26,590 --> 02:05:29,140
although, nope, that's not quite the same program.
2651
02:05:29,140 --> 02:05:30,940
Semicolon's in the wrong place.
2652
02:05:30,940 --> 02:05:32,960
This now is the same program.
2653
02:05:32,960 --> 02:05:37,610
So make greet, ./greet, and I'll type in my own name. hello, David.
2654
02:05:37,610 --> 02:05:38,770
So we're back there.
2655
02:05:38,770 --> 02:05:41,770
Now, what's arguably a little annoying about this program,
2656
02:05:41,770 --> 02:05:44,110
if I type in something else like, Carter,
2657
02:05:44,110 --> 02:05:48,130
Enter, I have to run the program, wait for the prompt, type in my name,
2658
02:05:48,130 --> 02:05:48,910
hit Enter.
2659
02:05:48,910 --> 02:05:52,360
And that's fine, but imagine if every program worked like this.
2660
02:05:52,360 --> 02:05:55,415
Like make, suppose you could only type make, then you wait for a prompt,
2661
02:05:55,415 --> 02:05:58,540
then you type the name of the program you want to make, then you hit Enter.
2662
02:05:58,540 --> 02:06:01,720
Or worse, in Linux when you have to change directories,
2663
02:06:01,720 --> 02:06:05,263
as you might have for problem set one, what if you had to type cd, Enter,
2664
02:06:05,263 --> 02:06:07,930
now type the name of the folder you want to change into, Enter--
2665
02:06:07,930 --> 02:06:09,710
I mean, it just slows life down.
2666
02:06:09,710 --> 02:06:11,470
And so it just gets annoying quickly.
2667
02:06:11,470 --> 02:06:16,070
So command line arguments just let you express your whole thought all at once.
2668
02:06:16,070 --> 02:06:18,200
So how can I do this?
2669
02:06:18,200 --> 02:06:22,450
Well, if I want to express the notion of command line arguments in my code,
2670
02:06:22,450 --> 02:06:25,640
I could do something like this.
2671
02:06:25,640 --> 02:06:28,750
I could, for the very first time, go up and get
2672
02:06:28,750 --> 02:06:33,730
rid of this void, which as of today means, this program takes no command
2673
02:06:33,730 --> 02:06:34,780
line arguments.
2674
02:06:34,780 --> 02:06:37,540
And I can change it to exactly this.
2675
02:06:37,540 --> 02:06:43,490
Int argc, string argv, with brackets.
2676
02:06:43,490 --> 02:06:44,950
Now it's cryptic, admittedly.
2677
02:06:44,950 --> 02:06:46,150
And let me zoom in.
2678
02:06:46,150 --> 02:06:49,300
But I think we can perhaps infer now, what's going on.
2679
02:06:49,300 --> 02:06:52,750
If main now does not have void as its input, which
2680
02:06:52,750 --> 02:06:55,600
means it takes no arguments, surely, the spoiler
2681
02:06:55,600 --> 02:06:59,230
here is that now main will take command line arguments somehow.
2682
02:06:59,230 --> 02:07:05,180
Any guesses as to what argv is or will be?
2683
02:07:05,180 --> 02:07:08,330
What might this represent?
2684
02:07:08,330 --> 02:07:11,390
It's an array of strings, right, by way of the syntax.
2685
02:07:11,390 --> 02:07:13,223
Yeah?
2686
02:07:13,223 --> 02:07:15,480
AUDIENCE: All the characters will be typed out.
2687
02:07:15,480 --> 02:07:16,050
DAVID MALAN: Exactly.
2688
02:07:16,050 --> 02:07:18,550
It will be all of the characters, or really all of the words
2689
02:07:18,550 --> 02:07:19,830
that you type at the prompt.
2690
02:07:19,830 --> 02:07:21,765
Argc, as an int, any guess?
2691
02:07:21,765 --> 02:07:24,360
2692
02:07:24,360 --> 02:07:28,700
Argument count is what it generally stands for, though technically,
2693
02:07:28,700 --> 02:07:30,290
you could call these things anything.
2694
02:07:30,290 --> 02:07:31,520
But this is the convention.
2695
02:07:31,520 --> 02:07:35,780
Because I claimed earlier that arrays don't keep track of their own length,
2696
02:07:35,780 --> 02:07:38,930
if you want to know how many words the human typed at the prompt
2697
02:07:38,930 --> 02:07:41,420
after your program's name, you have to be told,
2698
02:07:41,420 --> 02:07:45,650
not just the array of the words, but the length of that array.
2699
02:07:45,650 --> 02:07:48,530
The strings, you can figure out the length of using strlen,
2700
02:07:48,530 --> 02:07:53,360
but you can't figure out the length of the array of strings, the collection
2701
02:07:53,360 --> 02:07:55,020
of words that the human typed in.
2702
02:07:55,020 --> 02:07:56,760
So how can I now use this?
2703
02:07:56,760 --> 02:07:59,190
Well, let me go ahead and do this.
2704
02:07:59,190 --> 02:08:04,190
Let me go ahead and change this program now just to be printf, quote unquote,
2705
02:08:04,190 --> 02:08:11,630
"hello, %s\n", then argv[1].
2706
02:08:11,630 --> 02:08:14,780
So this is not the best version of my code yet, but it's my first.
2707
02:08:14,780 --> 02:08:21,020
Make greet, and now let me do ./greet, David all at once.
2708
02:08:21,020 --> 02:08:23,210
Enter, hello, David.
2709
02:08:23,210 --> 02:08:25,820
Now let me run it again, ./greet, Carter.
2710
02:08:25,820 --> 02:08:27,620
Enter, hello, Carter.
2711
02:08:27,620 --> 02:08:29,840
It's a marginal improvement, but I don't have
2712
02:08:29,840 --> 02:08:32,330
to wait for get_string to prompt me to hit Enter.
2713
02:08:32,330 --> 02:08:34,370
It's just speeding things up, twice as fast.
2714
02:08:34,370 --> 02:08:36,890
One less command to type in.
2715
02:08:36,890 --> 02:08:41,390
But I deliberately did [1], but what's the beginning of argv?
2716
02:08:41,390 --> 02:08:42,170
It would be [0].
2717
02:08:42,170 --> 02:08:44,730
2718
02:08:44,730 --> 02:08:45,780
Well, what's that?
2719
02:08:45,780 --> 02:08:48,840
This is sometimes useful, though for now, it's not.
2720
02:08:48,840 --> 02:08:54,110
Suppose I recompile my code and run this program now, greet David.
2721
02:08:54,110 --> 02:08:58,598
Anyone want to guess what's in argv[0]?
2722
02:08:58,598 --> 02:08:59,530
AUDIENCE: [INAUDIBLE]
2723
02:08:59,530 --> 02:09:00,220
DAVID MALAN: Say again?
2724
02:09:00,220 --> 02:09:01,230
AUDIENCE: Greet, hello.
2725
02:09:01,230 --> 02:09:04,530
DAVID MALAN: Greet, Enter, hello, ./greet.
2726
02:09:04,530 --> 02:09:08,280
So if you want to sort of inception style your program to figure out what
2727
02:09:08,280 --> 02:09:11,910
its own name is, or at least how it was executed at the command line,
2728
02:09:11,910 --> 02:09:14,460
at the terminal, you can look at argv[0].
2729
02:09:14,460 --> 02:09:17,160
In general, probably not that useful, probably better
2730
02:09:17,160 --> 02:09:21,900
to start looking at [1], which was the first word after the program name.
2731
02:09:21,900 --> 02:09:25,320
And if there were more, I could do this. How about argv[2],
2732
02:09:25,320 --> 02:09:27,690
let me add in a second %s.
2733
02:09:27,690 --> 02:09:29,550
Let me recompile greet.
2734
02:09:29,550 --> 02:09:35,490
Let me do ./greet David Malan, Enter, and that, too, now works,
2735
02:09:35,490 --> 02:09:37,112
taking in two words at the prompt.
2736
02:09:37,112 --> 02:09:38,820
If I really want to be smart at this now,
2737
02:09:38,820 --> 02:09:40,445
I could do something like this, though.
2738
02:09:40,445 --> 02:09:44,700
How about if the count of arguments, A.K.A. argc,
2739
02:09:44,700 --> 02:09:49,890
equals equals 2, then assume that the human typed in only their first name,
2740
02:09:49,890 --> 02:09:58,440
and do printf hello comma %s \n, and then argv[1].
2741
02:09:58,440 --> 02:10:01,470
Else, if the human did not provide exactly two
2742
02:10:01,470 --> 02:10:04,920
arguments, the name of the program and their own name,
2743
02:10:04,920 --> 02:10:07,890
let's just print out a default value, lest they forgot their name
2744
02:10:07,890 --> 02:10:09,990
or they typed in two names or three names.
2745
02:10:09,990 --> 02:10:13,110
Let's just do, hello comma world as a default.
2746
02:10:13,110 --> 02:10:15,270
And we'll just ignore what the human typed in.
2747
02:10:15,270 --> 02:10:20,850
If I recompile this, make greet, I can do ./greet and David again, Enter.
2748
02:10:20,850 --> 02:10:24,840
Oops-- sorry, what am I missing?
2749
02:10:24,840 --> 02:10:26,640
Yeah, so newbie mistake.
2750
02:10:26,640 --> 02:10:30,090
Else, all right, make greet again.
2751
02:10:30,090 --> 02:10:34,050
./greet, David, Enter, there's my hello, David.
2752
02:10:34,050 --> 02:10:37,870
But if I omit my name, I just get the generic, like a default value.
2753
02:10:37,870 --> 02:10:41,590
And if I get a little curious and I type in both names, then I get ignored too.
2754
02:10:41,590 --> 02:10:42,090
Why?
2755
02:10:42,090 --> 02:10:44,880
Because I just haven't built in support for argc of three.
2756
02:10:44,880 --> 02:10:47,610
I could do anything I want, but now we have access
2757
02:10:47,610 --> 02:10:50,730
to these kinds of building blocks.
2758
02:10:50,730 --> 02:10:52,780
All right, what else might I do here?
2759
02:10:52,780 --> 02:10:57,660
Well, it turns out there might be some final features for us to now execute.
2760
02:10:57,660 --> 02:11:00,090
Notice, though, that in C, despite what you
2761
02:11:00,090 --> 02:11:02,820
might see in books or online tutorials, nowadays,
2762
02:11:02,820 --> 02:11:06,180
the two official formats for defining a main function
2763
02:11:06,180 --> 02:11:11,130
are either this, which we've been using now for two plus weeks or now this,
2764
02:11:11,130 --> 02:11:14,250
whereby, you change the void to int argc,
2765
02:11:14,250 --> 02:11:17,880
and then for now, string argv, and then empty brackets.
2766
02:11:17,880 --> 02:11:20,608
And we'll see that this, too, is a simplification, some training
2767
02:11:20,608 --> 02:11:21,400
wheels if you will.
2768
02:11:21,400 --> 02:11:23,550
But for now, those are the two forms, even
2769
02:11:23,550 --> 02:11:26,550
though you will see in online tutorials and even books, some people
2770
02:11:26,550 --> 02:11:27,840
use main in different ways.
2771
02:11:27,840 --> 02:11:30,142
These are the two now to keep in mind.
2772
02:11:30,142 --> 02:11:32,100
And I'll note that these command line arguments
2773
02:11:32,100 --> 02:11:33,360
are kind of all over the place.
2774
02:11:33,360 --> 02:11:35,590
Didn't probably expect to see this word on the screen here.
2775
02:11:35,590 --> 02:11:36,490
And what does it mean?
2776
02:11:36,490 --> 02:11:37,920
Well, it turns out that for decades-- there's
2777
02:11:37,920 --> 02:11:40,080
actually this program that comes with Linux systems
2778
02:11:40,080 --> 02:11:41,880
in particular called cowsay.
2779
02:11:41,880 --> 02:11:42,510
Why?
2780
02:11:42,510 --> 02:11:45,300
Probably because someone had too much free time once and decided
2781
02:11:45,300 --> 02:11:49,920
to write a program that creates ASCII art out of a cow saying something
2782
02:11:49,920 --> 02:11:51,520
textually on the screen.
2783
02:11:51,520 --> 02:11:55,780
But you use cowsay, just for fun, by way of command line arguments.
2784
02:11:55,780 --> 02:12:00,660
So for instance, let me propose that I go back to VS Code
2785
02:12:00,660 --> 02:12:03,020
here, not because I want to write any code,
2786
02:12:03,020 --> 02:12:04,770
but I just want to use my terminal window.
2787
02:12:04,770 --> 02:12:07,320
And let me maximize my terminal window here.
2788
02:12:07,320 --> 02:12:11,880
And let me go ahead and type in something like, how about cowsay,
2789
02:12:11,880 --> 02:12:13,170
space moo?
2790
02:12:13,170 --> 02:12:14,822
So cowsay is not a program I wrote.
2791
02:12:14,822 --> 02:12:16,030
It's been around for decades.
2792
02:12:16,030 --> 02:12:18,870
But we installed it in VS Code for you in the cloud.
2793
02:12:18,870 --> 02:12:21,330
It takes at least one command line argument.
2794
02:12:21,330 --> 02:12:23,070
What do you want the cow to say?
2795
02:12:23,070 --> 02:12:26,190
I can say, cowsay moo, and hit Enter, and voila, there
2796
02:12:26,190 --> 02:12:29,490
is my ASCII art of a cow saying moo on the screen.
2797
02:12:29,490 --> 02:12:31,090
It can say multiple words.
2798
02:12:31,090 --> 02:12:33,960
So I can say, Hello, world, Enter.
2799
02:12:33,960 --> 02:12:35,800
And now it says, Hello, world.
2800
02:12:35,800 --> 02:12:38,730
So this is just an example of a silly program that uses command line
2801
02:12:38,730 --> 02:12:40,470
arguments, but it takes others too.
2802
02:12:40,470 --> 02:12:43,650
Just like clang, it uses this convention of hyphens
2803
02:12:43,650 --> 02:12:45,750
to change the output of the program.
2804
02:12:45,750 --> 02:12:49,350
Dash something is just a super common convention with command line arguments
2805
02:12:49,350 --> 02:12:53,520
when you want a very terse notation for some option like output.
2806
02:12:53,520 --> 02:12:56,460
In cowsay, I read the documentation, and it turns out
2807
02:12:56,460 --> 02:12:59,040
there's a dash f command line argument that
2808
02:12:59,040 --> 02:13:03,460
allows you to change the appearance of the cow, if you will.
2809
02:13:03,460 --> 02:13:10,170
So if I do cowsay dash f, duck, and then some other word like quack,
2810
02:13:10,170 --> 02:13:11,640
it's no longer a cow.
2811
02:13:11,640 --> 02:13:15,850
That command line argument turns it into a tiny, adorable duck instead.
2812
02:13:15,850 --> 02:13:19,020
And then lastly, just for fun, because I spent way too much time
2813
02:13:19,020 --> 02:13:20,790
playing with command line arguments.
2814
02:13:20,790 --> 02:13:25,260
Cowsay dash f, dragon, and then how about, rawr, Enter,
2815
02:13:25,260 --> 02:13:27,910
you can even get this on the screen here.
2816
02:13:27,910 --> 02:13:30,150
So this, too, is just an example of what you
2817
02:13:30,150 --> 02:13:34,230
can do with these command line arguments now that we have this building block.
2818
02:13:34,230 --> 02:13:36,960
And there's one final thing we can now do with code.
2819
02:13:36,960 --> 02:13:39,150
There's one last feature today that we'll
2820
02:13:39,150 --> 02:13:41,610
introduce before we now connect all of these dots
2821
02:13:41,610 --> 02:13:47,520
to readability and encryption by talking, lastly, about something called
2822
02:13:47,520 --> 02:13:48,450
exit status.
2823
02:13:48,450 --> 02:13:52,380
It turns out that whenever your main function exits,
2824
02:13:52,380 --> 02:13:55,590
it returns a secret integer that you can figure out,
2825
02:13:55,590 --> 02:13:58,260
as the programmer or an advanced user, what it was.
2826
02:13:58,260 --> 02:14:02,398
And these exit codes, exit statuses, are typically used to indicate errors.
2827
02:14:02,398 --> 02:14:05,190
So for instance, over the past couple of years, if you've used Zoom
2828
02:14:05,190 --> 02:14:08,560
and you ever got some kind of error, you might have seen a screen like this.
2829
02:14:08,560 --> 02:14:11,040
It's usually not that helpful, maybe tells you to click
2830
02:14:11,040 --> 02:14:13,050
Report Problem or Contact Support.
2831
02:14:13,050 --> 02:14:16,980
But very often in our human world on Macs, PCs, and phones,
2832
02:14:16,980 --> 02:14:20,010
you see cryptic error codes, like literally numbers
2833
02:14:20,010 --> 02:14:23,640
that probably only Zoom knows, or Microsoft or Google or whatever company
2834
02:14:23,640 --> 02:14:25,050
wrote the software you're using.
2835
02:14:25,050 --> 02:14:28,260
But that number corresponds to a specific error
2836
02:14:28,260 --> 02:14:32,070
that some human somewhere knows might very well happen.
2837
02:14:32,070 --> 02:14:34,950
These are used similarly, although under a different name
2838
02:14:34,950 --> 02:14:38,260
that we'll talk about later in the term, on the web as well.
2839
02:14:38,260 --> 02:14:41,350
Have you ever seen this-- maybe not character, but number?
2840
02:14:41,350 --> 02:14:43,485
So, 404 means what?
2841
02:14:43,485 --> 02:14:44,880
AUDIENCE: Error.
2842
02:14:44,880 --> 02:14:47,790
DAVID MALAN: So error, yes, but really, not found.
2843
02:14:47,790 --> 02:14:48,410
So, why?
2844
02:14:48,410 --> 02:14:49,993
I mean, this is the most arcane thing.
2845
02:14:49,993 --> 02:14:53,000
And we'll talk in a few weeks about what this and other numbers mean,
2846
02:14:53,000 --> 02:14:54,917
but numbers are all around us in technology,
2847
02:14:54,917 --> 02:14:57,500
and they very often mean something to the technical people who
2848
02:14:57,500 --> 02:15:00,270
wrote the software, less so to humans like you and me.
2849
02:15:00,270 --> 02:15:03,230
Why so many of us recognize 404 is kind of weird,
2850
02:15:03,230 --> 02:15:05,900
that like that's been around long enough that we all know it.
2851
02:15:05,900 --> 02:15:10,250
But it really is just a special number that represents an error of some sort.
2852
02:15:10,250 --> 02:15:13,100
So it turns out, the last thing we'll reveal today
2853
02:15:13,100 --> 02:15:15,530
about what we've been taking for granted for two weeks,
2854
02:15:15,530 --> 02:15:18,200
is what the int is in main.
2855
02:15:18,200 --> 02:15:21,650
We've seen, just a moment ago, that the thing in the parentheses, which
2856
02:15:21,650 --> 02:15:24,680
up until now has been void, which means no command line arguments.
2857
02:15:24,680 --> 02:15:29,690
Now int argc, string argv brackets just means, yes, command line arguments.
2858
02:15:29,690 --> 02:15:31,290
And we've seen how to access them.
2859
02:15:31,290 --> 02:15:33,620
So the last piece of the puzzle, honestly,
2860
02:15:33,620 --> 02:15:37,460
of all the cryptic syntax the past two weeks, is just what int means.
2861
02:15:37,460 --> 02:15:40,610
Int is always there for main, and it indicates
2862
02:15:40,610 --> 02:15:44,300
that main will always return an integer, even though you and I have never
2863
02:15:44,300 --> 02:15:46,010
done so explicitly.
2864
02:15:46,010 --> 02:15:50,450
Usually, main returns 0, by default. But it
2865
02:15:50,450 --> 02:15:53,928
would be weird if you saw an error message saying 0, so 0 is just hidden.
2866
02:15:53,928 --> 02:15:55,470
You would never see it on the screen.
2867
02:15:55,470 --> 02:15:58,670
But it's happening automatically by way of how C is designed.
2868
02:15:58,670 --> 02:16:01,550
So let me write one final program here.
2869
02:16:01,550 --> 02:16:05,750
I'll call it, for instance, status.c to show you these exit statuses.
2870
02:16:05,750 --> 02:16:10,790
Code of status.c, and then up here, let me do something simple like include
2871
02:16:10,790 --> 02:16:18,020
cs50.h, then include stdio.h, and then int main--
2872
02:16:18,020 --> 02:16:21,350
actually, let's use a command line argument. int argc, string argv[],
2873
02:16:21,350 --> 02:16:23,180
so that's copy, paste.
2874
02:16:23,180 --> 02:16:26,000
But now let's do this.
2875
02:16:26,000 --> 02:16:29,280
If argc does not equal 2--
2876
02:16:29,280 --> 02:16:30,780
why don't we do something like this?
2877
02:16:30,780 --> 02:16:33,740
Let's not just default to hello, world like last time.
2878
02:16:33,740 --> 02:16:34,770
Let's yell at the user.
2879
02:16:34,770 --> 02:16:38,802
So let's say something like printf missing command line argument,
2880
02:16:38,802 --> 02:16:40,760
so that they know they screwed up and they need
2881
02:16:40,760 --> 02:16:43,160
to run the program again correctly.
2882
02:16:43,160 --> 02:16:51,320
Else, let's go ahead and say, print out, as before, Hello, comma %s,
2883
02:16:51,320 --> 02:16:56,730
and then plug in argv[1], so the human's name from the prompt.
2884
02:16:56,730 --> 02:17:01,910
Now at this point, let me go ahead and run status, ./status,
2885
02:17:01,910 --> 02:17:03,590
and I'll type nothing first.
2886
02:17:03,590 --> 02:17:04,700
I get yelled at.
2887
02:17:04,700 --> 02:17:10,170
This time, I'll type it again. ./status David, and it works properly.
2888
02:17:10,170 --> 02:17:14,090
But now let me show you a somewhat secret, cryptic command.
2889
02:17:14,090 --> 02:17:17,330
You can type this at your prompt, and it's just a coincidence
2890
02:17:17,330 --> 02:17:18,740
that there's another dollar sign.
2891
02:17:18,740 --> 02:17:22,400
Echo $?, totally arcane, but it allows you
2892
02:17:22,400 --> 02:17:25,490
to see what exit status your program has ended with.
2893
02:17:25,490 --> 02:17:27,559
So let me run this again the wrong way.
2894
02:17:27,559 --> 02:17:31,040
./status, I get the error message.
2895
02:17:31,040 --> 02:17:32,780
What was secretly returned?
2896
02:17:32,780 --> 02:17:33,440
I can't see it.
2897
02:17:33,440 --> 02:17:37,280
There's obviously no error screen, but by typing echo $?,
2898
02:17:37,280 --> 02:17:41,420
I can see that, oh, my program automatically, by default, returns
2899
02:17:41,420 --> 02:17:42,170
zero.
2900
02:17:42,170 --> 02:17:46,879
However, if I run it again correctly, ./status David, Enter,
2901
02:17:46,879 --> 02:17:48,690
this is the correct version.
2902
02:17:48,690 --> 02:17:50,629
But if I run echo $?
2903
02:17:50,629 --> 02:17:52,879
status again, it still exited with 0.
2904
02:17:52,879 --> 02:17:55,879
And long story short, this is just a missed opportunity.
2905
02:17:55,879 --> 02:17:59,570
When something goes wrong, why don't I return a value other than 0?
2906
02:17:59,570 --> 02:18:01,070
0, by default, means success.
2907
02:18:01,070 --> 02:18:02,690
And it's always there automatically.
2908
02:18:02,690 --> 02:18:04,940
But you can control this.
2909
02:18:04,940 --> 02:18:11,160
I can go into my code here and return 1, else, if something works fine,
2910
02:18:11,160 --> 02:18:14,870
I can return 0, by default. And honestly, if I omit the return zero,
2911
02:18:14,870 --> 02:18:17,129
again, zero automatically is returned.
2912
02:18:17,129 --> 02:18:20,719
So let me go ahead and go be explicit, just so I know what's going on.
2913
02:18:20,719 --> 02:18:26,360
Make status again, ./status, and let's do this correctly with David.
2914
02:18:26,360 --> 02:18:28,520
Enter, hello, David.
2915
02:18:28,520 --> 02:18:32,059
Echo $?, zero.
2916
02:18:32,059 --> 02:18:33,270
So all is well.
2917
02:18:33,270 --> 02:18:38,240
But now if I do ./status and nothing, or multiple things, but not just David,
2918
02:18:38,240 --> 02:18:40,530
Enter, I get the error message.
2919
02:18:40,530 --> 02:18:45,230
But now if I do echo $?, voila, there now is the one.
2920
02:18:45,230 --> 02:18:47,330
So what does this now mean?
2921
02:18:47,330 --> 02:18:49,490
This is, in the graphical world, we would just
2922
02:18:49,490 --> 02:18:51,020
show something like this on the screen, which is
2923
02:18:51,020 --> 02:18:52,459
a little more informative to the user.
2924
02:18:52,459 --> 02:18:54,469
But even in the Linux world where you don't have a GUI,
2925
02:18:54,469 --> 02:18:56,690
necessarily, even for the programs we've written,
2926
02:18:56,690 --> 02:18:58,549
you can check these exit statuses.
2927
02:18:58,549 --> 02:19:01,070
And in fact, more comfortable, more advanced programmers,
2928
02:19:01,070 --> 02:19:03,889
when they write code that calls programs,
2929
02:19:03,889 --> 02:19:07,340
be it cowsay or anything else, you can, in code,
2930
02:19:07,340 --> 02:19:11,030
check what the exit status is of a program, and then decide,
2931
02:19:11,030 --> 02:19:13,170
did my program work or did it not?
2932
02:19:13,170 --> 02:19:16,219
And now let's connect the final dots before we
2933
02:19:16,219 --> 02:19:19,070
adjourn for some fruit snacks.
2934
02:19:19,070 --> 02:19:22,100
Cryptography, namely one of the applications this week
2935
02:19:22,100 --> 02:19:24,770
via which you'll be able to send, if you will,
2936
02:19:24,770 --> 02:19:27,650
secret messages, and better yet, decrypt secret messages.
2937
02:19:27,650 --> 02:19:29,780
This will be in addition to perhaps analyzing
2938
02:19:29,780 --> 02:19:32,120
the readability of text using heuristics, like we
2939
02:19:32,120 --> 02:19:34,040
identified at the start of class, too.
2940
02:19:34,040 --> 02:19:38,299
So cryptography is just the art, the science of encrypting information,
2941
02:19:38,299 --> 02:19:41,330
scrambling information so that if you have a secret message
2942
02:19:41,330 --> 02:19:45,980
to send in so-called plaintext, you can run it through some algorithm
2943
02:19:45,980 --> 02:19:49,910
and turn it into what's called ciphertext, thereby, encrypting it.
2944
02:19:49,910 --> 02:19:53,150
And only someone who knows what algorithm you've used
2945
02:19:53,150 --> 02:19:55,880
and what input you've used to the algorithm, theoretically,
2946
02:19:55,880 --> 02:19:59,880
can decrypt that process and convert it back to the original message.
2947
02:19:59,880 --> 02:20:03,030
So if we use our mental model from last week, here is a problem.
2948
02:20:03,030 --> 02:20:04,910
Here is an input and output.
2949
02:20:04,910 --> 02:20:08,120
The goal I claim here is to take some plaintext, like the message
2950
02:20:08,120 --> 02:20:10,250
you want to send, think back to grade school
2951
02:20:10,250 --> 02:20:13,640
if you ever passed a note to a friend or to your crush saying, I love you,
2952
02:20:13,640 --> 02:20:16,910
it's a little awkward if the teacher or someone else intercepts the paper.
2953
02:20:16,910 --> 02:20:19,490
And in English, it just says, I love you, or whatever it is.
2954
02:20:19,490 --> 02:20:22,350
It'd be nice if you had at least encrypted it in some way.
2955
02:20:22,350 --> 02:20:25,220
But the other person needs to know what algorithm you used
2956
02:20:25,220 --> 02:20:27,230
and what inputs you used to that algorithm
2957
02:20:27,230 --> 02:20:31,100
so that, ultimately, they can decode the so-called ciphertext, which
2958
02:20:31,100 --> 02:20:32,040
is the output.
2959
02:20:32,040 --> 02:20:34,190
So what goes inside of the box today?
2960
02:20:34,190 --> 02:20:37,970
Well, an algorithm, as it relates to cryptography, is called a cipher.
2961
02:20:37,970 --> 02:20:41,390
And a cipher is a fancy name for an algorithm that encrypts text
2962
02:20:41,390 --> 02:20:43,250
from plaintext to ciphertext.
2963
02:20:43,250 --> 02:20:46,760
The catch is, there needs to be not just the algorithm,
2964
02:20:46,760 --> 02:20:48,750
there needs to be an input to it.
2965
02:20:48,750 --> 02:20:52,590
And so, for instance, you might draw the picture like this for the first time
2966
02:20:52,590 --> 02:20:53,090
today.
2967
02:20:53,090 --> 02:20:54,257
And we've seen this in code.
2968
02:20:54,257 --> 02:20:57,180
You can give multiple inputs or arguments to functions.
2969
02:20:57,180 --> 02:20:59,960
So in this black box, you can imagine passing in the message
2970
02:20:59,960 --> 02:21:02,510
you want to send, and then some secret.
2971
02:21:02,510 --> 02:21:05,300
So for instance, suppose that, the simplest
2972
02:21:05,300 --> 02:21:08,750
thing I could think of as a kid was instead of sending the letter A,
2973
02:21:08,750 --> 02:21:10,310
why don't I write the letter B?
2974
02:21:10,310 --> 02:21:13,070
Instead of the letter B, why don't I write the letter C?
2975
02:21:13,070 --> 02:21:16,280
So I can kind of shift the English alphabet by one space.
2976
02:21:16,280 --> 02:21:18,740
So A becomes B, B becomes C, dot, dot, dot,
2977
02:21:18,740 --> 02:21:21,690
Z becomes A. You can wrap around at the end.
2978
02:21:21,690 --> 02:21:24,120
And let's assume no punctuation in this part of the story.
2979
02:21:24,120 --> 02:21:29,420
So that's a very simple algorithm-- add a value to each letter
2980
02:21:29,420 --> 02:21:32,090
and send the value as the ciphertext.
2981
02:21:32,090 --> 02:21:35,540
And now the teacher, the classmate, they have to know that you used,
2982
02:21:35,540 --> 02:21:39,410
not only this rotational algorithm, also known as a Caesar cipher,
2983
02:21:39,410 --> 02:21:41,300
they also need to know what number you used.
2984
02:21:41,300 --> 02:21:45,200
Did you add 1 to every letter, 2 to every letter, 25 to every letter?
2985
02:21:45,200 --> 02:21:49,310
Now if they're super smart and probably not the young age in this story,
2986
02:21:49,310 --> 02:21:51,165
they could also just try all possibilities.
2987
02:21:51,165 --> 02:21:53,040
And that would be an attack on the algorithm.
2988
02:21:53,040 --> 02:21:55,310
This is not a sophisticated algorithm, but it's
2989
02:21:55,310 --> 02:21:56,970
enough to send a message in class.
2990
02:21:56,970 --> 02:21:58,940
So if the two inputs now are HI!
2991
02:21:58,940 --> 02:22:04,280
as the plaintext message, and 1 as the so-called key, the secret number
2992
02:22:04,280 --> 02:22:06,950
that only you and the other person know, you
2993
02:22:06,950 --> 02:22:11,040
might be able to encrypt a message from one way to the other.
2994
02:22:11,040 --> 02:22:13,400
And so in this case, for instance, HI!
2995
02:22:13,400 --> 02:22:16,198
would become I-J-!.
2996
02:22:16,198 --> 02:22:17,990
In this version of the algorithm, we're not
2997
02:22:17,990 --> 02:22:19,823
going to bother with numbers or punctuation.
2998
02:22:19,823 --> 02:22:23,090
We'll only operate on A through Z, be it uppercase or lowercase.
2999
02:22:23,090 --> 02:22:28,250
So now if you were to receive a slip of paper in class with I-J on it,
3000
02:22:28,250 --> 02:22:31,290
you, the recipient, would know what it is
3001
02:22:31,290 --> 02:22:33,440
so long as you know that the sender used one,
3002
02:22:33,440 --> 02:22:36,500
because you just reverse the algorithm and you subtract one instead.
3003
02:22:36,500 --> 02:22:39,110
The teacher, they probably don't know what this means,
3004
02:22:39,110 --> 02:22:41,443
and they're not going to spend time hacking the message,
3005
02:22:41,443 --> 02:22:42,975
so it just looks scrambled to them.
3006
02:22:42,975 --> 02:22:44,600
And that's what we get from encryption.
3007
02:22:44,600 --> 02:22:47,430
Someone who intercepts it, be it in class or in the real world,
3008
02:22:47,430 --> 02:22:51,080
on the internet or anywhere else, can't actually figure out, ideally,
3009
02:22:51,080 --> 02:22:52,700
what it is you have sent.
3010
02:22:52,700 --> 02:22:55,130
The opposite, of course, is indeed called decryption,
3011
02:22:55,130 --> 02:22:56,300
but the process is the same.
3012
02:22:56,300 --> 02:22:58,370
We now pass in negative 1.
3013
02:22:58,370 --> 02:23:00,300
And so how about this?
3014
02:23:00,300 --> 02:23:02,840
Why don't we end with a demonstration here?
3015
02:23:02,840 --> 02:23:08,360
UIJT XBT DT50-- there's a bit of a tell there.
3016
02:23:08,360 --> 02:23:11,060
If we pass that in and do negative 1, well,
3017
02:23:11,060 --> 02:23:14,180
how do we get out the plaintext originally?
3018
02:23:14,180 --> 02:23:18,200
Well, if this is the ciphertext, and we subtract 1 from each letter,
3019
02:23:18,200 --> 02:23:28,010
I think U becomes T, I becomes H, J becomes I, T becomes S, X becomes W,
3020
02:23:28,010 --> 02:23:37,580
B becomes A, T becomes S, D becomes C, T becomes S, and this was, indeed, CS50.
3021
02:23:37,580 --> 02:23:40,250
Have a duck on your way out, and some snacks in the lobby.
3022
02:23:40,250 --> 02:23:42,350
[APPLAUSE]
3023
02:23:42,350 --> 02:23:43,850
[FILM ROLLING]
3024
02:23:43,850 --> 02:23:47,500
[MUSIC PLAYING]
3025
02:23:47,500 --> 02:24:19,000