1
00:00:00,000 --> 00:00:02,982
2
00:00:02,982 --> 00:00:06,461
[MUSIC PLAYING]
3
00:00:06,461 --> 00:01:12,600
4
00:01:12,600 --> 00:01:13,590
DAVID MALAN: All right.
5
00:01:13,590 --> 00:01:17,130
This is CS50, and this
is week 2 wherein we're
6
00:01:17,130 --> 00:01:20,610
going to take a look at a
lower level at how things work,
7
00:01:20,610 --> 00:01:24,120
and indeed, among the goals of the
course is this bottom-up understanding
8
00:01:24,120 --> 00:01:26,670
so that in a couple of weeks'
time, even a few years' time,
9
00:01:26,670 --> 00:01:29,920
when you encounter some new technology,
you'll be able to think back hopefully
10
00:01:29,920 --> 00:01:33,180
on some of this week's
basic building blocks and primitives
11
00:01:33,180 --> 00:01:36,060
and really just deduce how
tomorrow's technologies work.
12
00:01:36,060 --> 00:01:37,685
But along the way, it's going to seem--
13
00:01:37,685 --> 00:01:40,727
it's going to be a little hard, perhaps,
to see the forest for the trees,
14
00:01:40,727 --> 00:01:41,380
so to speak.
15
00:01:41,380 --> 00:01:44,783
And so the goal at the end of the day
still is going to be problem-solving.
16
00:01:44,783 --> 00:01:47,700
And so we thought we'd begin today
with a look at some of the problems
17
00:01:47,700 --> 00:01:50,405
we'll talk about or
solve this coming week,
18
00:01:50,405 --> 00:01:53,280
and for that, we have some brave
volunteers who have already come up.
19
00:01:53,280 --> 00:01:58,320
If we could turn on some dramatic
lighting and meet today's volunteers.
20
00:01:58,320 --> 00:02:00,430
So on my left here, we have--
21
00:02:00,430 --> 00:02:00,930
ALEX: Hi.
22
00:02:00,930 --> 00:02:01,960
My name is Alex.
23
00:02:01,960 --> 00:02:05,340
I'm a first-year at the college and
I'm from Chapel Hill, North Carolina.
24
00:02:05,340 --> 00:02:07,080
DAVID MALAN: Welcome, Alex.
25
00:02:07,080 --> 00:02:09,180
And to Alex's right.
26
00:02:09,180 --> 00:02:10,050
SARAH: I'm Sarah.
27
00:02:10,050 --> 00:02:13,230
I'm from Toronto, Canada, and I'm also
a first-year student at the college.
28
00:02:13,230 --> 00:02:14,188
DAVID MALAN: Wonderful.
29
00:02:14,188 --> 00:02:15,869
Well, welcome to both Alex and Sarah.
30
00:02:15,869 --> 00:02:18,577
So one of the problems you'll
perhaps solve this week for problem
31
00:02:18,577 --> 00:02:22,442
set 2 is to analyze the reading
level of a body of text,
32
00:02:22,442 --> 00:02:25,650
whether someone reads at a first grade
level, second grade level, third grade
33
00:02:25,650 --> 00:02:28,570
level, all the way up
to 12 or 13 or beyond.
34
00:02:28,570 --> 00:02:32,250
What you perhaps never quite thought
about, certainly in terms of code,
35
00:02:32,250 --> 00:02:35,310
is how you would analyze
some text, some book, and figure
36
00:02:35,310 --> 00:02:36,750
out what reading level it's at.
37
00:02:36,750 --> 00:02:40,330
And yet, surely our teachers growing up
knew or had an intuitive sense of this.
38
00:02:40,330 --> 00:02:42,450
So let's consider some sample text.
39
00:02:42,450 --> 00:02:45,960
For instance, Alex, what
have you been reading lately?
40
00:02:45,960 --> 00:02:52,502
ALEX: One fish, two fish,
red fish, blue fish.
41
00:02:52,502 --> 00:02:53,460
DAVID MALAN: Wonderful.
42
00:02:53,460 --> 00:02:58,890
So given that, what grade level would
you say Alex is currently reading at?
43
00:02:58,890 --> 00:03:01,500
Feel free to just shout it out.
44
00:03:01,500 --> 00:03:02,730
First, first?
45
00:03:02,730 --> 00:03:07,200
So indeed, you'll see this week, if
you run your code on Alex's text,
46
00:03:07,200 --> 00:03:10,410
it actually turns out he reads
below a first grade reading level.
47
00:03:10,410 --> 00:03:12,400
But why might that be?
48
00:03:12,400 --> 00:03:16,410
What might your intuition
be for why we've
49
00:03:16,410 --> 00:03:19,020
accused Alex of reading at this level?
50
00:03:19,020 --> 00:03:20,990
Feel free to shout out.
51
00:03:20,990 --> 00:03:21,490
Yeah.
52
00:03:21,490 --> 00:03:24,520
So very few syllables, short
words, short sentences.
53
00:03:24,520 --> 00:03:27,828
And so there are some heuristics, perhaps,
we can infer from that short text,
54
00:03:27,828 --> 00:03:30,370
that that probably means that
it's best for younger children.
55
00:03:30,370 --> 00:03:33,370
Now Sarah, by contrast,
what have you been reading?
56
00:03:33,370 --> 00:03:35,470
SARAH: Mr. and Mrs. Dursley, of number
57
00:03:35,470 --> 00:03:38,890
four, Privet Drive, were
proud to say that they were
58
00:03:38,890 --> 00:03:41,050
perfectly normal, thank you very much.
59
00:03:41,050 --> 00:03:43,480
They were the last people
you'd expect to be involved
60
00:03:43,480 --> 00:03:46,390
in anything strange or
mysterious because they just
61
00:03:46,390 --> 00:03:47,952
didn't hold with such nonsense.
62
00:03:47,952 --> 00:03:48,910
DAVID MALAN: All right.
63
00:03:48,910 --> 00:03:50,950
Now irrespective of what
grade you were in when
64
00:03:50,950 --> 00:03:53,283
you might have read that text,
what grade level does Sarah
65
00:03:53,283 --> 00:03:55,230
seem to be reading at?
66
00:03:55,230 --> 00:03:57,570
So eighth grade, second grade.
67
00:03:57,570 --> 00:03:58,080
OK.
68
00:03:58,080 --> 00:04:01,125
So I'm hearing a bit of everything, but
at least according to code,
69
00:04:01,125 --> 00:04:03,240
it would actually be seventh grade.
70
00:04:03,240 --> 00:04:05,130
And what might the intuition there be?
71
00:04:05,130 --> 00:04:07,620
Why is that a higher grade
level even though we might
72
00:04:07,620 --> 00:04:09,917
disagree on exactly which grade it is?
73
00:04:09,917 --> 00:04:11,250
AUDIENCE: Complicated sentences.
74
00:04:11,250 --> 00:04:12,000
DAVID MALAN: Yeah.
75
00:04:12,000 --> 00:04:14,218
So complicated sentences,
longer sentences.
76
00:04:14,218 --> 00:04:17,010
So indeed a lot more words were
being spoken by Sarah because there
77
00:04:17,010 --> 00:04:18,519
was so much more there on the page.
78
00:04:18,519 --> 00:04:22,079
So we'll translate these ideas
this coming week in problem set 2,
79
00:04:22,079 --> 00:04:25,170
if you tackle this one, through
code so that you can ultimately
80
00:04:25,170 --> 00:04:26,910
infer things like these quantitatively.
81
00:04:26,910 --> 00:04:29,190
But to do so, we're going
to have to understand text.
82
00:04:29,190 --> 00:04:32,610
So let's first thank our volunteers and
then we'll dive into that lower level.
83
00:04:32,610 --> 00:04:35,337
[APPLAUSE]
84
00:04:35,337 --> 00:04:39,910
85
00:04:39,910 --> 00:04:40,600
Sorry.
86
00:04:40,600 --> 00:04:41,490
You can keep those.
87
00:04:41,490 --> 00:04:42,222
SARAH: Oh, OK.
88
00:04:42,222 --> 00:04:43,180
DAVID MALAN: All right.
89
00:04:43,180 --> 00:04:45,970
So besides that, let's
consider one other body of text
90
00:04:45,970 --> 00:04:48,010
perhaps that you might
see this week, which
91
00:04:48,010 --> 00:04:50,210
is namely a little something like this.
92
00:04:50,210 --> 00:04:53,860
What I have here on the screen is what
we'll start calling today ciphertext.
93
00:04:53,860 --> 00:04:56,530
It's the result of encrypting
some piece of information.
94
00:04:56,530 --> 00:05:00,190
And encryption, or more generally,
the art and science of cryptography
95
00:05:00,190 --> 00:05:00,908
is all around us.
96
00:05:00,908 --> 00:05:03,700
It's what you're using on the web,
on your phones, with your banks.
97
00:05:03,700 --> 00:05:07,000
And anything that tries to keep
data secure is using encryption.
98
00:05:07,000 --> 00:05:10,390
But there's going to be different levels
of encryption-- strong encryption,
99
00:05:10,390 --> 00:05:11,140
weak encryption.
100
00:05:11,140 --> 00:05:14,590
And what you see here on the
screen isn't all that strong,
101
00:05:14,590 --> 00:05:18,190
but we'll see later today how we
might decrypt this and actually reveal
102
00:05:18,190 --> 00:05:22,030
what the plaintext is that
corresponds to that ciphertext.
103
00:05:22,030 --> 00:05:25,670
But in order to do so, we have to
start taking off some training wheels,
104
00:05:25,670 --> 00:05:26,197
so to speak.
105
00:05:26,197 --> 00:05:28,030
And believe it or not,
even though you
106
00:05:28,030 --> 00:05:30,100
saw C this past
week for the first time,
107
00:05:30,100 --> 00:05:32,230
probably, and it might have
been rather in the weeds
108
00:05:32,230 --> 00:05:36,072
and much more complicated seemingly
than Scratch, it turns out that along the way,
109
00:05:36,072 --> 00:05:37,780
we have been providing
and we'll continue
110
00:05:37,780 --> 00:05:39,760
to provide certain training wheels.
111
00:05:39,760 --> 00:05:42,190
For instance, the CS50
Library is one of them,
112
00:05:42,190 --> 00:05:46,240
and even some of the explanations
we give of topics for now
113
00:05:46,240 --> 00:05:49,120
in these early weeks will be somewhat
simplified-- abstracted away,
114
00:05:49,120 --> 00:05:49,730
if you will.
115
00:05:49,730 --> 00:05:51,730
But the goal ultimately
is for you to understand
116
00:05:51,730 --> 00:05:55,060
each and every one of those details
so that after CS50, you really
117
00:05:55,060 --> 00:05:58,210
can stand on your own and
understand and wrap your mind
118
00:05:58,210 --> 00:06:01,040
around any future technologies as well.
119
00:06:01,040 --> 00:06:05,318
So let's consider first the very first
program with which we began last week,
120
00:06:05,318 --> 00:06:06,110
which was this one.
121
00:06:06,110 --> 00:06:09,215
So "hello, world" in C. At the end
of the day, it was really the printf
122
00:06:09,215 --> 00:06:11,590
function that was doing the
interesting part of the work,
123
00:06:11,590 --> 00:06:14,890
but there was a lot of technical
stuff above and below it.
124
00:06:14,890 --> 00:06:19,900
The curly braces, the parentheses,
words like void and include, and then
125
00:06:19,900 --> 00:06:21,730
of course, the angled brackets and more.
126
00:06:21,730 --> 00:06:25,870
But at the end of the day, we needed
to convert that source code in C
127
00:06:25,870 --> 00:06:30,190
to machine code, the 0's and 1's in
binary that the computer understood.
128
00:06:30,190 --> 00:06:32,500
And to do that, of course, we ran--
129
00:06:32,500 --> 00:06:33,700
we compiled the code.
130
00:06:33,700 --> 00:06:37,400
We ran make and then we were able
to actually run that code there.
131
00:06:37,400 --> 00:06:39,370
So let me actually go
over here to VS Code
132
00:06:39,370 --> 00:06:44,510
and really quickly recreate that hello.c
pretty much by transcribing the same.
133
00:06:44,510 --> 00:06:51,970
So I might have here include
stdio.h, int main void.
134
00:06:51,970 --> 00:06:54,460
And then in here, I had
quite simply, hello,
135
00:06:54,460 --> 00:06:57,430
comma, world with my
backslash n, end quotes, and more.
136
00:06:57,430 --> 00:07:01,693
Now last time, to compile this, I indeed
ran make hello, followed by Enter.
137
00:07:01,693 --> 00:07:03,860
Hopefully you see no errors
and that's a good thing.
138
00:07:03,860 --> 00:07:05,980
And if you do dot,
slash, hello, you see,
139
00:07:05,980 --> 00:07:07,840
in fact, the results of that program.
140
00:07:07,840 --> 00:07:11,470
But it turns out that make
is not actually a compiler
141
00:07:11,470 --> 00:07:12,950
as I alluded to last week.
142
00:07:12,950 --> 00:07:15,520
It's a program that
clearly makes your program,
143
00:07:15,520 --> 00:07:19,030
but it itself just automates the
process of using an actual compiler.
144
00:07:19,030 --> 00:07:21,290
And there's lots of different
compilers out there,
145
00:07:21,290 --> 00:07:24,190
and the one that it's actually
using underneath the hood
146
00:07:24,190 --> 00:07:27,640
is a little something
called Clang for C Language.
147
00:07:27,640 --> 00:07:30,190
And Clang is a pretty
popular compiler nowadays.
148
00:07:30,190 --> 00:07:33,520
There's another one that's been
around for ages called GCC,
149
00:07:33,520 --> 00:07:36,330
but these are just specific
names for types of compilers
150
00:07:36,330 --> 00:07:38,830
that different people, different
companies, different groups
151
00:07:38,830 --> 00:07:40,310
have actually created.
152
00:07:40,310 --> 00:07:44,800
But if you use, unlike in week 1, a
compiler yourself manually,
153
00:07:44,800 --> 00:07:47,170
you have to understand a
little more about what's
154
00:07:47,170 --> 00:07:50,703
going on, because it's even more
cryptic than just make alone.
155
00:07:50,703 --> 00:07:53,620
So in fact, let me go back to my
terminal window here, let me go ahead
156
00:07:53,620 --> 00:07:58,690
and clear the screen a little bit
and just run really the raw compiler
157
00:07:58,690 --> 00:07:59,360
command.
158
00:07:59,360 --> 00:08:01,450
So what make is
automating for me, let me
159
00:08:01,450 --> 00:08:03,620
actually do this manually
for just a moment.
160
00:08:03,620 --> 00:08:10,450
So if I want to compile hello.c into
an executable program I can run,
161
00:08:10,450 --> 00:08:12,220
I can do this.
162
00:08:12,220 --> 00:08:17,110
clang, space, hello.c, and then Enter.
163
00:08:17,110 --> 00:08:20,980
And now there's no output, which is
a good thing in this case, no errors,
164
00:08:20,980 --> 00:08:22,010
but notice this.
165
00:08:22,010 --> 00:08:25,450
If I go ahead and type
ls, it turns out there's
166
00:08:25,450 --> 00:08:32,140
a file that's been created suddenly in
my current folder weirdly called a.out.
167
00:08:32,140 --> 00:08:33,580
That stands for Assembler Output.
168
00:08:33,580 --> 00:08:35,980
And long story short, that's
actually the default name
169
00:08:35,980 --> 00:08:39,440
of a program that's created when
you just run Clang by itself.
170
00:08:39,440 --> 00:08:41,830
Now that's a pretty
bad name for a program
171
00:08:41,830 --> 00:08:44,000
because it doesn't
describe what it does.
172
00:08:44,000 --> 00:08:49,870
So better would be here to perhaps do,
well, instead of a.out, which, yes,
173
00:08:49,870 --> 00:08:53,950
still prints hello, world, but isn't
really a clearly-named program,
174
00:08:53,950 --> 00:08:55,420
it'd be nice to name this hello.
175
00:08:55,420 --> 00:08:56,240
So what could I do?
176
00:08:56,240 --> 00:08:59,740
I could do like we learned last week--
well, I could rename a.out to hello
177
00:08:59,740 --> 00:09:01,820
by using Linux's mv command.
178
00:09:01,820 --> 00:09:04,480
So I'm going to move
a.out to become hello.
179
00:09:04,480 --> 00:09:06,370
But that, too, seems kind of tedious.
180
00:09:06,370 --> 00:09:07,720
Now I have three steps.
181
00:09:07,720 --> 00:09:10,750
Write my code, compile
my code, and then rename it
182
00:09:10,750 --> 00:09:12,190
before I can even run it.
183
00:09:12,190 --> 00:09:13,580
We can do better than that.
184
00:09:13,580 --> 00:09:15,580
And so it turns out
that certain commands
185
00:09:15,580 --> 00:09:18,220
like clang support what
we're going to start today
186
00:09:18,220 --> 00:09:20,380
calling command line arguments.
187
00:09:20,380 --> 00:09:24,010
A command line argument, unlike
an argument to a function,
188
00:09:24,010 --> 00:09:27,040
is just an additional word
or key phrase that you
189
00:09:27,040 --> 00:09:30,400
type after a command at
your prompt in your terminal
190
00:09:30,400 --> 00:09:33,440
window that just modifies
the behavior of that command.
191
00:09:33,440 --> 00:09:35,600
It configures it a
little more specifically.
192
00:09:35,600 --> 00:09:39,220
So what you're seeing here on the screen
is a better command with which
193
00:09:39,220 --> 00:09:45,220
to run clang so that now I can specify
the output of this command per this -o.
194
00:09:45,220 --> 00:09:46,610
So what do I mean by that?
195
00:09:46,610 --> 00:09:48,943
Well, let me go ahead and
clear my terminal window again
196
00:09:48,943 --> 00:09:54,955
and more explicitly type clang
-o hello hello.c and then Enter.
197
00:09:54,955 --> 00:09:57,580
Nothing, again, appears to happen,
but that's a good thing when
198
00:09:57,580 --> 00:10:02,860
you see no errors and now the program
I just created is indeed called hello.
199
00:10:02,860 --> 00:10:07,280
So it achieves really the same
exact effect as make did, but what
200
00:10:07,280 --> 00:10:09,820
I don't have to do with make
is type and remember something
201
00:10:09,820 --> 00:10:11,075
as long as this command.
202
00:10:11,075 --> 00:10:12,700
And this, too, is a bit of a white lie.
203
00:10:12,700 --> 00:10:16,420
It turns out, we have preconfigured
VS Code in the cloud for you
204
00:10:16,420 --> 00:10:21,310
to also use some other features
of Clang that would be even more
205
00:10:21,310 --> 00:10:22,840
tedious for you to write yourselves.
206
00:10:22,840 --> 00:10:28,130
And so really, this is why we distill
this as ultimately just running make.
207
00:10:28,130 --> 00:10:31,900
So let me pause here to see first if
there's any questions on what I've
208
00:10:31,900 --> 00:10:34,540
done by taking my very
first program in C
209
00:10:34,540 --> 00:10:37,720
and just now compiling it first
with make, but then starting over
210
00:10:37,720 --> 00:10:40,780
and now manually compiling
it with clang with what
211
00:10:40,780 --> 00:10:44,500
we'll call command line
arguments: -o, space, hello,
212
00:10:44,500 --> 00:10:46,820
and then the name of the file.
213
00:10:46,820 --> 00:10:47,320
Yeah?
214
00:10:47,320 --> 00:10:48,780
AUDIENCE: What is a.out?
215
00:10:48,780 --> 00:10:49,530
DAVID MALAN: Yeah.
216
00:10:49,530 --> 00:10:51,870
So a.out is a historical name.
217
00:10:51,870 --> 00:10:55,240
It refers to assembler
output-- more on that soon.
218
00:10:55,240 --> 00:10:58,080
And it's just the default file
name that you get automatically
219
00:10:58,080 --> 00:11:01,350
if you just run the compiler
on any file so that you
220
00:11:01,350 --> 00:11:02,970
have just a standard name for it.
221
00:11:02,970 --> 00:11:05,213
But it's not a very well-named program.
222
00:11:05,213 --> 00:11:07,380
Instead of running Microsoft
Word on your Mac or PC,
223
00:11:07,380 --> 00:11:09,880
it would be like
double-clicking on a.out.
224
00:11:09,880 --> 00:11:11,880
So instead with these
command line arguments,
225
00:11:11,880 --> 00:11:17,370
you can customize the output of Clang
and call it hello or anything you want.
226
00:11:17,370 --> 00:11:23,020
Other questions on what I've done
here with Clang itself, the compiler?
227
00:11:23,020 --> 00:11:23,520
Yeah?
228
00:11:23,520 --> 00:11:25,510
AUDIENCE: What is -o?
229
00:11:25,510 --> 00:11:26,565
DAVID MALAN: So -o--
230
00:11:26,565 --> 00:11:29,440
and you would only know this from
reading the manual, taking a class,
231
00:11:29,440 --> 00:11:30,500
means output.
232
00:11:30,500 --> 00:11:35,890
So -o means change Clang's
output to be a file called hello
233
00:11:35,890 --> 00:11:38,680
instead of the default, which is a.out.
234
00:11:38,680 --> 00:11:42,400
And this, too, is, again, a detail you
would have to look up on a web page,
235
00:11:42,400 --> 00:11:44,810
read the manual, hear someone
like me tell you about it.
236
00:11:44,810 --> 00:11:46,893
And in fact, there are even
more options than these,
237
00:11:46,893 --> 00:11:48,890
but we'll just scratch the surface here.
238
00:11:48,890 --> 00:11:49,390
All right.
239
00:11:49,390 --> 00:11:53,530
So if we now know this, what more is
actually happening underneath the hood?
240
00:11:53,530 --> 00:11:57,250
Well, let's take a closer look at
not just this version of my code,
241
00:11:57,250 --> 00:12:01,190
but my slightly more
complicated version last week,
242
00:12:01,190 --> 00:12:03,430
which looked a little
something like this, wherein
243
00:12:03,430 --> 00:12:07,330
I added in some dynamic input from the
user so I could say not hello, world
244
00:12:07,330 --> 00:12:11,810
to everyone, but hello, David or hello
to whoever actually runs this program.
245
00:12:11,810 --> 00:12:15,880
So in fact, let me go ahead and
change my code here in VS Code just
246
00:12:15,880 --> 00:12:17,770
to match that same code from last week.
247
00:12:17,770 --> 00:12:19,190
So no new code yet.
248
00:12:19,190 --> 00:12:22,820
I'm just going to, in a moment,
compile it in a slightly different way.
249
00:12:22,820 --> 00:12:29,020
So last week I did string, I think,
answer equals get_string, quote-unquote,
250
00:12:29,020 --> 00:12:30,100
"What's your name?"
251
00:12:30,100 --> 00:12:31,540
Just like in Scratch.
252
00:12:31,540 --> 00:12:35,920
And then down here, instead of doing
world, I initially wrote answer,
253
00:12:35,920 --> 00:12:37,450
but that didn't go well.
254
00:12:37,450 --> 00:12:41,530
What did I ultimately do instead
to print out hello, David or hello,
255
00:12:41,530 --> 00:12:42,940
so-and-so?
256
00:12:42,940 --> 00:12:44,722
Yeah?
257
00:12:44,722 --> 00:12:45,680
Sorry, a little louder?
258
00:12:45,680 --> 00:12:46,430
AUDIENCE: %s?
259
00:12:46,430 --> 00:12:50,478
DAVID MALAN: Yeah, so %s, the so-called
format code that printf just knows how
260
00:12:50,478 --> 00:12:51,020
to deal with.
261
00:12:51,020 --> 00:12:52,470
And I had to add one other thing.
262
00:12:52,470 --> 00:12:54,350
Something else besides %s--
263
00:12:54,350 --> 00:12:54,850
yeah?
264
00:12:54,850 --> 00:12:56,050
AUDIENCE: The name of the variable.
265
00:12:56,050 --> 00:12:58,870
DAVID MALAN: The name of the variable
that I want to plug into that
266
00:12:58,870 --> 00:13:00,190
placeholder %s.
267
00:13:00,190 --> 00:13:01,630
And in this case, it's answer.
268
00:13:01,630 --> 00:13:04,363
Now let me make one refinement
only because now we're in week 2
269
00:13:04,363 --> 00:13:06,530
and we're going to start
writing more lines of code,
270
00:13:06,530 --> 00:13:10,360
even though Scratch called the
return value of the ask puzzle piece,
271
00:13:10,360 --> 00:13:11,560
answer, always,
272
00:13:11,560 --> 00:13:14,480
in C, we have full control over
what our variables are called.
273
00:13:14,480 --> 00:13:17,410
And now it's probably good not
to just generically always call
274
00:13:17,410 --> 00:13:19,870
my variable answer if
I'm using get_string.
275
00:13:19,870 --> 00:13:21,050
Let's call it what it is.
276
00:13:21,050 --> 00:13:23,680
So this is now just a matter
of style, if you will.
277
00:13:23,680 --> 00:13:26,620
Let me change the variable
to be name just so
278
00:13:26,620 --> 00:13:29,980
that it's a little clearer
to me, to you, to a TF or TA
279
00:13:29,980 --> 00:13:34,000
exactly what that variable represents
instead of more generically answer.
280
00:13:34,000 --> 00:13:37,030
All right, so that said, let me
go down to my terminal window,
281
00:13:37,030 --> 00:13:41,050
and last week again, I ran make to
compile this exact same program.
282
00:13:41,050 --> 00:13:43,270
Now, though, let me go
ahead and just use clang.
283
00:13:43,270 --> 00:13:45,490
So clang -o--
284
00:13:45,490 --> 00:13:47,500
I'll still call this version hello--
285
00:13:47,500 --> 00:13:49,330
space, hello.c.
286
00:13:49,330 --> 00:13:51,080
So exact same command as before.
287
00:13:51,080 --> 00:13:54,640
The only thing that's different is I've
added a couple more lines of code
288
00:13:54,640 --> 00:13:56,330
to get the user's input.
289
00:13:56,330 --> 00:13:59,960
Let me hit Enter, and now,
darn it, our first error.
290
00:13:59,960 --> 00:14:02,750
So output from clang and
make is not a good thing,
291
00:14:02,750 --> 00:14:05,420
and here, we're seeing
something particularly cryptic.
292
00:14:05,420 --> 00:14:09,010
So something in function
'main': undefined reference
293
00:14:09,010 --> 00:14:13,480
to 'get_string', and then
linker command failed with exit code 1.
294
00:14:13,480 --> 00:14:16,540
So there's actually a lot of jargon
in there that we'll tease apart today,
295
00:14:16,540 --> 00:14:20,338
but my hint is that clearly my problem's
in main, although that's not surprising
296
00:14:20,338 --> 00:14:22,130
because there's nothing
else going on here.
297
00:14:22,130 --> 00:14:26,830
get_string is an issue, and the issue
is that it's an undefined reference.
298
00:14:26,830 --> 00:14:28,990
And yet, notice, I was pretty good.
299
00:14:28,990 --> 00:14:32,920
I added the CS50 header file
and I said last week that that's
300
00:14:32,920 --> 00:14:35,920
enough to teach the compiler
that functions exist,
301
00:14:35,920 --> 00:14:39,070
but the problem is that even
though this does, in fact,
302
00:14:39,070 --> 00:14:43,090
teach Clang that get_string
exists, it is not
303
00:14:43,090 --> 00:14:47,530
sufficient information for Clang to go
find on the hard drive of the computer
304
00:14:47,530 --> 00:14:51,860
the 0's and 1's that actually
implement get_string itself.
305
00:14:51,860 --> 00:14:54,250
So in other words, this
include line, per last week,
306
00:14:54,250 --> 00:14:55,333
is a little bit of a hint.
307
00:14:55,333 --> 00:14:59,560
It's a teaser to Clang that you're about
to see and use this function somewhere.
308
00:14:59,560 --> 00:15:05,710
But if you actually want to use the 0's
and 1's that CS50 wrote some time ago
309
00:15:05,710 --> 00:15:08,740
and bake those into your
program so your program actually
310
00:15:08,740 --> 00:15:11,470
knows how to get input
from the user, well then,
311
00:15:11,470 --> 00:15:15,440
I'm going to have to go ahead and
run a slightly different command.
312
00:15:15,440 --> 00:15:16,250
So let me do this.
313
00:15:16,250 --> 00:15:18,917
Let me clear my terminal window
just to get rid of that distraction
314
00:15:18,917 --> 00:15:23,020
and let me propose now that
we run this command instead.
315
00:15:23,020 --> 00:15:28,510
Almost the same as before, clang
-o, space, hello, then hello.c,
316
00:15:28,510 --> 00:15:34,210
but with one additional command line
argument at the end, and this is a -l--
317
00:15:34,210 --> 00:15:35,050
not a number 1.
318
00:15:35,050 --> 00:15:39,370
So -lcs50 with no space
in between those two.
319
00:15:39,370 --> 00:15:43,540
Now the l is going to result in all
of those 0's and 1's that actually
320
00:15:43,540 --> 00:15:48,350
were written by CS50 being linked into your
code, your few lines of code or mine
321
00:15:48,350 --> 00:15:48,850
here.
322
00:15:48,850 --> 00:15:53,530
But that's the second step that the
compiler requires in order to know how
323
00:15:53,530 --> 00:15:58,537
to actually execute, or rather
compile, your code and CS50's.
324
00:15:58,537 --> 00:16:00,370
And CS50 is not the
only one that does this.
325
00:16:00,370 --> 00:16:04,750
If you use any third party library in
C that doesn't come with the language,
326
00:16:04,750 --> 00:16:08,333
you would do -l such
and such where whoever--
327
00:16:08,333 --> 00:16:10,000
however they've named their own library.
328
00:16:10,000 --> 00:16:14,298
But you don't have to do it for built-in
things like we've been using thus far.
329
00:16:14,298 --> 00:16:16,090
All right, so let me
go ahead and try this.
330
00:16:16,090 --> 00:16:19,000
I'll go back to VS Code
here, and let me go ahead now
331
00:16:19,000 --> 00:16:23,620
and run clang -o hello, then hello.c.
332
00:16:23,620 --> 00:16:26,560
And now instead of just
hitting Enter, -lcs50
333
00:16:26,560 --> 00:16:29,590
with no space between the
l and the cs50, Enter.
334
00:16:29,590 --> 00:16:33,310
Now nothing bad happens,
and now I can do ./hello.
335
00:16:33,310 --> 00:16:34,180
What's your name?
336
00:16:34,180 --> 00:16:37,633
I'll type in David, Enter,
and now we see hello, David.
337
00:16:37,633 --> 00:16:40,300
Now honestly, this is where we're
really getting into the weeds,
338
00:16:40,300 --> 00:16:42,130
and now this is taking--
339
00:16:42,130 --> 00:16:45,730
this is really just adding nuisance to
the process of compiling and running
340
00:16:45,730 --> 00:16:46,460
your code.
341
00:16:46,460 --> 00:16:49,960
And so the reality is, even though
this is indeed what is happening,
342
00:16:49,960 --> 00:16:51,880
this is why we used make last
week and we're going
343
00:16:51,880 --> 00:16:55,240
to continue using it this week
onward, because it just
344
00:16:55,240 --> 00:16:57,130
automates that whole process for you.
345
00:16:57,130 --> 00:17:00,130
But it's ideal to understand what's
going wrong because any of the error
346
00:17:00,130 --> 00:17:02,770
messages you saw for problem
set 1, any of the error messages
347
00:17:02,770 --> 00:17:05,859
you see for the next few weeks
probably aren't coming from make,
348
00:17:05,859 --> 00:17:08,560
they're coming from
Clang underneath the hood
349
00:17:08,560 --> 00:17:10,780
because make is just
automating the process.
350
00:17:10,780 --> 00:17:14,060
But with make, you literally just write
make and then the name of the program,
351
00:17:14,060 --> 00:17:17,560
you don't have to worry about any
of those command line arguments.
352
00:17:17,560 --> 00:17:22,240
Questions, then, on compiling
with -lcs50 or anything else?
353
00:17:22,240 --> 00:17:23,043
Yeah?
354
00:17:23,043 --> 00:17:24,960
AUDIENCE: What is the
benefit of [INAUDIBLE]?
355
00:17:24,960 --> 00:17:26,220
DAVID MALAN: Sorry,
what is the benefit of--
356
00:17:26,220 --> 00:17:27,512
AUDIENCE: Using Clang manually.
357
00:17:27,512 --> 00:17:30,000
DAVID MALAN: What is the
benefit of using Clang manually?
358
00:17:30,000 --> 00:17:30,870
None, really.
359
00:17:30,870 --> 00:17:33,450
In fact, all make is doing
360
00:17:33,450 --> 00:17:35,055
is saving us some keystrokes.
361
00:17:35,055 --> 00:17:37,680
If you prefer, though, and you
just like to be more in control,
362
00:17:37,680 --> 00:17:41,130
you can totally run Clang manually if
you remember the various command line
363
00:17:41,130 --> 00:17:42,090
arguments.
364
00:17:42,090 --> 00:17:42,660
Yeah?
365
00:17:42,660 --> 00:17:47,335
AUDIENCE: So why did you
have to explain [INAUDIBLE]
366
00:17:47,335 --> 00:17:48,210
DAVID MALAN: Exactly.
367
00:17:48,210 --> 00:17:49,560
Why did I have to explain--
368
00:17:49,560 --> 00:17:53,220
that is, provide a hint to CS50
with the cs50.h header file,
369
00:17:53,220 --> 00:17:55,470
but I didn't have to do
that with stdio.h?
370
00:17:55,470 --> 00:17:56,400
Just because.
371
00:17:56,400 --> 00:18:00,990
stdio.h comes with C, just
like a few other libraries come
372
00:18:00,990 --> 00:18:03,060
with C that we'll start seeing today.
373
00:18:03,060 --> 00:18:05,410
CS50, though, is not
built into C everywhere,
374
00:18:05,410 --> 00:18:07,890
and so you do have to
explicitly add that one there.
375
00:18:07,890 --> 00:18:08,767
Yeah?
376
00:18:08,767 --> 00:18:11,970
AUDIENCE: Can you define what
command line argument [INAUDIBLE]?
377
00:18:11,970 --> 00:18:15,210
DAVID MALAN: A command line
argument is a word or phrase
378
00:18:15,210 --> 00:18:17,740
that you type at the command line--
379
00:18:17,740 --> 00:18:22,200
a.k.a., your terminal-- in order to
influence the behavior of a program.
380
00:18:22,200 --> 00:18:22,742
AUDIENCE: OK.
381
00:18:22,742 --> 00:18:24,430
So it's a term for
whatever you're giving it.
382
00:18:24,430 --> 00:18:24,565
DAVID MALAN: Yeah.
383
00:18:24,565 --> 00:18:25,660
It changes the defaults.
384
00:18:25,660 --> 00:18:27,790
In our GUI world,
Graphical User Interface,
385
00:18:27,790 --> 00:18:29,680
you and I would probably
click some boxes,
386
00:18:29,680 --> 00:18:32,350
we would select some menu
options to configure a program
387
00:18:32,350 --> 00:18:33,460
to behave in the same way.
388
00:18:33,460 --> 00:18:36,850
At a command line interface, you have
to just say everything all at once,
389
00:18:36,850 --> 00:18:39,600
and that's why we have
command line arguments.
390
00:18:39,600 --> 00:18:40,605
Yeah?
391
00:18:40,605 --> 00:18:43,243
AUDIENCE: Is make [INAUDIBLE]
392
00:18:43,243 --> 00:18:43,910
DAVID MALAN: No.
393
00:18:43,910 --> 00:18:45,470
Make is not just for CS50.
394
00:18:45,470 --> 00:18:50,480
It's used globally in all sorts of projects
nowadays using C, C++,
395
00:18:50,480 --> 00:18:52,020
even other languages as well.
396
00:18:52,020 --> 00:18:54,140
In fact, most every command
you see in this class,
397
00:18:54,140 --> 00:18:57,530
unless it has 50 at the
end of its name, is used globally.
398
00:18:57,530 --> 00:19:00,758
Only those suffixed with 50
are, indeed, course-specific.
399
00:19:00,758 --> 00:19:03,050
And even those we'll gradually
take training wheels off
400
00:19:03,050 --> 00:19:06,890
of, so that you see exactly what those
commands are doing as well.
401
00:19:06,890 --> 00:19:09,053
All right, so what is
it that we've just done?
402
00:19:09,053 --> 00:19:11,720
Everything we've just done, of
course, I keep calling compiling,
403
00:19:11,720 --> 00:19:13,580
but let's just go down
one rabbit hole so
404
00:19:13,580 --> 00:19:15,967
that you understand that
when you compile code,
405
00:19:15,967 --> 00:19:18,050
there's actually a whole
bunch of steps happening,
406
00:19:18,050 --> 00:19:21,800
and this is going to enable a lot
of features, like companies can
407
00:19:21,800 --> 00:19:26,060
write code and then convert it
to run it on Macs and PCs alike
408
00:19:26,060 --> 00:19:27,240
or phones or the like.
409
00:19:27,240 --> 00:19:30,320
So it's not just a matter of
converting source code to machine code,
410
00:19:30,320 --> 00:19:34,610
there are actually four steps involved
in what you and I, as of last week,
411
00:19:34,610 --> 00:19:35,840
know as compiling.
412
00:19:35,840 --> 00:19:39,033
And these aren't terms that you'll
have to keep in mind constantly
413
00:19:39,033 --> 00:19:41,450
because again, we're going to
abstract a lot of this away.
414
00:19:41,450 --> 00:19:43,492
But just so we've gone
down the rabbit hole once,
415
00:19:43,492 --> 00:19:45,890
let's consider each of
these four steps that
416
00:19:45,890 --> 00:19:49,850
have been happening for you for a
week automatically, the first of which
417
00:19:49,850 --> 00:19:51,080
is called preprocessing.
418
00:19:51,080 --> 00:19:52,260
So what does this mean?
419
00:19:52,260 --> 00:19:54,450
Well, let's consider that
same program as before.
420
00:19:54,450 --> 00:19:57,830
So notice that two of the lines
of code start with a hash mark.
421
00:19:57,830 --> 00:20:02,338
That is a special symbol in C, and it's
a so-called preprocessor directive.
422
00:20:02,338 --> 00:20:04,130
You don't need to
memorize terms like that,
423
00:20:04,130 --> 00:20:07,005
but it just means that it's a little
different from every other line.
424
00:20:07,005 --> 00:20:08,960
And anything with a
hash symbol here should
425
00:20:08,960 --> 00:20:13,315
be preprocessed-- that is, analyzed
initially before anything else happens.
426
00:20:13,315 --> 00:20:17,100
So let's consider these two lines
up top, what exactly is happening.
427
00:20:17,100 --> 00:20:19,220
Well, it turns out with
these two lines, you
428
00:20:19,220 --> 00:20:23,390
have two header files, of
course, cs50.h and stdio.h.
429
00:20:23,390 --> 00:20:27,980
Where are those files, because
they've never been in VS Code for you,
430
00:20:27,980 --> 00:20:28,550
seemingly?
431
00:20:28,550 --> 00:20:31,940
If you type ls-- if you open up
the File Explorer in the GUI,
432
00:20:31,940 --> 00:20:35,900
you have never seen,
probably, cs50.h or stdio.h.
433
00:20:35,900 --> 00:20:39,620
They just work, but that's
because there's a folder somewhere
434
00:20:39,620 --> 00:20:43,340
on the hard drive that you're
using on your Mac or PC
435
00:20:43,340 --> 00:20:45,690
or somewhere in the
cloud, as in our case.
436
00:20:45,690 --> 00:20:50,210
That folder is
traditionally called /usr/include.
437
00:20:50,210 --> 00:20:51,857
And "user" is deliberately misspelled as usr.
438
00:20:51,857 --> 00:20:54,440
It's just slightly more succinct,
although it's a little weird
439
00:20:54,440 --> 00:20:55,760
why we drop that one letter.
440
00:20:55,760 --> 00:21:01,760
But /usr/include is just a folder on the
server that contains cs50.h, stdio.h,
441
00:21:01,760 --> 00:21:03,990
and a bunch of other things as well.
442
00:21:03,990 --> 00:21:08,030
So in fact, if you type in VS
Code, in your terminal window,
443
00:21:08,030 --> 00:21:13,310
when you're using Codespaces in the
cloud, and type ls space /usr/include,
444
00:21:13,310 --> 00:21:15,470
you can see all of the
files in that folder.
445
00:21:15,470 --> 00:21:17,580
But we've preinstalled
all of that stuff for you.
446
00:21:17,580 --> 00:21:20,390
So let's consider what's
actually in those files here.
447
00:21:20,390 --> 00:21:25,370
If I highlight these two lines up top
that start with hash include, well,
448
00:21:25,370 --> 00:21:30,530
I kind of hinted last week that what's
in that first file is a hint as to what
449
00:21:30,530 --> 00:21:32,660
functions CS50 wrote for you.
450
00:21:32,660 --> 00:21:35,540
So you can kind of think
of these include lines
451
00:21:35,540 --> 00:21:38,300
as being temporary
placeholders for what's
452
00:21:38,300 --> 00:21:41,000
going to become like a
global find and replace.
453
00:21:41,000 --> 00:21:44,270
That is, the first thing clang is going
to do is preprocess this file.
454
00:21:44,270 --> 00:21:47,300
It's going to look for any line
that starts with hash include.
455
00:21:47,300 --> 00:21:50,960
And if it sees that, it's going
to essentially go into that file,
456
00:21:50,960 --> 00:21:55,190
like cs50.h, and then just copy
and paste the contents of that file
457
00:21:55,190 --> 00:21:56,443
magically there for you.
458
00:21:56,443 --> 00:21:58,110
You don't see it visually on the screen.
459
00:21:58,110 --> 00:22:00,060
But it's happening behind the scenes.
460
00:22:00,060 --> 00:22:03,230
And so really, what's
happening with this first line
461
00:22:03,230 --> 00:22:09,380
is that somewhere in cs50.h is
the declaration of get_string,
462
00:22:09,380 --> 00:22:11,690
like we talked about last
week, and it probably
463
00:22:11,690 --> 00:22:13,215
looks a little something like this.
464
00:22:13,215 --> 00:22:15,590
And we didn't spend much time
on this yet this past week,
465
00:22:15,590 --> 00:22:17,030
but we will more in time.
466
00:22:17,030 --> 00:22:21,470
Notice that this is how
a function is declared.
467
00:22:21,470 --> 00:22:23,677
That is, it is decreed to exist.
468
00:22:23,677 --> 00:22:25,760
The name of the function,
of course, is get_string.
469
00:22:25,760 --> 00:22:28,310
Inside of the parentheses
are its arguments.
470
00:22:28,310 --> 00:22:31,580
In this case, there's one argument
to get_string, I claim today,
471
00:22:31,580 --> 00:22:33,080
but you've known this implicitly.
472
00:22:33,080 --> 00:22:34,160
And it's a prompt.
473
00:22:34,160 --> 00:22:36,860
It's the prompt that the human
sees when you use get_string.
474
00:22:36,860 --> 00:22:37,790
What is that prompt?
475
00:22:37,790 --> 00:22:41,060
Well, it's a string of text, like
quote unquote, "what's your name?"
476
00:22:41,060 --> 00:22:43,080
or anything else that I asked last week.
477
00:22:43,080 --> 00:22:46,610
Meanwhile, get_string, as we know
from last week, has a return value.
478
00:22:46,610 --> 00:22:48,140
It returns something to you.
479
00:22:48,140 --> 00:22:49,610
And that, too, is a string.
480
00:22:49,610 --> 00:22:52,120
So again, this is also
called a function's prototype.
481
00:22:52,120 --> 00:22:53,870
It's the thing toward
the end of last week
482
00:22:53,870 --> 00:22:57,560
that I just copied and pasted from
the bottom of my file to the top,
483
00:22:57,560 --> 00:23:02,030
just so that it was like this teaser
for clang as to what would exist later.
484
00:23:02,030 --> 00:23:07,670
So you can think, then, of these include
lines as just kind of combining all
485
00:23:07,670 --> 00:23:11,360
of those function declarations in
some separate file called cs50.h,
486
00:23:11,360 --> 00:23:14,780
so that you yourself don't have to type
them every time you use the library--
487
00:23:14,780 --> 00:23:18,470
or worse, so that you, yourself, don't
have to copy and paste those lines.
488
00:23:18,470 --> 00:23:22,520
This is what clang is doing for you
in its first step of preprocessing.
489
00:23:22,520 --> 00:23:27,470
Second, and last in this example,
what happens when clang preprocesses
490
00:23:27,470 --> 00:23:29,175
this second include line?
491
00:23:29,175 --> 00:23:31,550
Well, the only other function
we care about in this story
492
00:23:31,550 --> 00:23:33,650
is printf, of course,
which comes with C.
493
00:23:33,650 --> 00:23:39,440
So essentially, you can think of
printf's prototype or declaration
494
00:23:39,440 --> 00:23:40,820
as just being this.
495
00:23:40,820 --> 00:23:42,870
printf is the name of the function.
496
00:23:42,870 --> 00:23:47,370
It takes a string that you want
to format, like "hello, world"
497
00:23:47,370 --> 00:23:49,110
or "hello, %s".
498
00:23:49,110 --> 00:23:52,120
And then with dot, dot, dot, this
actually has technical meaning.
499
00:23:52,120 --> 00:23:55,770
It means, of course, that you can
plug in 0 variables, 1 variable, 2
500
00:23:55,770 --> 00:23:56,340
or 10.
501
00:23:56,340 --> 00:23:58,530
So dot, dot, dot means
some number of variables.
502
00:23:58,530 --> 00:24:00,072
Now we haven't talked about this yet.
503
00:24:00,072 --> 00:24:01,410
And we won't really, in general.
504
00:24:01,410 --> 00:24:05,490
printf actually returns a value,
a number, that is an integer.
505
00:24:05,490 --> 00:24:07,420
But more on that perhaps another time.
506
00:24:07,420 --> 00:24:10,920
It's generally not something
the programmer tends to look at.
507
00:24:10,920 --> 00:24:14,250
But that's all we mean by preprocessing,
so that at the end of this process,
508
00:24:14,250 --> 00:24:18,030
even though there's more lines
of code in cs50.h and stdio.h,
509
00:24:18,030 --> 00:24:21,330
what's really just happening
is that clang, in preprocessing
510
00:24:21,330 --> 00:24:25,380
the file, copies and pastes the
contents of those files into your code
511
00:24:25,380 --> 00:24:29,160
so that now your code knows about
everything-- get_string, printf,
512
00:24:29,160 --> 00:24:31,060
and anything else.
513
00:24:31,060 --> 00:24:35,230
Any questions, then, on that
first step, preprocessing?
514
00:24:35,230 --> 00:24:35,920
Yes?
515
00:24:35,920 --> 00:24:49,195
AUDIENCE: [INAUDIBLE]
516
00:24:49,195 --> 00:24:50,320
DAVID MALAN: Good question.
517
00:24:50,320 --> 00:24:52,720
When you include a file,
does it only include what
518
00:24:52,720 --> 00:24:54,880
you need or does it include everything?
519
00:24:54,880 --> 00:24:56,420
Think of it as including everything.
520
00:24:56,420 --> 00:24:59,020
So if it's a big file, that's
a lot of code at the very top.
521
00:24:59,020 --> 00:25:01,880
And that's why, if you think
back to all of the zeros and ones
522
00:25:01,880 --> 00:25:03,880
I showed a little bit
ago, as well as last week,
523
00:25:03,880 --> 00:25:06,130
there's a lot of zeros
and ones that end up
524
00:25:06,130 --> 00:25:08,892
on the screen as a result of
just writing, Hello, world.
525
00:25:08,892 --> 00:25:10,600
A lot of those zeros
and ones are perhaps
526
00:25:10,600 --> 00:25:13,390
coming from code that you didn't
actually, necessarily need.
527
00:25:13,390 --> 00:25:15,340
But some of it is
perhaps there, but there
528
00:25:15,340 --> 00:25:17,740
are ways to optimize that as well.
529
00:25:17,740 --> 00:25:22,395
All right, so step two of compiling
is, confusingly, called compiling.
530
00:25:22,395 --> 00:25:24,520
It's just, this is the term
that most everyone uses
531
00:25:24,520 --> 00:25:27,940
to describe the whole process,
instead of just this one step.
532
00:25:27,940 --> 00:25:32,140
But once a program has been
preprocessed behind the scenes
533
00:25:32,140 --> 00:25:35,865
by the compiler for you, it looks
now a little something like this.
534
00:25:35,865 --> 00:25:38,740
And I've put dot, dot, dot just to
imply that, yes, to your question,
535
00:25:38,740 --> 00:25:39,820
there's more stuff above it.
536
00:25:39,820 --> 00:25:40,987
There's more stuff below it.
537
00:25:40,987 --> 00:25:43,070
It's just not interesting
right now for us.
538
00:25:43,070 --> 00:25:44,860
So now we have just C code.
539
00:25:44,860 --> 00:25:46,960
There's no more preprocessor directives.
540
00:25:46,960 --> 00:25:49,840
At this point, all of the hash
symbols and those lines of code
541
00:25:49,840 --> 00:25:52,670
have been preprocessed and
converted to something else.
542
00:25:52,670 --> 00:25:56,380
And so now-- and this is where
things get a little spooky looking.
543
00:25:56,380 --> 00:26:00,370
Here now is what happens
when clang, or any compiler,
544
00:26:00,370 --> 00:26:03,310
literally compiles code like this.
545
00:26:03,310 --> 00:26:08,720
It converts it from this in
C to this in assembly code.
546
00:26:08,720 --> 00:26:10,720
So this is among the scarier languages.
547
00:26:10,720 --> 00:26:12,580
I, myself, don't really
have fond memories.
548
00:26:12,580 --> 00:26:14,805
This is not a language that
many people program in.
549
00:26:14,805 --> 00:26:16,930
If you take a subsequent
class in computer science,
550
00:26:16,930 --> 00:26:19,600
in systems, a higher level
class, you might actually
551
00:26:19,600 --> 00:26:21,430
learn this or some variant thereof.
552
00:26:21,430 --> 00:26:23,232
But there's at least
a few people out there
553
00:26:23,232 --> 00:26:24,940
that need to know this
stuff because this
554
00:26:24,940 --> 00:26:29,320
is closer to what the computers
themselves, nowadays, understand.
555
00:26:29,320 --> 00:26:34,600
The Intel CPUs or the AMD CPUs, the
brains of today's computers and phones
556
00:26:34,600 --> 00:26:37,960
understand stuff that looks
more like this and less like C.
557
00:26:37,960 --> 00:26:42,430
Now it's completely esoteric, but
let me just highlight a few phrases.
558
00:26:42,430 --> 00:26:44,630
There's some stuff
that's a little familiar.
559
00:26:44,630 --> 00:26:47,620
There is mention of main
at the top there in yellow.
560
00:26:47,620 --> 00:26:49,750
There is mention of
get_string toward the bottom.
561
00:26:49,750 --> 00:26:52,070
There is mention of printf down below.
562
00:26:52,070 --> 00:26:55,600
So this is just another programming
language called assembly language,
563
00:26:55,600 --> 00:26:57,010
that decades ago, humans--
564
00:26:57,010 --> 00:26:58,450
myself included in school--
565
00:26:58,450 --> 00:27:00,130
did write code in.
566
00:27:00,130 --> 00:27:02,630
And absolutely, some people
still write this code,
567
00:27:02,630 --> 00:27:06,070
especially since you can write
very, very efficient code.
568
00:27:06,070 --> 00:27:08,590
But it's a lot more arcane.
569
00:27:08,590 --> 00:27:11,380
It's a lot less user friendly.
570
00:27:11,380 --> 00:27:14,650
So you'll see in yellow now, these
are the so-called instructions
571
00:27:14,650 --> 00:27:18,460
that a computer's brain or CPU
understands, pushing values
572
00:27:18,460 --> 00:27:23,630
around, moving them, subtracting values,
calling functions, and move, move,
573
00:27:23,630 --> 00:27:24,130
move.
574
00:27:24,130 --> 00:27:27,400
So really, the low-level operations
that computers understand
575
00:27:27,400 --> 00:27:31,030
tend to be arithmetic operations--
subtraction, addition,
576
00:27:31,030 --> 00:27:34,120
and the like-- moving
things in and out of memory.
577
00:27:34,120 --> 00:27:37,510
It's just a lot more tedious for
folks like us to write code like this.
578
00:27:37,510 --> 00:27:40,450
This is why you and I tend
to write stuff like this.
579
00:27:40,450 --> 00:27:44,080
And ideally, still, people like you and
I tend to drag and drop puzzle pieces
580
00:27:44,080 --> 00:27:46,520
that sort of abstract
all of that away further.
581
00:27:46,520 --> 00:27:49,420
But for now, this is, again,
called assembly language.
582
00:27:49,420 --> 00:27:54,310
It is what happens when the compiler
literally compiles your code.
583
00:27:54,310 --> 00:27:57,010
But of course, this is
still not zeros and ones.
584
00:27:57,010 --> 00:27:58,580
So we've got two steps to go.
585
00:27:58,580 --> 00:28:02,270
So when a compiler
proceeds to step three,
586
00:28:02,270 --> 00:28:05,530
this is where things get
converted to machine code.
587
00:28:05,530 --> 00:28:08,500
And when a compiler
assembles your code for you,
588
00:28:08,500 --> 00:28:14,260
it converts what we just saw on the
screen here to actual zeros and ones--
589
00:28:14,260 --> 00:28:18,550
the so-called machine code that your
phone or your computer understands.
590
00:28:18,550 --> 00:28:22,120
But it's worth noting that
these are not necessarily all
591
00:28:22,120 --> 00:28:24,280
of the zeros and ones of your program.
592
00:28:24,280 --> 00:28:29,980
Yes, they are the zeros and ones
that correspond to your Hello program
593
00:28:29,980 --> 00:28:33,250
or printf and get_string
and the like, but notice
594
00:28:33,250 --> 00:28:36,940
that here, we need one final step.
595
00:28:36,940 --> 00:28:40,100
In those zeros and ones are
only your lines of code.
596
00:28:40,100 --> 00:28:43,540
But what about CS50's lines of code
that we wrote to implement get_string?
597
00:28:43,540 --> 00:28:46,990
What about the lines of code that humans
wrote decades ago to implement printf?
598
00:28:46,990 --> 00:28:50,020
Those are somewhere on this hard
drive, like on my Mac, my PC,
599
00:28:50,020 --> 00:28:54,460
or somewhere in the cloud, but we need
to combine all of those zeros and ones
600
00:28:54,460 --> 00:29:01,390
together and link my code with
CS50's code with standard I/O's code,
601
00:29:01,390 --> 00:29:02,420
all together.
602
00:29:02,420 --> 00:29:05,110
And so what happens in
the last step, ultimately,
603
00:29:05,110 --> 00:29:07,960
is that if we have my
code here in yellow,
604
00:29:07,960 --> 00:29:11,440
and then the code that CS50 wrote,
and the code that the authors of C
605
00:29:11,440 --> 00:29:15,940
itself wrote, what really is happening
is that somewhere, we have not only
606
00:29:15,940 --> 00:29:19,960
hello.c, which, obviously, I
wrote, and wrote with us live here,
607
00:29:19,960 --> 00:29:24,550
there's also, let's assume, somewhere
on the computer, a cs50.c file
608
00:29:24,550 --> 00:29:28,210
that, coincidentally, I and
CS50 staff wrote years ago.
609
00:29:28,210 --> 00:29:30,790
And also, somewhere on the
computer, there's another file.
610
00:29:30,790 --> 00:29:34,120
Let me oversimplify by
just calling it stdio.c.
611
00:29:34,120 --> 00:29:36,850
In practice, it's probably
specifically called printf.c.
612
00:29:36,850 --> 00:29:39,460
But they're somewhere,
these two other files.
613
00:29:39,460 --> 00:29:44,110
And so this last step called
linking takes my zeros and ones
614
00:29:44,110 --> 00:29:48,100
from the code I just wrote, namely
this code on the screen here.
615
00:29:48,100 --> 00:29:50,810
It then grabs the zeros
and ones that CS50 wrote.
616
00:29:50,810 --> 00:29:53,480
And it grabs the zeros and ones
that the authors of C wrote,
617
00:29:53,480 --> 00:29:56,240
in order to implement
the standard I/O library.
618
00:29:56,240 --> 00:30:00,750
And lastly, voila,
links them all together.
619
00:30:00,750 --> 00:30:03,980
And this is the same blob of zeros
and ones that we saw earlier.
620
00:30:03,980 --> 00:30:08,090
It's just now the result
of preprocessing your code,
621
00:30:08,090 --> 00:30:12,620
compiling your code, assembling your
code, linking your code, and my God,
622
00:30:12,620 --> 00:30:15,830
at this point, like if there were
any fun in programming for you yet,
623
00:30:15,830 --> 00:30:19,620
we've just taken it all away, we just
call this whole process compiling.
624
00:30:19,620 --> 00:30:20,120
Why?
625
00:30:20,120 --> 00:30:22,490
Because now that we
know those steps exist--
626
00:30:22,490 --> 00:30:25,370
and smart people solve
that problem for us--
627
00:30:25,370 --> 00:30:27,890
you and I can kind of operate
at this level of abstraction
628
00:30:27,890 --> 00:30:32,420
and just assume that compiling
converts source code to machine code.
629
00:30:32,420 --> 00:30:36,350
Questions, though, on any
of these intermediate steps?
630
00:30:36,350 --> 00:30:37,360
Yeah?
631
00:30:37,360 --> 00:30:41,958
AUDIENCE: For linking, are
different parts, like [INAUDIBLE]?
632
00:30:41,958 --> 00:30:50,072
633
00:30:50,072 --> 00:30:51,280
DAVID MALAN: A good question.
634
00:30:51,280 --> 00:30:53,238
So where are all of these
zeros and ones stored?
635
00:30:53,238 --> 00:30:56,400
Because you and I, we've been using
a browser, right? code.cs50.io,
636
00:30:56,400 --> 00:30:58,330
of course, is this
web-based user interface.
637
00:30:58,330 --> 00:31:00,497
But again, recall from last
week, even though you're
638
00:31:00,497 --> 00:31:05,640
using a web browser to access VS Code,
that web-based version of VS Code
639
00:31:05,640 --> 00:31:09,000
is connected to an actual
server somewhere in the cloud.
640
00:31:09,000 --> 00:31:13,170
And on that server, you have your own
account and your own files, and really,
641
00:31:13,170 --> 00:31:15,360
your own hard drive,
virtually in the cloud.
642
00:31:15,360 --> 00:31:18,872
Think of it a little like Dropbox
or Box or Google Drive or OneDrive
643
00:31:18,872 --> 00:31:19,830
or something like that.
644
00:31:19,830 --> 00:31:23,310
So you have a hard drive somewhere out
there that we've provisioned for you.
645
00:31:23,310 --> 00:31:27,930
And it's on that hard drive that you
have your code that you just wrote,
646
00:31:27,930 --> 00:31:32,700
or I just wrote, cs50.c, stdio.c,
and all of the other code
647
00:31:32,700 --> 00:31:36,967
that implements the math functions
and everything else that C supports.
648
00:31:36,967 --> 00:31:37,550
Good question.
649
00:31:37,550 --> 00:31:38,964
Yeah?
650
00:31:38,964 --> 00:31:45,425
AUDIENCE: So, say in the CS50
library, the line [INAUDIBLE]
651
00:31:45,425 --> 00:31:49,401
do we do the same
exact thing [INAUDIBLE]
652
00:31:49,401 --> 00:31:51,935
copy paste them all the way over?
653
00:31:51,935 --> 00:31:53,060
DAVID MALAN: Good question.
654
00:31:53,060 --> 00:31:57,110
That hash include cs50.h
line at the top of my code.
655
00:31:57,110 --> 00:32:01,310
If I just replace that with the
contents of cs50.c, would that work?
656
00:32:01,310 --> 00:32:03,590
Short answer, yes, that would work.
657
00:32:03,590 --> 00:32:05,400
You could copy all of the code there.
658
00:32:05,400 --> 00:32:08,577
However, there's some order of
operations that might come into play.
659
00:32:08,577 --> 00:32:10,910
And so it's probably not quite
as simple as copy, paste.
660
00:32:10,910 --> 00:32:13,190
But conceptually, yes,
that's what's happening.
661
00:32:13,190 --> 00:32:19,370
Now with that said, in cs50.h are
only the prototypes of the functions,
662
00:32:19,370 --> 00:32:23,628
the hints as to how the functions
look, what their return type is,
663
00:32:23,628 --> 00:32:25,670
what their name is, and
what their arguments are.
664
00:32:25,670 --> 00:32:29,867
It's in the dot c file that
actual code tends to be written.
665
00:32:29,867 --> 00:32:32,450
And this is a little confusing
now because you and I have only
666
00:32:32,450 --> 00:32:33,920
written code in dot c files.
667
00:32:33,920 --> 00:32:35,690
But in the next few
weeks, you'll actually
668
00:32:35,690 --> 00:32:37,940
start writing some of
your own dot h files
669
00:32:37,940 --> 00:32:40,460
as well, just like CS50,
just like standard I/O.
670
00:32:40,460 --> 00:32:44,150
But in essence, that line of code
just makes it easier to use and reuse
671
00:32:44,150 --> 00:32:46,020
code that's already been written.
672
00:32:46,020 --> 00:32:47,750
And that's the whole point of a library.
673
00:32:47,750 --> 00:32:50,327
AUDIENCE: Does linking them [INAUDIBLE]?
674
00:32:50,327 --> 00:32:51,910
DAVID MALAN: Say that a little louder.
675
00:32:51,910 --> 00:32:54,472
AUDIENCE: Does linking happen
when you use the compiler?
676
00:32:54,472 --> 00:32:55,180
DAVID MALAN: Yes.
677
00:32:55,180 --> 00:32:56,980
Does linking happen when
you compile your code?
678
00:32:56,980 --> 00:32:57,480
Yes.
679
00:32:57,480 --> 00:33:02,320
When you run make, as we have
been doing the past week now,
680
00:33:02,320 --> 00:33:04,570
all four of these steps are happening.
681
00:33:04,570 --> 00:33:07,780
Preprocessing converts the hash
include lines to something else.
682
00:33:07,780 --> 00:33:10,600
Compiling technically
converts it to assembly
683
00:33:10,600 --> 00:33:14,290
code, which the Mac, the PC, the
server more closely understands.
684
00:33:14,290 --> 00:33:18,850
Assembling converts that language to
binary machine code that this computer
685
00:33:18,850 --> 00:33:20,080
actually understands.
686
00:33:20,080 --> 00:33:22,540
And then linking combines
everything together.
687
00:33:22,540 --> 00:33:27,550
And in fact, if you think back a few
minutes ago to when I did this -lcs50,
688
00:33:27,550 --> 00:33:30,070
the reason I had to add
that, and the reason
689
00:33:30,070 --> 00:33:32,860
my code did not compile
at first, was because I
690
00:33:32,860 --> 00:33:38,650
forgot to tell clang to link in CS50's
zeros and ones per that last step.
691
00:33:38,650 --> 00:33:42,147
I don't need to do -lstdio
because it comes with C,
692
00:33:42,147 --> 00:33:44,480
so that would just be tedious
for everyone in the world.
693
00:33:44,480 --> 00:33:47,140
But CS50 does not come
with C, so we link that in.
694
00:33:47,140 --> 00:33:49,780
And to be clear, too, we won't
always use CS50's library.
695
00:33:49,780 --> 00:33:53,072
That'll be yet another pair of training
wheels we take off in the coming weeks.
696
00:33:53,072 --> 00:33:55,000
But for now, it makes
a few things simpler.
697
00:33:55,000 --> 00:33:57,284
Yeah?
698
00:33:57,284 --> 00:33:59,750
AUDIENCE: What is the [INAUDIBLE]?
699
00:33:59,750 --> 00:34:08,878
700
00:34:08,878 --> 00:34:10,170
DAVID MALAN: Short answer, yes.
701
00:34:10,170 --> 00:34:12,870
So what do the zeros and ones,
the machine code, translate to?
702
00:34:12,870 --> 00:34:15,690
Yes, there is a one-to-one
relationship between the machine
703
00:34:15,690 --> 00:34:17,340
code and the assembly code.
704
00:34:17,340 --> 00:34:21,510
Assembly code, it's not really English,
but at least it's symbols I recognize.
705
00:34:21,510 --> 00:34:22,800
It's not zeros and ones.
706
00:34:22,800 --> 00:34:24,810
Machine code, of course,
is just zeros and ones.
707
00:34:24,810 --> 00:34:27,960
So back in the day,
before C existed, people
708
00:34:27,960 --> 00:34:30,630
were programming only in assembly code.
709
00:34:30,630 --> 00:34:34,469
Before assembly code existed, people
were coding in zeros and ones.
710
00:34:34,469 --> 00:34:36,719
And you can imagine just
how painful that was,
711
00:34:36,719 --> 00:34:39,027
and so each of these
languages makes life, for us,
712
00:34:39,027 --> 00:34:40,110
sort of easier and easier.
713
00:34:40,110 --> 00:34:42,330
In a few weeks, we'll
transition to Python, which
714
00:34:42,330 --> 00:34:45,300
will, in turn, make C even simpler--
715
00:34:45,300 --> 00:34:48,090
or coding, in general,
simpler to do too.
716
00:34:48,090 --> 00:34:53,346
All right, so with that
said, what now can we--
717
00:34:53,346 --> 00:34:55,060
what could go wrong with this?
718
00:34:55,060 --> 00:34:58,140
Well, it turns out that besides
compiling, technically speaking,
719
00:34:58,140 --> 00:34:59,233
there's decompiling.
720
00:34:59,233 --> 00:35:01,150
And we've not done this,
and we won't do this.
721
00:35:01,150 --> 00:35:04,080
But it's worth considering
for just a moment.
722
00:35:04,080 --> 00:35:07,560
If you were to not compile
your code, but decompile it--
723
00:35:07,560 --> 00:35:11,340
as the word suggests, this just means
reversing the process, converting it,
724
00:35:11,340 --> 00:35:14,580
ideally, from machine
code-- zeros and ones--
725
00:35:14,580 --> 00:35:19,870
maybe back to C. Now this would be cool,
perhaps, if all you have is a program,
726
00:35:19,870 --> 00:35:22,080
you can convert it and see
the actual source code.
727
00:35:22,080 --> 00:35:25,320
What might a downside be,
if anyone on the internet
728
00:35:25,320 --> 00:35:28,650
is able to decompile
code on their machine?
729
00:35:28,650 --> 00:35:29,160
Yeah?
730
00:35:29,160 --> 00:35:30,270
AUDIENCE: [INAUDIBLE]
731
00:35:30,270 --> 00:35:34,130
DAVID MALAN: OK, so it's easier
to find bugs in the code that--
732
00:35:34,130 --> 00:35:35,430
oh, to exploit.
733
00:35:35,430 --> 00:35:38,417
So it might be easier to
hack into the software
734
00:35:38,417 --> 00:35:41,000
by finding mistakes you and I
made because, literally, they're
735
00:35:41,000 --> 00:35:43,370
staring at you in code,
whereas the zeros and ones make
736
00:35:43,370 --> 00:35:45,080
it way less obvious.
737
00:35:45,080 --> 00:35:48,140
Other downsides of what
I called decompiling?
738
00:35:48,140 --> 00:35:49,970
Yeah?
739
00:35:49,970 --> 00:35:53,690
AUDIENCE: If stuff is copyrighted or
you don't even know how to get it--
740
00:35:53,690 --> 00:35:54,440
DAVID MALAN: Yeah.
741
00:35:54,440 --> 00:35:55,948
AUDIENCE: [INAUDIBLE]
742
00:35:55,948 --> 00:35:57,740
DAVID MALAN: Yeah, if
your code, your work,
743
00:35:57,740 --> 00:36:00,950
is your intellectual property,
copyrighted or otherwise, that's
744
00:36:00,950 --> 00:36:03,660
kind of obnoxious that someone
can just run a command, and boom,
745
00:36:03,660 --> 00:36:05,577
they can see the original
code that you wrote.
746
00:36:05,577 --> 00:36:08,490
Now, it turns out it's not
quite as simple as that.
747
00:36:08,490 --> 00:36:11,720
And so even though, yes, you
could take a program like Hello,
748
00:36:11,720 --> 00:36:15,080
or even Microsoft Word, and
convert it from zeros and ones
749
00:36:15,080 --> 00:36:19,400
back to some form of source
code-- be it in C or Java
750
00:36:19,400 --> 00:36:22,820
or Python or something else, whatever
it was originally written in-- odds
751
00:36:22,820 --> 00:36:25,800
are it's going to be an
utter mess to look at.
752
00:36:25,800 --> 00:36:26,300
Why?
753
00:36:26,300 --> 00:36:30,390
Because things like variable names are
not retained in the zeros and ones,
754
00:36:30,390 --> 00:36:30,890
typically.
755
00:36:30,890 --> 00:36:33,980
Function names might not be
retained in the zeros and ones.
756
00:36:33,980 --> 00:36:36,350
The code is, the logic
is, but the computer
757
00:36:36,350 --> 00:36:38,510
doesn't care what pretty
variables you chose
758
00:36:38,510 --> 00:36:41,060
and how nicely named your
functions were, it just
759
00:36:41,060 --> 00:36:42,890
needs to know them as zeros and ones.
760
00:36:42,890 --> 00:36:46,370
Moreover, if you think about last week,
we introduced things like loops in C.
761
00:36:46,370 --> 00:36:49,745
And besides for loops, there's what
other kind of loop, for instance?
762
00:36:49,745 --> 00:36:50,620
AUDIENCE: [INAUDIBLE]
763
00:36:50,620 --> 00:36:53,412
DAVID MALAN: So, a while loop--
and even though they look different
764
00:36:53,412 --> 00:36:55,920
and you have to write different
code, they achieve exactly
765
00:36:55,920 --> 00:36:59,910
the same functionality, which is
to say, when you compile a for loop
766
00:36:59,910 --> 00:37:04,140
or you compile a while loop, if
they logically do the same thing,
767
00:37:04,140 --> 00:37:07,420
they might end up looking
identical as zeros and ones.
768
00:37:07,420 --> 00:37:09,780
And so, therefore, it's
not necessarily predictable
769
00:37:09,780 --> 00:37:11,820
that you'll get back
the original code, why?
770
00:37:11,820 --> 00:37:15,110
Because the zeros and ones
might not know, so to speak,
771
00:37:15,110 --> 00:37:16,860
whether it was a for
loop or a while loop,
772
00:37:16,860 --> 00:37:19,350
so maybe decompiling will
show you one or the other.
773
00:37:19,350 --> 00:37:21,870
And honestly, decompiling,
while possible-- and it's
774
00:37:21,870 --> 00:37:24,570
one way of reverse
engineering someone's product.
775
00:37:24,570 --> 00:37:28,662
Odds are, if you're good enough to start
reading code that's been decompiled
776
00:37:28,662 --> 00:37:30,870
and reading through the
messiness of it, odds are you
777
00:37:30,870 --> 00:37:34,020
have the talent probably to just
write that same program from scratch
778
00:37:34,020 --> 00:37:34,650
yourself.
779
00:37:34,650 --> 00:37:36,870
Now, that's an overstatement,
perhaps, but it's not
780
00:37:36,870 --> 00:37:40,410
quite as easy or threatening
as you might first think.
781
00:37:40,410 --> 00:37:43,290
So in general, once
code is compiled, it's
782
00:37:43,290 --> 00:37:48,290
pretty challenging, time-consuming,
costly to reverse engineer it, much
783
00:37:48,290 --> 00:37:50,040
like it would be in
the real world, right?
784
00:37:50,040 --> 00:37:52,860
Like all of us have some kind of phone,
probably, nowadays in our pocket.
785
00:37:52,860 --> 00:37:55,193
There's nothing stopping you
from opening it up somehow,
786
00:37:55,193 --> 00:37:57,060
poking around, recreating what's there.
787
00:37:57,060 --> 00:37:59,130
That's a huge amount
of effort, most likely.
788
00:37:59,130 --> 00:38:01,880
And at that point, maybe you should
just invent the phone, instead
789
00:38:01,880 --> 00:38:03,310
of trying to reverse engineer it.
790
00:38:03,310 --> 00:38:06,330
So same kind of idea
in the physical world.
791
00:38:06,330 --> 00:38:13,050
Any questions, then, on compiling,
or even decompiling in these forms?
792
00:38:13,050 --> 00:38:17,160
All right, so odds are, at this point,
not only I, but you have made mistakes.
793
00:38:17,160 --> 00:38:19,050
And you've written buggy code--
794
00:38:19,050 --> 00:38:22,350
a bug in code is just a
mistake, a logical error
795
00:38:22,350 --> 00:38:26,490
or otherwise, where the code just does
not behave correctly as you intend.
796
00:38:26,490 --> 00:38:29,880
And up until now, odds are,
your debugging techniques
797
00:38:29,880 --> 00:38:32,910
have been to maybe look back
at what I did in class, maybe
798
00:38:32,910 --> 00:38:35,320
ask a question online or in-person.
799
00:38:35,320 --> 00:38:38,190
But ultimately, it'd be nice if
you had some tools of your own
800
00:38:38,190 --> 00:38:39,570
with which to debug code.
801
00:38:39,570 --> 00:38:41,587
And this, honestly, is a lifelong skill.
802
00:38:41,587 --> 00:38:43,170
You're not going to emerge from CS50--
803
00:38:43,170 --> 00:38:44,490
and even 20 years from
now, you're not going
804
00:38:44,490 --> 00:38:47,910
to be writing-- if you're writing code
at all-- correct code all of the time.
805
00:38:47,910 --> 00:38:50,820
Like, all of us on the staff
continue to write bugs.
806
00:38:50,820 --> 00:38:54,120
Hopefully, they get a little more
sophisticated, and not sort of like,
807
00:38:54,120 --> 00:38:55,540
oops, I missed a semicolon.
808
00:38:55,540 --> 00:38:57,660
But even those kinds of
mistakes, we make too.
809
00:38:57,660 --> 00:39:00,150
But there are tools out
there and techniques
810
00:39:00,150 --> 00:39:03,550
that can make your life easier when
it comes to solving those problems.
811
00:39:03,550 --> 00:39:06,360
Now, the term bug has actually
been around for decades.
812
00:39:06,360 --> 00:39:11,790
But a fun story to tell is that
the first documented actual bug was
813
00:39:11,790 --> 00:39:13,650
actually somehow connected to Harvard.
814
00:39:13,650 --> 00:39:18,870
In fact, this is the logbook relating
to the Harvard Mark II computer
815
00:39:18,870 --> 00:39:22,890
from 1947, whereby if you read the
notes here-- and I'll zoom in-- this
816
00:39:22,890 --> 00:39:27,630
was an actual moth discovered inside
of this big mainframe computer that
817
00:39:27,630 --> 00:39:29,160
was causing some kind of problems.
818
00:39:29,160 --> 00:39:30,450
And the engineers there
at the time actually
819
00:39:30,450 --> 00:39:33,610
thought it was funny that, wow, a physical
bug actually explains the issue.
820
00:39:33,610 --> 00:39:36,450
And it's been forever taped to this
sheet of paper, which I believe
821
00:39:36,450 --> 00:39:39,090
now is on display in the Smithsonian.
822
00:39:39,090 --> 00:39:43,260
With that said, this is just
representative, too, of a logical bug.
823
00:39:43,260 --> 00:39:45,390
And that story is actually--
824
00:39:45,390 --> 00:39:49,170
that story was often retold by a famous
mathematician, then computer scientist
825
00:39:49,170 --> 00:39:53,640
really, Dr. Grace Hopper, who actually
worked not only on the Harvard Mark II
826
00:39:53,640 --> 00:39:57,210
computer, but its predecessor,
the Harvard Mark I.
827
00:39:57,210 --> 00:40:01,020
And if you've spent any time yet in the
engineering building across the river
828
00:40:01,020 --> 00:40:04,103
here, you can actually see
much of this computer, which
829
00:40:04,103 --> 00:40:07,020
is along the wall when you first
walk into the Science and Engineering
830
00:40:07,020 --> 00:40:07,530
Complex.
831
00:40:07,530 --> 00:40:09,530
And indeed, as you've
probably heard growing up,
832
00:40:09,530 --> 00:40:11,070
this is a mainframe computer.
833
00:40:11,070 --> 00:40:15,210
This is what Macs and PCs, so to
speak, looked like back in the day,
834
00:40:15,210 --> 00:40:18,240
with very physical things that
essentially implemented the zeros
835
00:40:18,240 --> 00:40:21,900
and ones that you and I take for granted
now being miniaturized in our laptops
836
00:40:21,900 --> 00:40:22,410
and phones.
837
00:40:22,410 --> 00:40:23,910
So there's a piece of history there.
838
00:40:23,910 --> 00:40:27,390
If you visit that side of
campus sometime, do take a look.
839
00:40:27,390 --> 00:40:30,480
But let's consider, then, how we
solve not, of course, physical bugs,
840
00:40:30,480 --> 00:40:31,350
but logical bugs.
841
00:40:31,350 --> 00:40:33,600
And let's consider something
like this from last week,
842
00:40:33,600 --> 00:40:38,820
whereby, we were trying very simply to
print like this column of three bricks
843
00:40:38,820 --> 00:40:40,320
using hashtags of sorts.
844
00:40:40,320 --> 00:40:44,400
So let me go over here in
just a moment to VS Code.
845
00:40:44,400 --> 00:40:47,080
And I'm going to go ahead and
open a program I wrote in advance.
846
00:40:47,080 --> 00:40:49,455
And I'm bringing it to class
because there's a bug in it,
847
00:40:49,455 --> 00:40:51,510
and I'd like to figure
out how to solve this bug.
848
00:40:51,510 --> 00:40:56,160
So let me open up buggy0.c,
which is version 0 of my code.
849
00:40:56,160 --> 00:40:58,200
And let's just take a
quick peek at what's here.
850
00:40:58,200 --> 00:40:58,950
It's pretty short.
851
00:40:58,950 --> 00:41:03,750
It includes only stdio.h, it
uses printf, it uses a for loop,
852
00:41:03,750 --> 00:41:07,797
and the goal, quite simply, is to
print out that column of three bricks.
853
00:41:07,797 --> 00:41:11,130
Now, it's short enough that some of you,
if you're getting comfy already with C,
854
00:41:11,130 --> 00:41:13,360
you might already see the logical bug.
855
00:41:13,360 --> 00:41:16,200
It's not a syntax error,
like it will compile and run.
856
00:41:16,200 --> 00:41:17,280
But there's a bug there.
857
00:41:17,280 --> 00:41:22,320
And suppose that I'm very new to C, I'm
very uncomfortable with C, it's 2:00 AM
858
00:41:22,320 --> 00:41:26,130
and I just can't see the bug, what
are my recourses here for actually
859
00:41:26,130 --> 00:41:27,745
finding a mistake like this?
860
00:41:27,745 --> 00:41:29,370
Well, first, let's look at the symptom.
861
00:41:29,370 --> 00:41:31,740
Let me go down to my terminal window.
862
00:41:31,740 --> 00:41:36,120
I'm going to use make buggy0 because,
again, the file is called buggy0.c.
863
00:41:36,120 --> 00:41:37,260
I'm not going to use clang.
864
00:41:37,260 --> 00:41:39,880
In fact, I'm never really going
to use clang manually from here on out.
865
00:41:39,880 --> 00:41:42,430
I'm just going to use make
because it makes our lives easier.
866
00:41:42,430 --> 00:41:43,560
It does compile.
867
00:41:43,560 --> 00:41:45,390
No errors, so it's not syntax.
868
00:41:45,390 --> 00:41:47,670
It's not something silly
like a missing semicolon.
869
00:41:47,670 --> 00:41:53,190
But when I run ./buggy0, I, of
course, see one, two, three, four--
870
00:41:53,190 --> 00:41:57,990
and this, of course, does not match the
one, two, three bricks that I actually
871
00:41:57,990 --> 00:41:59,610
intended for that column.
872
00:41:59,610 --> 00:42:02,970
And yet, I'm starting counting
at 0, as I usually do.
873
00:42:02,970 --> 00:42:03,930
I've got three.
874
00:42:03,930 --> 00:42:05,280
I'm going up to three.
875
00:42:05,280 --> 00:42:06,780
So where is my logical error?
876
00:42:06,780 --> 00:42:10,150
If it hasn't obviously jumped out at
you already, well, how can I solve this?
877
00:42:10,150 --> 00:42:13,080
Well, first and foremost,
perhaps the best technique
878
00:42:13,080 --> 00:42:16,080
for solving bugs, at least
early on, is just use printf.
879
00:42:16,080 --> 00:42:20,020
Like thus far, we've used printf to say,
Hello, and other things on the screen.
880
00:42:20,020 --> 00:42:22,530
But printf is just a function
for printing anything.
881
00:42:22,530 --> 00:42:24,570
And there's no reason
you can't temporarily
882
00:42:24,570 --> 00:42:27,900
use printf to print out
the contents of variables,
883
00:42:27,900 --> 00:42:29,850
what's going on inside
of your program, just
884
00:42:29,850 --> 00:42:31,350
to figure out where your mistake is.
885
00:42:31,350 --> 00:42:32,940
And then you can delete
that line of code later.
886
00:42:32,940 --> 00:42:34,600
It doesn't have to stay there forever.
887
00:42:34,600 --> 00:42:35,740
So let me do this.
888
00:42:35,740 --> 00:42:39,450
Instead of just printing out
in VS Code the hash symbol,
889
00:42:39,450 --> 00:42:45,690
let me do a little safety check
here and print out the value of i.
890
00:42:45,690 --> 00:42:49,170
So let me go ahead and
say something like, i is--
891
00:42:49,170 --> 00:42:51,610
now I want to say i is this.
892
00:42:51,610 --> 00:42:54,540
But, of course, this is not
how I print out the value of i.
893
00:42:54,540 --> 00:42:58,930
If I want to print out the value
of i, what should I put here?
894
00:42:58,930 --> 00:43:02,160
So %i for integer,
instead of %s for string.
895
00:43:02,160 --> 00:43:03,410
So they're still placeholders.
896
00:43:03,410 --> 00:43:04,930
But we use %i for integers.
897
00:43:04,930 --> 00:43:08,450
And now if I want to print out i, I just
need the comma as the second argument,
898
00:43:08,450 --> 00:43:09,250
and then i.
899
00:43:09,250 --> 00:43:13,000
All right, let me go ahead and
go back to my terminal window.
900
00:43:13,000 --> 00:43:15,760
Let me recompile the program
because I've changed it.
901
00:43:15,760 --> 00:43:18,880
That still works fine, ./buggy0.
902
00:43:18,880 --> 00:43:22,540
And now, let me increase the
size of my terminal window here.
903
00:43:22,540 --> 00:43:25,510
You just see some diagnostic
information, if you will.
904
00:43:25,510 --> 00:43:26,560
This is not the goal.
905
00:43:26,560 --> 00:43:29,393
This is not what you should be
submitting for this homework problem,
906
00:43:29,393 --> 00:43:30,070
were it one.
907
00:43:30,070 --> 00:43:33,730
But it is helping us diagnostically
know that, OK, when i is zero,
908
00:43:33,730 --> 00:43:34,450
here's a hash.
909
00:43:34,450 --> 00:43:36,182
When i is 1, here's a hash.
910
00:43:36,182 --> 00:43:37,390
When i is two, here's a hash.
911
00:43:37,390 --> 00:43:39,017
When i is 3, here's a hash.
912
00:43:39,017 --> 00:43:39,850
Well, wait a minute.
913
00:43:39,850 --> 00:43:41,530
That's one, two, three, four.
914
00:43:41,530 --> 00:43:44,360
So clearly, I'm printing
it one too many times.
915
00:43:44,360 --> 00:43:48,130
So let me look back at the code here
by shrinking my terminal window.
916
00:43:48,130 --> 00:43:53,080
And let me just ask the group,
where is, in fact, the mistake?
917
00:43:53,080 --> 00:43:56,080
Or what, equivalently,
would be the solution?
918
00:43:56,080 --> 00:43:57,561
Yeah, in the middle.
919
00:43:57,561 --> 00:44:00,020
AUDIENCE: [INAUDIBLE]
920
00:44:00,020 --> 00:44:03,550
DAVID MALAN: Yeah, instead of less
than or equal to, use just less than.
921
00:44:03,550 --> 00:44:05,300
So you've got to kind
of pick a lane here.
922
00:44:05,300 --> 00:44:08,630
If you're going to start counting
from 0, you generally use less than,
923
00:44:08,630 --> 00:44:10,880
and go up to, but not through the value.
924
00:44:10,880 --> 00:44:13,970
Or if you prefer, like in the
human world, counting from 1 on up,
925
00:44:13,970 --> 00:44:17,300
you can use less than or equal
to, but you have to be consistent.
926
00:44:17,300 --> 00:44:19,790
And in general, as a
programmer, just always start
927
00:44:19,790 --> 00:44:22,610
counting from 0 if you're doing
something canonical like this.
928
00:44:22,610 --> 00:44:25,160
But the solution is,
indeed, just to change this
929
00:44:25,160 --> 00:44:27,860
by changing the less than
or equal to sign to a less than sign.
930
00:44:27,860 --> 00:44:34,340
If I recompile this program with make
buggy0, and then do ./buggy0 again--
931
00:44:34,340 --> 00:44:36,500
and let me increase the
size of my terminal window.
932
00:44:36,500 --> 00:44:39,050
Now, you see, OK,
almost the same output.
933
00:44:39,050 --> 00:44:44,330
But indeed, i starts at 0 and goes
up to, but not through, three.
934
00:44:44,330 --> 00:44:48,920
All right, so printf, in short,
can be your first diagnostic tool.
935
00:44:48,920 --> 00:44:51,500
Instead of just staring at the
screen or raising your hand--
936
00:44:51,500 --> 00:44:55,490
I mean, use printf to see, literally,
what's going on inside of your program
937
00:44:55,490 --> 00:44:57,287
by just printing out things of interest.
938
00:44:57,287 --> 00:44:59,120
And then once you've
solved the problem, you
939
00:44:59,120 --> 00:45:02,840
can go back into your code, as I'll do
here, by shrinking my terminal window.
940
00:45:02,840 --> 00:45:04,610
I'll delete the printf line.
941
00:45:04,610 --> 00:45:07,100
And now I'm ready to share
this program with the world
942
00:45:07,100 --> 00:45:08,870
or submit it as homework or the like.
943
00:45:08,870 --> 00:45:11,390
It was only ever meant to be temporary.
944
00:45:11,390 --> 00:45:15,440
Any questions on printf
as a debugging tool?
945
00:45:15,440 --> 00:45:18,010
946
00:45:18,010 --> 00:45:18,510
No?
947
00:45:18,510 --> 00:45:20,970
All right, well, that
only gets us so far.
948
00:45:20,970 --> 00:45:23,430
And honestly, as your programs
grow and grow and grow,
949
00:45:23,430 --> 00:45:25,180
it's going to actually
get really annoying
950
00:45:25,180 --> 00:45:28,860
to start going in and adding printf's,
then removing them, and figuring out,
951
00:45:28,860 --> 00:45:31,860
if you've got multiple printf's,
well, which one printed what?
952
00:45:31,860 --> 00:45:34,560
It just gets messy, eventually,
to rely on printf alone.
953
00:45:34,560 --> 00:45:37,740
So computer scientists,
being computer scientists,
954
00:45:37,740 --> 00:45:41,040
have written software to
make it easier to debug code.
955
00:45:41,040 --> 00:45:44,040
That software is what we would
generally call a debugger, which
956
00:45:44,040 --> 00:45:47,040
would be the second tool of the trade
that you can use to actually solve
957
00:45:47,040 --> 00:45:48,610
problems in your code.
958
00:45:48,610 --> 00:45:52,690
Now, in the world of VS Code,
there's actually a debugger built in.
959
00:45:52,690 --> 00:45:54,840
So the graphical user
interface you're about to see
960
00:45:54,840 --> 00:45:58,260
in VS Code isn't specific to CS50,
it actually comes with VS Code.
961
00:45:58,260 --> 00:46:01,230
And it supports C, and
C++, and Java, and Python,
962
00:46:01,230 --> 00:46:03,030
and lots of other languages too.
963
00:46:03,030 --> 00:46:05,640
But it's, admittedly,
a little complicated
964
00:46:05,640 --> 00:46:07,650
to just start using the debugger.
965
00:46:07,650 --> 00:46:10,200
You have to create a
configuration file and do
966
00:46:10,200 --> 00:46:13,480
some annoying steps that just get
in the way of solving real problems.
967
00:46:13,480 --> 00:46:17,070
So we have automated the process for
you of just starting the debugger.
968
00:46:17,070 --> 00:46:19,680
And thereafter, it's sort of
industry standard how you use it.
969
00:46:19,680 --> 00:46:23,380
But we save you the headache of having
to create those configuration files.
970
00:46:23,380 --> 00:46:25,330
So, suppose I want to do this.
971
00:46:25,330 --> 00:46:27,600
Suppose I want to try
to debug this program
972
00:46:27,600 --> 00:46:30,330
step by step using special software.
973
00:46:30,330 --> 00:46:31,810
Well, how can I do that?
974
00:46:31,810 --> 00:46:36,240
Well, let me propose that if I revert
this back to the original version
975
00:46:36,240 --> 00:46:40,530
where i was less than or equal
to 3, I'm pretty sure that I
976
00:46:40,530 --> 00:46:41,790
was printing too many hashes.
977
00:46:41,790 --> 00:46:43,350
So I'm going to do this--
and you might have done this
978
00:46:43,350 --> 00:46:45,160
accidentally or never at all.
979
00:46:45,160 --> 00:46:49,500
But notice if you hover over the gutter,
so to speak, in VS Code, the part of it
980
00:46:49,500 --> 00:46:52,590
all the way to the left of the
editor, you see this sort of grayed
981
00:46:52,590 --> 00:46:54,390
out red dot.
982
00:46:54,390 --> 00:46:57,240
If you click there, it
becomes a brighter red dot.
983
00:46:57,240 --> 00:46:59,670
And this represents what we're
going to call a breakpoint.
984
00:46:59,670 --> 00:47:03,090
And this is just a visual indicator that
you've put like a stop sign equivalent
985
00:47:03,090 --> 00:47:06,270
there, and you're telling the
debugger in a moment, stop
986
00:47:06,270 --> 00:47:07,350
running my code there.
987
00:47:07,350 --> 00:47:07,920
Why?
988
00:47:07,920 --> 00:47:11,610
Because I prefer to step through
my code at sort of a human speed,
989
00:47:11,610 --> 00:47:14,380
and not at computer speed
where it runs all at once.
990
00:47:14,380 --> 00:47:16,750
So I've set my breakpoint,
which is step one.
991
00:47:16,750 --> 00:47:18,580
And then step two is quite simply this.
992
00:47:18,580 --> 00:47:23,190
Instead of running the program itself,
run the command called debug50,
993
00:47:23,190 --> 00:47:26,010
and then ./buggy0.
994
00:47:26,010 --> 00:47:29,220
And now this will start
your program, but inside
995
00:47:29,220 --> 00:47:31,200
of the debugger, which
is a special program
996
00:47:31,200 --> 00:47:33,060
that smart people
wrote that will empower
997
00:47:33,060 --> 00:47:38,190
you to now step through your code line
by line, and again, at your own comfort
998
00:47:38,190 --> 00:47:38,970
pace.
999
00:47:38,970 --> 00:47:43,080
I'm going to hit Enter, some stuff's
going to happen on the screen-- whoops.
1000
00:47:43,080 --> 00:47:45,767
Notice, this is a common mistake
that I made accidentally here.
1001
00:47:45,767 --> 00:47:47,100
Looks like I've changed my code.
1002
00:47:47,100 --> 00:47:49,892
I did because I went in and changed
the less than or equal to sign.
1003
00:47:49,892 --> 00:47:52,860
So let me go ahead and
rerun make buggy0--
1004
00:47:52,860 --> 00:47:53,520
Enter.
1005
00:47:53,520 --> 00:47:55,590
Good, now let me rerun debug50--
1006
00:47:55,590 --> 00:47:57,810
Enter.
1007
00:47:57,810 --> 00:47:59,760
And now some stuff just
happened on the screen
1008
00:47:59,760 --> 00:48:03,270
and it takes a moment to get
started, but once it's started, you'll
1009
00:48:03,270 --> 00:48:06,010
see this: you'll still see your code.
1010
00:48:06,010 --> 00:48:09,410
But you'll see this yellow highlight,
which you've probably not seen before.
1011
00:48:09,410 --> 00:48:11,910
And notice that it's specifically
highlighting the same line
1012
00:48:11,910 --> 00:48:13,440
that I set a breakpoint on.
1013
00:48:13,440 --> 00:48:13,950
Why?
1014
00:48:13,950 --> 00:48:18,870
That just means the debugger
has executed all of these lines,
1015
00:48:18,870 --> 00:48:20,670
except for line 7.
1016
00:48:20,670 --> 00:48:23,340
It has broken at-- not in a bad way.
1017
00:48:23,340 --> 00:48:27,580
But it has paused execution on line 7,
so it hasn't yet printed any hashes.
1018
00:48:27,580 --> 00:48:30,450
And you can see that-- no hashes
in the terminal window yet.
1019
00:48:30,450 --> 00:48:31,980
It's paused execution.
1020
00:48:31,980 --> 00:48:35,190
But what's interesting with
the debugger is the stuff
1021
00:48:35,190 --> 00:48:37,410
over here on the left-hand side.
1022
00:48:37,410 --> 00:48:39,960
In the debugger here,
you'll see, under variables,
1023
00:48:39,960 --> 00:48:41,910
all of your so-called local variables.
1024
00:48:41,910 --> 00:48:44,160
And we haven't really made
a distinction between local
1025
00:48:44,160 --> 00:48:45,327
and something called global.
1026
00:48:45,327 --> 00:48:48,000
But for now, local variables
just means all of the variables
1027
00:48:48,000 --> 00:48:49,390
that exist in your function.
1028
00:48:49,390 --> 00:48:52,110
So i currently has a value of 0.
1029
00:48:52,110 --> 00:48:53,410
OK, and that makes sense.
1030
00:48:53,410 --> 00:48:57,360
So now, how do I step through
my code and see what it's doing?
1031
00:48:57,360 --> 00:48:59,610
Well, at the top of
the screen here, you'll
1032
00:48:59,610 --> 00:49:02,250
see some playback icons,
kind of like a video player,
1033
00:49:02,250 --> 00:49:03,630
but they have special meaning.
1034
00:49:03,630 --> 00:49:07,892
This first one will just play the rest
of your program all the way to the end.
1035
00:49:07,892 --> 00:49:10,350
So you only click that if you've
sort of solved the problem
1036
00:49:10,350 --> 00:49:13,110
and you just want to run it
to completion like before.
1037
00:49:13,110 --> 00:49:14,370
But the next three--
1038
00:49:14,370 --> 00:49:16,920
or next two, really,
are really the juiciest.
1039
00:49:16,920 --> 00:49:19,710
The second one here, if you
hover over it, eventually,
1040
00:49:19,710 --> 00:49:21,930
you'll see that it's called Step Over.
1041
00:49:21,930 --> 00:49:25,170
Step Over means that
the debugger will run
1042
00:49:25,170 --> 00:49:28,630
this currently highlighted line of code,
but it's not going to dive into it.
1043
00:49:28,630 --> 00:49:30,660
So if it's a function
like printf, it's not
1044
00:49:30,660 --> 00:49:32,827
going to start stepping
through printf line by line.
1045
00:49:32,827 --> 00:49:33,327
Why?
1046
00:49:33,327 --> 00:49:36,420
Because I can pretty much assume
printf, written decades ago, is correct.
1047
00:49:36,420 --> 00:49:38,050
Problem's probably with me.
1048
00:49:38,050 --> 00:49:42,690
But this next line, if I did really
want to step into the printf code
1049
00:49:42,690 --> 00:49:46,110
to figure out how it works or find some
problem in it all these years later,
1050
00:49:46,110 --> 00:49:48,810
you can step into printf, and
then the screen would change,
1051
00:49:48,810 --> 00:49:50,910
and you'd see each of
the lines for printf,
1052
00:49:50,910 --> 00:49:54,250
line by line-- at least if you have
the source code for printf installed.
1053
00:49:54,250 --> 00:49:56,490
All right, I'm going to use
the first one, Step Over.
1054
00:49:56,490 --> 00:49:59,130
And watch as the yellow highlight moves.
1055
00:49:59,130 --> 00:50:03,060
And watch as, in the terminal
window, there's a hash symbol.
1056
00:50:03,060 --> 00:50:03,780
Here we go.
1057
00:50:03,780 --> 00:50:05,130
There's one hash.
1058
00:50:05,130 --> 00:50:07,230
Now, notice line 5 is highlighted.
1059
00:50:07,230 --> 00:50:09,480
That means it has paused on line 5.
1060
00:50:09,480 --> 00:50:11,350
Line 5 has not yet been executed.
1061
00:50:11,350 --> 00:50:12,600
So what does that mean?
1062
00:50:12,600 --> 00:50:16,320
The value of i, per the top
left-hand corner, is still 0.
1063
00:50:16,320 --> 00:50:18,920
But as soon as I click
Step Over again, watch
1064
00:50:18,920 --> 00:50:24,470
what happens at the top left, where
i is a variable on the screen.
1065
00:50:24,470 --> 00:50:26,420
Now i-- and it flashed briefly--
1066
00:50:26,420 --> 00:50:27,920
has a value of 1.
1067
00:50:27,920 --> 00:50:30,650
And now if I step over again,
watch the terminal window.
1068
00:50:30,650 --> 00:50:32,120
There's my second hash.
1069
00:50:32,120 --> 00:50:36,380
Now, let me click Step Over on the for
loop, and watch the variable at top left.
1070
00:50:36,380 --> 00:50:38,567
Now i goes from 1 to 2.
1071
00:50:38,567 --> 00:50:39,650
Now let me click it again.
1072
00:50:39,650 --> 00:50:43,220
Third hash-- and here's where the
logical error is perhaps revealed.
1073
00:50:43,220 --> 00:50:45,210
Let me go ahead and step over the loop.
1074
00:50:45,210 --> 00:50:46,520
Now i is 3.
1075
00:50:46,520 --> 00:50:49,280
Wait a minute, I'm still
going to print out a hash.
1076
00:50:49,280 --> 00:50:49,810
There it is.
1077
00:50:49,810 --> 00:50:50,810
There's the fourth hash.
1078
00:50:50,810 --> 00:50:53,852
And at this point, hopefully, the
light bulb, proverbially, has gone off.
1079
00:50:53,852 --> 00:50:55,020
I realize, oh, I screwed up.
1080
00:50:55,020 --> 00:50:58,580
I can either stop the program
altogether with the red square,
1081
00:50:58,580 --> 00:51:01,100
or I can just let it run all
the way to the end, which
1082
00:51:01,100 --> 00:51:02,493
just terminates everything.
1083
00:51:02,493 --> 00:51:05,660
At this point, I just want to get back
into my code and start fixing things.
1084
00:51:05,660 --> 00:51:07,700
And you can close, for
instance, as I will here,
1085
00:51:07,700 --> 00:51:10,670
the File Explorer, just to
hide the panel that opened.
1086
00:51:10,670 --> 00:51:12,320
So that's debug50.
1087
00:51:12,320 --> 00:51:15,920
But the debugger isn't a CS50 thing; debug50
just starts it for you, and a debugger
1088
00:51:15,920 --> 00:51:19,520
is something you'd find in most any
programming environment nowadays.
1089
00:51:19,520 --> 00:51:23,670
Questions on debugging?
1090
00:51:23,670 --> 00:51:24,170
Questions?
1091
00:51:24,170 --> 00:51:24,670
Yeah?
1092
00:51:24,670 --> 00:51:27,295
AUDIENCE: Where does it tell
you where it went wrong?
1093
00:51:27,295 --> 00:51:28,420
DAVID MALAN: Good question.
1094
00:51:28,420 --> 00:51:30,310
Where does it tell you
where it went wrong?
1095
00:51:30,310 --> 00:51:33,190
So, sadly, it does not
tell you any of that.
1096
00:51:33,190 --> 00:51:37,570
The onus is still on you, the human,
to use this tool productively to walk
1097
00:51:37,570 --> 00:51:39,580
through your code at a saner pace.
1098
00:51:39,580 --> 00:51:42,070
But your brain is the one
that still needs to solve it.
1099
00:51:42,070 --> 00:51:45,190
And I don't doubt, down the line,
with artificial intelligence and more,
1100
00:51:45,190 --> 00:51:47,350
programs like this will
get all the more helpful,
1101
00:51:47,350 --> 00:51:49,160
and start answering
questions like that for us.
1102
00:51:49,160 --> 00:51:51,340
And there are other tools we'll
introduce you to this semester
1103
00:51:51,340 --> 00:51:52,990
that are even more powerful than this.
1104
00:51:52,990 --> 00:51:56,770
But for now, it's just a tool,
really, to slow things down and not
1105
00:51:56,770 --> 00:51:57,820
have to change your code.
1106
00:51:57,820 --> 00:52:01,420
The fact that I had that panel on the
left that just showed me i's changing
1107
00:52:01,420 --> 00:52:04,150
value is just an alternative
to printf, and I can
1108
00:52:04,150 --> 00:52:06,820
step through it a little more slowly.
1109
00:52:06,820 --> 00:52:10,580
Other questions on debugging?
1110
00:52:10,580 --> 00:52:11,080
No?
1111
00:52:11,080 --> 00:52:14,950
Let me show you one final
example with this debugger here.
1112
00:52:14,950 --> 00:52:16,750
And this one, too, I wrote in advance.
1113
00:52:16,750 --> 00:52:18,730
Let me close buggy0.c.
1114
00:52:18,730 --> 00:52:22,327
And let me open up buggy1.c,
my second version thereof.
1115
00:52:22,327 --> 00:52:24,160
Let me close my terminal
window for a second
1116
00:52:24,160 --> 00:52:26,350
and give you a quick tour
of this program, which
1117
00:52:26,350 --> 00:52:28,030
similarly, has a mistake.
1118
00:52:28,030 --> 00:52:32,830
Now, at the top of this program, some
familiar includes, cs50.h and stdio.h.
1119
00:52:32,830 --> 00:52:34,730
This is not something we've seen before.
1120
00:52:34,730 --> 00:52:36,190
It's specific to this example--
1121
00:52:36,190 --> 00:52:38,830
a function called getNegativeInt.
1122
00:52:38,830 --> 00:52:41,043
Takes no arguments, and
it returns an integer.
1123
00:52:41,043 --> 00:52:41,710
What does it do?
1124
00:52:41,710 --> 00:52:45,040
It literally gets a negative
integer, ideally, from the user.
1125
00:52:45,040 --> 00:52:47,200
Fun fact, though, it doesn't do so correctly.
1126
00:52:47,200 --> 00:52:50,090
That's the bug. getNegativeInt
is broken at the moment.
1127
00:52:50,090 --> 00:52:51,470
So what does main do?
1128
00:52:51,470 --> 00:52:54,130
Well, main just calls this
function, passing in nothing
1129
00:52:54,130 --> 00:52:55,690
in parentheses, no inputs.
1130
00:52:55,690 --> 00:52:58,240
And it stores the return value in i.
1131
00:52:58,240 --> 00:53:00,260
And then it just prints
out i on the screen.
1132
00:53:00,260 --> 00:53:03,910
So honestly, just by eyeballing
this, I feel comfortable enough
1133
00:53:03,910 --> 00:53:06,365
with programming in C that
I think main is correct.
1134
00:53:06,365 --> 00:53:07,990
Let me just stipulate, main is correct.
1135
00:53:07,990 --> 00:53:09,698
But there is going to
be a bug down here.
1136
00:53:09,698 --> 00:53:11,210
Now, what's the bug down here?
1137
00:53:11,210 --> 00:53:14,830
Well, let me look at
getNegativeInt's implementation.
1138
00:53:14,830 --> 00:53:18,970
Notice, this first line, 12, is
identical to the prototype up here.
1139
00:53:18,970 --> 00:53:22,690
The prototype is sort of
stupidly required up here
1140
00:53:22,690 --> 00:53:25,300
because C reads things top
to bottom, left to right--
1141
00:53:25,300 --> 00:53:26,690
the compiler technically does.
1142
00:53:26,690 --> 00:53:29,680
So if you reference
getNegativeInt here, but you
1143
00:53:29,680 --> 00:53:33,490
don't implement it until down here,
and you haven't told C in advance
1144
00:53:33,490 --> 00:53:36,820
that it will exist, again, you
get the error we saw last week.
1145
00:53:36,820 --> 00:53:39,010
All right, so how does
getNegativeInt work?
1146
00:53:39,010 --> 00:53:40,960
We declare a variable called n.
1147
00:53:40,960 --> 00:53:43,540
We've got a do while
loop that does what?
1148
00:53:43,540 --> 00:53:47,110
It uses getInt, which comes with
the cs50 library, per last week.
1149
00:53:47,110 --> 00:53:49,480
It prompts the user for a
negative integer, quote unquote,
1150
00:53:49,480 --> 00:53:51,670
and stores the value in n.
1151
00:53:51,670 --> 00:53:56,800
I then do all of this while
n is less than 0, right?
1152
00:53:56,800 --> 00:54:00,400
Remember, we used a do while loop last
week to make sure the human cooperates
1153
00:54:00,400 --> 00:54:03,970
and doesn't give us the wrong type
of value, be it positive or negative
1154
00:54:03,970 --> 00:54:04,970
or something else.
1155
00:54:04,970 --> 00:54:06,400
And then we return n.
1156
00:54:06,400 --> 00:54:07,570
And there's some subtleties.
1157
00:54:07,570 --> 00:54:12,970
Anyone recall-- or have an intuition
for why I've declared n on line 14,
1158
00:54:12,970 --> 00:54:15,790
instead of line 17?
1159
00:54:15,790 --> 00:54:17,620
This is a C-specific thing.
1160
00:54:17,620 --> 00:54:23,465
AUDIENCE: [INAUDIBLE]
1161
00:54:23,465 --> 00:54:24,340
DAVID MALAN: Exactly.
1162
00:54:24,340 --> 00:54:27,610
There's this notion of scope in C. And
we'll continue to see this over time,
1163
00:54:27,610 --> 00:54:32,590
whereby, a variable only exists
inside of the most recent curly braces
1164
00:54:32,590 --> 00:54:33,560
that you've opened.
1165
00:54:33,560 --> 00:54:36,910
So if I've declared n here
on line 14, I can use it
1166
00:54:36,910 --> 00:54:40,900
anywhere between lines 13 and 21 because
those are the nearest curly braces.
1167
00:54:40,900 --> 00:54:43,540
If by contrast, as you note,
if I instead said this,
1168
00:54:43,540 --> 00:54:49,180
int n equals getInt and so forth,
and didn't have the current line 14,
1169
00:54:49,180 --> 00:54:53,470
well, n would exist inside of these
curly braces, but not here, which
1170
00:54:53,470 --> 00:54:55,340
is too late, and definitely not here.
1171
00:54:55,340 --> 00:54:59,480
So you just have to declare it first,
and then use and reuse it as such.
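A small sketch of that scope rule, using a hypothetical `double_until` function in place of the lecture's prompting loop: the variable tested in the `while` condition has to be declared before the `do`, because anything declared inside the braces stops existing at the closing brace.

```c
// Doubles value until it reaches at least limit.
// Assumes start > 0 (a start of 0 would never grow) and a limit
// small enough that value never overflows an int.
int double_until(int start, int limit)
{
    int value = start;      // declared outside the loop: visible to while and return
    do
    {
        value *= 2;         // a variable declared in here would end at the closing brace
    }
    while (value < limit);  // so this condition could not see it
    return value;
}
```

The second test case below shows the other half of a do while loop: the body always runs once before the condition is checked, so a start of 5 still doubles to 10 even though it already exceeds a limit of 3.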
1172
00:54:59,480 --> 00:55:01,545
Now, let me just show
you how I can debug this.
1173
00:55:01,545 --> 00:55:03,170
But let me show you the symptoms first.
1174
00:55:03,170 --> 00:55:04,930
Let me open my terminal window.
1175
00:55:04,930 --> 00:55:06,970
Let me run make buggy1.
1176
00:55:06,970 --> 00:55:11,710
Compiles OK, so it's not something
silly like a semicolon. ./buggy1,
1177
00:55:11,710 --> 00:55:13,660
and I'm asked for a negative integer.
1178
00:55:13,660 --> 00:55:15,280
All right, let me give it negative 1--
1179
00:55:15,280 --> 00:55:16,710
Enter.
1180
00:55:16,710 --> 00:55:19,920
Well, the main function is
supposed to print out what I typed,
1181
00:55:19,920 --> 00:55:20,880
but it clearly didn't.
1182
00:55:20,880 --> 00:55:21,880
It's prompting me again.
1183
00:55:21,880 --> 00:55:23,830
All right, so maybe
it'll like negative 2.
1184
00:55:23,830 --> 00:55:24,330
No?
1185
00:55:24,330 --> 00:55:26,380
Maybe negative 3.
1186
00:55:26,380 --> 00:55:27,570
50?
1187
00:55:27,570 --> 00:55:29,160
OK, so it's definitely broken, right?
1188
00:55:29,160 --> 00:55:31,528
It kind of seems logically
to be doing the opposite.
1189
00:55:31,528 --> 00:55:33,820
Now, you can perhaps see why
this is happening already.
1190
00:55:33,820 --> 00:55:37,170
These are deliberately simple
programs for demonstration's sake.
1191
00:55:37,170 --> 00:55:38,470
But let's do this.
1192
00:55:38,470 --> 00:55:41,037
Let me go ahead and set
a breakpoint in main,
1193
00:55:41,037 --> 00:55:42,870
even though I'm pretty
sure main is correct.
1194
00:55:42,870 --> 00:55:45,810
But it just helps me start my
thought process-- start with main,
1195
00:55:45,810 --> 00:55:47,010
and then take it from there.
1196
00:55:47,010 --> 00:55:51,840
Let me run now, debug50 ./buggy1--
1197
00:55:51,840 --> 00:55:52,920
Enter.
1198
00:55:52,920 --> 00:55:53,700
And let's see.
1199
00:55:53,700 --> 00:55:56,880
With that breakpoint now, the GUI
is going to reconfigure itself.
1200
00:55:56,880 --> 00:56:00,360
It's going to pause on line 8 because
that's the first interesting line
1201
00:56:00,360 --> 00:56:01,260
inside of main.
1202
00:56:01,260 --> 00:56:03,780
So I could have just put the
breakpoint on line 8 too.
1203
00:56:03,780 --> 00:56:06,480
It's smart enough to know
that if I set it on 6,
1204
00:56:06,480 --> 00:56:09,570
you really mean line 8 because
that's the first actual line of code.
1205
00:56:09,570 --> 00:56:11,280
And watch, now, what happens.
1206
00:56:11,280 --> 00:56:15,780
If I step over this line, notice
that i, which at the moment
1207
00:56:15,780 --> 00:56:18,090
seems to have a default value of 0--
1208
00:56:18,090 --> 00:56:19,470
more on that another time.
1209
00:56:19,470 --> 00:56:24,750
But if I click Step Over like before,
I'm prompted for a negative integer.
1210
00:56:24,750 --> 00:56:25,750
Let me type negative 1--
1211
00:56:25,750 --> 00:56:27,300
Enter.
1212
00:56:27,300 --> 00:56:32,470
And now, notice, there's no
additional yellow highlight.
1213
00:56:32,470 --> 00:56:32,970
Why?
1214
00:56:32,970 --> 00:56:35,160
Where am I currently stuck, logically?
1215
00:56:35,160 --> 00:56:37,937
AUDIENCE: [INAUDIBLE]
1216
00:56:37,937 --> 00:56:40,770
DAVID MALAN: Yeah, just logically,
I must be in that do, while loop.
1217
00:56:40,770 --> 00:56:43,560
And even if you don't understand it,
like that's the only explanation.
1218
00:56:43,560 --> 00:56:46,143
If you keep getting prompted,
surely, there's a loop going on.
1219
00:56:46,143 --> 00:56:49,270
There's only one loop in my code,
so there's probably a problem there.
1220
00:56:49,270 --> 00:56:52,900
So I can't just set a breakpoint in
main, and then wait for this to work.
1221
00:56:52,900 --> 00:56:53,610
So let me just--
1222
00:56:53,610 --> 00:56:56,280
let me stop this with the red square.
1223
00:56:56,280 --> 00:56:58,860
And let me think, all
right, instead of--
1224
00:56:58,860 --> 00:57:02,770
I can still set my breakpoint in main,
but let me rerun the debugger instead.
1225
00:57:02,770 --> 00:57:05,470
And this time, not step
over that line of code,
1226
00:57:05,470 --> 00:57:07,930
let me step into that line of code.
1227
00:57:07,930 --> 00:57:09,270
So watch what happens now.
1228
00:57:09,270 --> 00:57:11,430
Instead of clicking
the second icon here,
1229
00:57:11,430 --> 00:57:14,610
let me click the third, whose
name is, indeed, Step Into.
1230
00:57:14,610 --> 00:57:17,880
And watch as the yellow highlight
does not move to line 9.
1231
00:57:17,880 --> 00:57:21,930
It dives into line 8--
the function on line 8,
1232
00:57:21,930 --> 00:57:25,170
thereby, bringing me down to line 17.
1233
00:57:25,170 --> 00:57:28,270
It's kind of going down
into that next function.
1234
00:57:28,270 --> 00:57:31,422
Now, it didn't bother pausing
on line 12 or 13 or 14
1235
00:57:31,422 --> 00:57:34,380
because there's nothing intellectually
interesting there happening yet.
1236
00:57:34,380 --> 00:57:37,080
The juicy part really starts,
it would seem, in line 17.
1237
00:57:37,080 --> 00:57:40,980
So, now notice, n is my
variable at the top left.
1238
00:57:40,980 --> 00:57:42,270
If I click--
1239
00:57:42,270 --> 00:57:45,420
I don't want to click
Step Into now, though.
1240
00:57:45,420 --> 00:57:48,090
What would go wrong if
I click on Step Into--
1241
00:57:48,090 --> 00:57:52,480
or what would it do that I
don't think I want to do?
1242
00:57:52,480 --> 00:57:52,990
Yeah?
1243
00:57:52,990 --> 00:57:54,755
AUDIENCE: [INAUDIBLE]
1244
00:57:54,755 --> 00:57:56,630
DAVID MALAN: Yeah, it
would step into getInt.
1245
00:57:56,630 --> 00:57:59,620
But I'd like to think that the
staff's version of getInt is correct,
1246
00:57:59,620 --> 00:58:02,120
and that's not our problem
today, so I want to step over it.
1247
00:58:02,120 --> 00:58:06,710
And watch now at top left that
nothing happens yet to the value of n
1248
00:58:06,710 --> 00:58:09,530
until I go to the terminal window
now, and I type in something
1249
00:58:09,530 --> 00:58:10,670
like negative 1.
1250
00:58:10,670 --> 00:58:14,600
Now notice, it jumps to line 19,
which is the next interesting line.
1251
00:58:14,600 --> 00:58:17,240
Top left, n, indeed, is negative 1.
1252
00:58:17,240 --> 00:58:19,160
And here's where I can
now pause as a human
1253
00:58:19,160 --> 00:58:22,760
and think, all right, so
while n is less than 0.
1254
00:58:22,760 --> 00:58:25,280
All right, n, per the top
left corner, is negative 1.
1255
00:58:25,280 --> 00:58:27,830
So all right, while
negative 1 is less than 0,
1256
00:58:27,830 --> 00:58:29,780
well, obviously that's
true mathematically.
1257
00:58:29,780 --> 00:58:30,930
So what's going to happen?
1258
00:58:30,930 --> 00:58:32,130
It's a do while loop.
1259
00:58:32,130 --> 00:58:37,285
So when I click on Step Over again,
it's going to go to this line
1260
00:58:37,285 --> 00:58:39,410
because it's at the end of
the inside of that loop.
1261
00:58:39,410 --> 00:58:42,710
And now here, it's looping
through again and again.
1262
00:58:42,710 --> 00:58:44,240
All right, let me do this once more.
1263
00:58:44,240 --> 00:58:45,980
I'm going to step over, all right?
1264
00:58:45,980 --> 00:58:48,777
I'm going to type in negative 2,
and it's the exact same thing.
1265
00:58:48,777 --> 00:58:50,360
Now is my chance, on the yellow line--
1266
00:58:50,360 --> 00:58:51,260
OK, wait a minute.
1267
00:58:51,260 --> 00:58:53,450
Negative 2 is obviously less than 0.
1268
00:58:53,450 --> 00:58:56,080
Let me try this one more time.
1269
00:58:56,080 --> 00:58:57,570
Click it once here.
1270
00:58:57,570 --> 00:58:59,040
All right, let me give it 50.
1271
00:58:59,040 --> 00:59:05,020
And now, OK, while 50 is
less than 0, that's not true,
1272
00:59:05,020 --> 00:59:08,970
so the loop is over because it's not
going to do it while 50 is less than 0.
1273
00:59:08,970 --> 00:59:09,730
That's not true.
1274
00:59:09,730 --> 00:59:12,240
So now watch, when I
click Step Over once more,
1275
00:59:12,240 --> 00:59:15,810
it then finishes the loop, even
though there's nothing more to do.
1276
00:59:15,810 --> 00:59:17,610
It's now about to return n.
1277
00:59:17,610 --> 00:59:21,360
It jumps back up to main,
where I left off on line 9.
1278
00:59:21,360 --> 00:59:23,778
It now prints, in my terminal
window, the number 50.
1279
00:59:23,778 --> 00:59:26,070
And hopefully, at this point,
to your question earlier,
1280
00:59:26,070 --> 00:59:30,700
my human brain has realized, oh, I'm
an idiot, like I flipped my sign there.
1281
00:59:30,700 --> 00:59:32,460
So I probably-- let me stop this.
1282
00:59:32,460 --> 00:59:34,780
I probably want to do
something like this.
1283
00:59:34,780 --> 00:59:38,860
If the goal is to get a negative
integer, I probably want to say,
1284
00:59:38,860 --> 00:59:45,070
while n is, for instance, greater
than or equal to 0 would work.
1285
00:59:45,070 --> 00:59:48,630
So while n is greater than or
equal to 0, keep doing this.
1286
00:59:48,630 --> 00:59:50,430
And that's the logic
I wanted to express.
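To check the corrected condition without typing at a prompt, here is a hypothetical variant of getNegativeInt that reads from an array of canned inputs instead of calling getInt. That substitution is an assumption made only so the sketch runs without the CS50 library.

```c
// Hypothetical stand-in for getNegativeInt: each array element plays the role
// of one value the user typed. Assumes the array contains a negative number.
int get_negative_from(const int inputs[])
{
    int i = 0;
    int n;
    do
    {
        n = inputs[i];  // "prompt" for the next value
        i++;
    }
    while (n >= 0);     // corrected: keep asking while the value is 0 or positive
    return n;
}
```

With the original `while (n < 0)`, the inputs 50, 0, -1 would stop immediately at 50 and return it, which matches the symptom seen in the terminal.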
1287
00:59:50,430 --> 00:59:53,733
So the debugger just saves me from
staring at the screen, raising a hand,
1288
00:59:53,733 --> 00:59:54,900
sort of asking someone else.
1289
00:59:54,900 --> 00:59:58,650
At least in this case, it allows me
to go through it at a healthier pace.
1290
00:59:58,650 --> 01:00:03,000
Questions now on debug50, which should
be your new friend, even if it's not
1291
01:00:03,000 --> 01:00:04,940
your first instinct after printf?
1292
01:00:04,940 --> 01:00:07,690
1293
01:00:07,690 --> 01:00:09,190
Any questions on debug50?
1294
01:00:09,190 --> 01:00:09,730
No?
1295
01:00:09,730 --> 01:00:13,960
All right, well, there's one last
technique we can equip you with here.
1296
01:00:13,960 --> 01:00:17,470
And that is, in addition to
printf and a debugger, no joke,
1297
01:00:17,470 --> 01:00:21,400
a rubber duck is actually a
reasonably recommended solution
1298
01:00:21,400 --> 01:00:22,720
to finding bugs in your code.
1299
01:00:22,720 --> 01:00:24,640
To your question earlier,
the duck, too, is not
1300
01:00:24,640 --> 01:00:26,390
going to solve the problem for you.
1301
01:00:26,390 --> 01:00:29,710
But if you've wondered why this
little guy has been here for so long,
1302
01:00:29,710 --> 01:00:32,080
there's this technique, which has
its own Wikipedia article
1303
01:00:32,080 --> 01:00:33,760
called rubber duck debugging.
1304
01:00:33,760 --> 01:00:37,390
The idea of which is that if
you're home in your dorm room,
1305
01:00:37,390 --> 01:00:39,520
wrestling with some bug
in your code, printf
1306
01:00:39,520 --> 01:00:42,820
didn't quite reveal the source to
you, debugger isn't really helping,
1307
01:00:42,820 --> 01:00:46,960
honestly, maybe it would help to just
sound out what problem you're having.
1308
01:00:46,960 --> 01:00:50,260
Similar to going to office hours,
talking to a TA or a professor,
1309
01:00:50,260 --> 01:00:52,030
just walking through
your problems because
1310
01:00:52,030 --> 01:00:54,730
in sort of talking to
the duck about the fact
1311
01:00:54,730 --> 01:01:00,550
that you're doing this while n is
less than 0, and then if it is--
1312
01:01:00,550 --> 01:01:01,180
wait a minute.
1313
01:01:01,180 --> 01:01:03,820
I'm an idiot, not just for
talking to the rubber duck.
1314
01:01:03,820 --> 01:01:05,980
You realize, hopefully,
in expressing yourself,
1315
01:01:05,980 --> 01:01:09,910
literally verbally, you probably
will hear with non-zero probability,
1316
01:01:09,910 --> 01:01:11,860
like some illogic in your statement.
1317
01:01:11,860 --> 01:01:16,430
And just by sounding things out, you'll
realize like, oh, that's my problem.
1318
01:01:16,430 --> 01:01:19,720
And so, frankly, if you have roommates,
you can also use a roommate for this.
1319
01:01:19,720 --> 01:01:21,700
But the rubber duck is
just sort of a go-to
1320
01:01:21,700 --> 01:01:24,700
when your roommates have no
interest in your C problem set,
1321
01:01:24,700 --> 01:01:28,150
for talking something through as such.
1322
01:01:28,150 --> 01:01:29,933
And this is an invaluable technique.
1323
01:01:29,933 --> 01:01:32,350
I admittedly tend not to do
it so much with a rubber duck,
1324
01:01:32,350 --> 01:01:34,510
but ideally with colleagues,
human colleagues.
1325
01:01:34,510 --> 01:01:38,260
But just talking through things
often will help you just realize,
1326
01:01:38,260 --> 01:01:40,360
oh, I said something illogical.
1327
01:01:40,360 --> 01:01:41,860
Now I can go back to the code.
1328
01:01:41,860 --> 01:01:44,650
So don't solve problems
by staring at your screen
1329
01:01:44,650 --> 01:01:46,240
endlessly for minutes, for hours.
1330
01:01:46,240 --> 01:01:48,100
At that point, it's
time for a break, time
1331
01:01:48,100 --> 01:01:50,475
to walk away, time to talk to
the duck, if you've already
1332
01:01:50,475 --> 01:01:52,900
exhausted some of those other tools.
1333
01:01:52,900 --> 01:01:55,330
As an aside, on your way out
today at the end of class,
1334
01:01:55,330 --> 01:01:59,020
we have, clearly, plenty
of rubber ducks for you.
1335
01:01:59,020 --> 01:02:01,600
And it's become a thing
over the years, at least
1336
01:02:01,600 --> 01:02:05,770
among some, to bring the duck with them
when they travel and send us photos.
1337
01:02:05,770 --> 01:02:10,480
Here, for instance, is CS50's
rubber duck debugger, A.K.A. DDB,
1338
01:02:10,480 --> 01:02:15,940
for Duck Debugger, which is a pun on
a geekier program called GDB, the GNU
1339
01:02:15,940 --> 01:02:18,740
Debugger, which is an actual
piece of software for debugging.
1340
01:02:18,740 --> 01:02:25,270
This is CS50's debugger in the hills
of Puerto Rico, also, here on the sea.
1341
01:02:25,270 --> 01:02:28,310
It made its way to San Francisco here.
1342
01:02:28,310 --> 01:02:30,640
Also, down by Fisherman's
Wharf by the sea lions.
1343
01:02:30,640 --> 01:02:31,660
Familiar?
1344
01:02:31,660 --> 01:02:34,570
Here at Stanford, where there's
a William Gates Computer Science
1345
01:02:34,570 --> 01:02:38,950
building for computer science,
down the road in SF at Google.
1346
01:02:38,950 --> 01:02:41,650
And this is the Trevi Fountain in Rome.
1347
01:02:41,650 --> 01:02:43,810
And lastly, the Colosseum.
1348
01:02:43,810 --> 01:02:46,990
So we'll be curious to see in the coming
years where your duck, too, travels.
1349
01:02:46,990 --> 01:02:49,120
So that, then, was quite a bit.
1350
01:02:49,120 --> 01:02:51,850
Why don't we go ahead here and
take a short 5-minute break?
1351
01:02:51,850 --> 01:02:52,760
No snacks yet.
1352
01:02:52,760 --> 01:02:54,400
You're welcome to get up or sit down.
1353
01:02:54,400 --> 01:02:56,620
We'll return in about five.
1354
01:02:56,620 --> 01:03:00,020
All right, so we are back.
1355
01:03:00,020 --> 01:03:04,000
And if the goal, ultimately, today is
to have a better understanding of things
1356
01:03:04,000 --> 01:03:06,940
like strings so that we can
solve problems with text,
1357
01:03:06,940 --> 01:03:09,190
let's consider some
simpler types of data
1358
01:03:09,190 --> 01:03:11,290
first, how we might
represent those, and then
1359
01:03:11,290 --> 01:03:14,290
see if that doesn't lead us to
a discovery as to how strings,
1360
01:03:14,290 --> 01:03:17,330
and today's modern software,
use things like that.
1361
01:03:17,330 --> 01:03:21,850
So when we talked in week zero
about representation of data,
1362
01:03:21,850 --> 01:03:25,930
we had different ways of doing it,
in terms of binary and decimal,
1363
01:03:25,930 --> 01:03:27,640
and unary even.
1364
01:03:27,640 --> 01:03:30,520
When we started talking about
the same last week in code,
1365
01:03:30,520 --> 01:03:33,980
we started talking about
data types instead.
1366
01:03:33,980 --> 01:03:36,820
And these data types
were a way of telling
1367
01:03:36,820 --> 01:03:40,000
the computer, like do you want an
integer, do you want a character,
1368
01:03:40,000 --> 01:03:44,260
do you want a floating point value,
like a real number, or even a string,
1369
01:03:44,260 --> 01:03:45,070
as we've seen?
1370
01:03:45,070 --> 01:03:47,350
But it turns out that
computers, of course,
1371
01:03:47,350 --> 01:03:49,930
only have finite amounts of resources.
1372
01:03:49,930 --> 01:03:53,740
Your computer only has a
fixed amount of memory or RAM.
1373
01:03:53,740 --> 01:03:55,910
And that actually has very
real world implications.
1374
01:03:55,910 --> 01:03:59,630
So for instance, here are some of
the data types we've seen thus far.
1375
01:03:59,630 --> 01:04:04,090
And it turns out that each of
these in C has a specific number
1376
01:04:04,090 --> 01:04:05,650
of bits allocated to it.
1377
01:04:05,650 --> 01:04:08,350
Now, admittedly, this
can vary by system.
1378
01:04:08,350 --> 01:04:10,850
It's not so much the case
nowadays, but for many years,
1379
01:04:10,850 --> 01:04:13,100
for decades, computers were
getting better and better.
1380
01:04:13,100 --> 01:04:15,392
The earliest computers
might have used fewer bits
1381
01:04:15,392 --> 01:04:16,600
for some of these data types.
1382
01:04:16,600 --> 01:04:18,663
More modern computers
might use more bits.
1383
01:04:18,663 --> 01:04:21,830
So the numbers you're about to see are
pretty much where we are present day.
1384
01:04:21,830 --> 01:04:25,030
So when it comes to
these data types, a bool,
1385
01:04:25,030 --> 01:04:29,020
which is true or false, somewhat
curiously, uses a whole byte,
1386
01:04:29,020 --> 01:04:32,380
even though that's way overkill
because for a bool, true or false,
1387
01:04:32,380 --> 01:04:33,940
you, of course, only need one bit.
1388
01:04:33,940 --> 01:04:36,520
But it turns out, even
though it's wasteful to use
1389
01:04:36,520 --> 01:04:39,938
eight bits, or one byte, just
to represent true or false,
1390
01:04:39,938 --> 01:04:41,230
it's just easier for computers.
1391
01:04:41,230 --> 01:04:42,820
So a bool tends to be one byte.
1392
01:04:42,820 --> 01:04:47,590
An int, which we've been using a lot,
uses 4 bytes, typically, or 32 bits.
1393
01:04:47,590 --> 01:04:50,590
And if I do some quick math
from week zero, with 32 bits,
1394
01:04:50,590 --> 01:04:54,040
you have 4 billion
possible values, roughly.
1395
01:04:54,040 --> 01:04:56,290
But if you want to represent
positive and negative,
1396
01:04:56,290 --> 01:04:59,710
that means you can represent roughly
negative 2 billion, all the way up
1397
01:04:59,710 --> 01:05:01,020
to positive 2 billion.
1398
01:05:01,020 --> 01:05:02,770
So that's the range,
typically, with ints.
1399
01:05:02,770 --> 01:05:06,820
If that's too few numbers for you,
turns out there's things called longs.
1400
01:05:06,820 --> 01:05:10,120
And longs use 64 bits,
which allow you to have
1401
01:05:10,120 --> 01:05:13,220
like a quintillion
number of possibilities,
1402
01:05:13,220 --> 01:05:15,730
which is a lot, certainly,
a lot more than 4 billion.
1403
01:05:15,730 --> 01:05:17,410
So sometimes you might use a long.
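One way to confirm these sizes on your own machine, sketched with the standard `sizeof` operator and the `INT_MIN`, `INT_MAX`, `LONG_MIN`, and `LONG_MAX` constants from `<limits.h>`. The output can differ by system; the function name `print_type_sizes` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

// Prints the sizes and integer ranges this particular system uses.
// The numbers are typical, not guaranteed: long, for instance, is 8 bytes
// on 64-bit Linux and macOS but 4 bytes on 64-bit Windows.
void print_type_sizes(void)
{
    printf("bool:   %zu byte(s)\n", sizeof(bool));
    printf("char:   %zu byte(s)\n", sizeof(char));
    printf("int:    %zu bytes, %i to %i\n", sizeof(int), INT_MIN, INT_MAX);
    printf("long:   %zu bytes, %li to %li\n", sizeof(long), LONG_MIN, LONG_MAX);
    printf("float:  %zu bytes\n", sizeof(float));
    printf("double: %zu bytes\n", sizeof(double));
}
```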
1404
01:05:17,410 --> 01:05:18,670
But even that's finite.
1405
01:05:18,670 --> 01:05:21,640
And so as we discussed
at the end of last week,
1406
01:05:21,640 --> 01:05:23,980
bad things can happen if
you make certain assumptions
1407
01:05:23,980 --> 01:05:27,220
as to the data because of things
like integer overflow or the like,
1408
01:05:27,220 --> 01:05:28,330
where things wrap around.
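Wraparound is easiest to demonstrate safely with unsigned integers, whose arithmetic C defines to wrap modulo 2^N. Overflowing a signed int is undefined behavior, so this sketch deliberately avoids it; the name `next_count` is hypothetical.

```c
#include <limits.h>  // UINT_MAX, the largest unsigned int, for trying this out

// Adds 1 to an unsigned counter. Unsigned arithmetic wraps around,
// so UINT_MAX + 1 becomes 0 instead of a bigger number.
unsigned int next_count(unsigned int count)
{
    return count + 1;
}
```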
1409
01:05:28,330 --> 01:05:31,538
Then there's a float, which is a real
number, something with a decimal point.
1410
01:05:31,538 --> 01:05:36,040
By convention, it's 4 bytes or 32
bits, which gives you, in short,
1411
01:05:36,040 --> 01:05:37,810
only a specific amount of precision.
1412
01:05:37,810 --> 01:05:41,620
It doesn't necessarily dictate how many
digits to the left or right of the decimal point.
1413
01:05:41,620 --> 01:05:45,250
In the aggregate,
though, you ultimately still have
1414
01:05:45,250 --> 01:05:47,650
roughly 4 billion possible permutations.
1415
01:05:47,650 --> 01:05:50,110
If you need more precision
for scientific, for medical,
1416
01:05:50,110 --> 01:05:54,790
for financial applications, you
might use 8 bytes, A.K.A. a double,
1417
01:05:54,790 --> 01:05:57,700
which just gives you
more digits of precision.
1418
01:05:57,700 --> 01:06:01,360
They eventually get imprecise per
the example we looked at last week,
1419
01:06:01,360 --> 01:06:03,610
but it at least gets you
further down the line.
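A minimal sketch of the precision difference, computing 1/3 as a 4-byte float and as an 8-byte double. The function names are hypothetical.

```c
// Computes 1/3 at two precisions. Printing each with printf("%.20f\n", ...)
// shows the float drifting from the true digits after roughly 7 significant
// digits and the double after roughly 16.
float third_as_float(void)
{
    return 1.0f / 3.0f;
}

double third_as_double(void)
{
    return 1.0 / 3.0;
}
```

Neither value is exactly 1/3; the double just pushes the rounding error much further to the right.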
1420
01:06:03,610 --> 01:06:07,930
As an aside, in really, really
important applications, in finance,
1421
01:06:07,930 --> 01:06:10,030
in medicine, in military
operations, and the
1422
01:06:10,030 --> 01:06:12,640
like where you really can't
have rounding errors--
1423
01:06:12,640 --> 01:06:17,470
long story short, humans have developed
libraries in C and other languages
1424
01:06:17,470 --> 01:06:19,317
that use more, even, than 8 bytes.
1425
01:06:19,317 --> 01:06:22,150
So there are solutions to these
problems, but they're always finite.
1426
01:06:22,150 --> 01:06:24,070
You have to pick an upper bound.
1427
01:06:24,070 --> 01:06:27,070
Then there's char, which we saw
briefly last week when I asked
1428
01:06:27,070 --> 01:06:29,470
the user for y or n, for yes or no.
1429
01:06:29,470 --> 01:06:32,470
And then there's a string, which I'm
going to propose as a question mark
1430
01:06:32,470 --> 01:06:34,360
because a string totally depends.
1431
01:06:34,360 --> 01:06:35,380
Like, Hi!
1432
01:06:35,380 --> 01:06:38,890
H-I, exclamation point,
would seem to be three bytes.
1433
01:06:38,890 --> 01:06:41,140
D-A-V-I-D, would seem to be five.
1434
01:06:41,140 --> 01:06:45,400
So the strings, clearly, are variable
based on what you, the human, type in.
1435
01:06:45,400 --> 01:06:48,140
So we'll see what this
means, though, in just a bit.
1436
01:06:48,140 --> 01:06:51,580
This though, is the thing inside
of your Mac, your PC, your phone.
1437
01:06:51,580 --> 01:06:53,680
It might not look exactly
like this, but this is
1438
01:06:53,680 --> 01:06:56,187
a memory module for a modern computer.
1439
01:06:56,187 --> 01:06:57,520
And let's go ahead and use this.
1440
01:06:57,520 --> 01:06:59,920
Really, it's just representative
of the finite amount of memory
1441
01:06:59,920 --> 01:07:01,360
that any computer, indeed, has.
1442
01:07:01,360 --> 01:07:06,160
Let's zoom in on one of these little
black chips on the circuit board here.
1443
01:07:06,160 --> 01:07:10,180
Zoom in, and let me propose that
this rectangle really represents
1444
01:07:10,180 --> 01:07:14,380
some number of bytes, like tucked
inside of this little black circuit
1445
01:07:14,380 --> 01:07:16,750
on the board is maybe, I
don't know, a gigabyte,
1446
01:07:16,750 --> 01:07:19,300
a billion bytes, maybe it's 100
bytes-- some number of bytes.
1447
01:07:19,300 --> 01:07:21,258
It totally depends on
the computer and how much
1448
01:07:21,258 --> 01:07:22,700
you paid for the stick of memory.
1449
01:07:22,700 --> 01:07:27,850
But if there's a finite number of
bytes physically implemented somehow
1450
01:07:27,850 --> 01:07:30,327
digitally inside of this
hardware, well, then it
1451
01:07:30,327 --> 01:07:32,410
stands to reason that we
could number those bytes.
1452
01:07:32,410 --> 01:07:36,940
We can just arbitrarily decide that
the top left corner is byte number
1453
01:07:36,940 --> 01:07:38,800
one, or really byte number zero.
1454
01:07:38,800 --> 01:07:41,170
The one next to it is
number one, then number two,
1455
01:07:41,170 --> 01:07:43,450
number 3, dot, dot,
dot, number 2 billion
1456
01:07:43,450 --> 01:07:46,090
or whatever it is, however
big this memory is.
1457
01:07:46,090 --> 01:07:50,530
So if you use a variable in a C
program, that's only one byte.
1458
01:07:50,530 --> 01:07:54,190
Like a char, it might literally be
stored in that top left-hand corner
1459
01:07:54,190 --> 01:07:55,120
of the memory.
1460
01:07:55,120 --> 01:07:57,760
In practice, you don't care
where, physically, it is.
1461
01:07:57,760 --> 01:07:59,830
But really, the artist's
rendition would be
1462
01:07:59,830 --> 01:08:02,872
this-- a char might use
one of those single bytes
1463
01:08:02,872 --> 01:08:04,330
somewhere in the computer's memory.
1464
01:08:04,330 --> 01:08:07,450
If you use an int, which is
4 bytes, it would give you
1465
01:08:07,450 --> 01:08:10,840
4 bytes, contiguous-- that is
left to right, top to bottom.
1466
01:08:10,840 --> 01:08:13,274
But all 32 bits would
be next to each other
1467
01:08:13,274 --> 01:08:16,149
so the computer knows that those,
indeed, all belong to the same int.
1468
01:08:16,149 --> 01:08:18,680
If you need a long, or a
double for that matter,
1469
01:08:18,680 --> 01:08:21,140
then you might use a full
8 bytes in this case.
1470
01:08:21,140 --> 01:08:23,439
And you just keep using
and using this memory,
1471
01:08:23,439 --> 01:08:26,170
kind of like a canvas,
almost in Photoshop
1472
01:08:26,170 --> 01:08:29,845
or a spreadsheet where you can just
move pixels or you can move data around,
1473
01:08:29,845 --> 01:08:31,720
that's really what your
computer's memory is,
1474
01:08:31,720 --> 01:08:36,702
a canvas for storing information
in units of bytes or 8 bits.
1475
01:08:36,702 --> 01:08:39,160
Now, we don't need to keep
looking at these circuit boards.
1476
01:08:39,160 --> 01:08:41,287
We can abstract it away, as we often do.
1477
01:08:41,287 --> 01:08:43,120
And let's go ahead and
zoom in on this grid,
1478
01:08:43,120 --> 01:08:45,740
just to consider some
very specific variables.
1479
01:08:45,740 --> 01:08:49,180
So let me zoom in, and now I
see fewer, but larger boxes
1480
01:08:49,180 --> 01:08:51,580
on the screen, each of which,
again, represents a byte.
1481
01:08:51,580 --> 01:08:55,130
And now let me propose that
we play with some actual code.
1482
01:08:55,130 --> 01:08:58,029
So here in C, albeit
without a full program,
1483
01:08:58,029 --> 01:09:01,060
are three ints-- score1, score2, score3.
1484
01:09:01,060 --> 01:09:07,359
I have, coincidentally, given
myself two scores around 72 and 73,
1485
01:09:07,359 --> 01:09:09,040
and then a pretty low score at 33.
1486
01:09:09,040 --> 01:09:12,048
Of course, last week or two weeks
ago, this would have been high.
1487
01:09:12,048 --> 01:09:13,840
But now we're dealing
with actual integers.
1488
01:09:13,840 --> 01:09:17,750
So these are three so-so scores on
my quizzes or tests or the like.
1489
01:09:17,750 --> 01:09:19,250
So let me go to VS Code here.
1490
01:09:19,250 --> 01:09:22,210
And let's make a
program called scores.c.
1491
01:09:22,210 --> 01:09:24,399
So I'm going to write, code scores.c.
1492
01:09:24,399 --> 01:09:26,149
That's going to give me my new file.
1493
01:09:26,149 --> 01:09:28,420
And let me go ahead and
implement something like this.
1494
01:09:28,420 --> 01:09:34,149
Include stdio.h, int main(void),
and then inside of here,
1495
01:09:34,149 --> 01:09:37,689
let me do int score1 will be 72.
1496
01:09:37,689 --> 01:09:40,029
Int score2 will be 73.
1497
01:09:40,029 --> 01:09:43,149
And int score3 will be 33.
1498
01:09:43,149 --> 01:09:45,460
And then let me just do
something like write a program
1499
01:09:45,460 --> 01:09:48,043
to average my three test scores
together, something like that.
1500
01:09:48,043 --> 01:09:52,240
So let me do printf, quote
unquote, my average is--
1501
01:09:52,240 --> 01:09:56,470
and I'm going to go ahead
and do, say, %i, \n.
1502
01:09:56,470 --> 01:09:58,290
And now, let me plug in the results.
1503
01:09:58,290 --> 01:10:00,040
And this is kind of
grade school math now.
1504
01:10:00,040 --> 01:10:02,210
How do I compute the
average of three values?
1505
01:10:02,210 --> 01:10:09,110
Well, just like on paper, I can
do score1 plus score2 plus score3
1506
01:10:09,110 --> 01:10:12,830
in parentheses, because of order
of operations, divided by 3,
1507
01:10:12,830 --> 01:10:14,457
since there's three total scores.
1508
01:10:14,457 --> 01:10:16,040
All right, so I think this checks out.
1509
01:10:16,040 --> 01:10:19,040
And indeed, you can use parentheses
and operators like plus in your code
1510
01:10:19,040 --> 01:10:23,180
like this in C. Let me go
ahead now and do make scores.
1511
01:10:23,180 --> 01:10:24,327
No syntax error.
1512
01:10:24,327 --> 01:10:25,910
So that's good, nothing missing there.
1513
01:10:25,910 --> 01:10:28,850
And now let me do ./scores and
see what my test average is.
1514
01:10:28,850 --> 01:10:32,270
All right, it's not great,
but I think I still passed.
1515
01:10:32,270 --> 01:10:36,050
And indeed, my average here is 59.
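The 59 comes from integer division: with only ints involved, C divides 178 by 3 and discards the remainder. A function version of that same calculation, under a hypothetical name:

```c
// Averages three int scores the way the original scores.c does:
// (72 + 73 + 33) / 3 is 178 / 3, and int / int truncates to 59.
int average_int(int score1, int score2, int score3)
{
    return (score1 + score2 + score3) / 3;
}
```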
1516
01:10:36,050 --> 01:10:38,360
Is it precisely 59 though?
1517
01:10:38,360 --> 01:10:39,140
Well, let's see.
1518
01:10:39,140 --> 01:10:42,110
Let's actually, instead of using
an int, how about we go ahead
1519
01:10:42,110 --> 01:10:44,870
and use something like a
floating point value here?
1520
01:10:44,870 --> 01:10:46,250
And let me go ahead and do this.
1521
01:10:46,250 --> 01:10:48,710
So let me recompile
my code, make scores.
1522
01:10:48,710 --> 01:10:50,600
Huh, all right, I've got an issue.
1523
01:10:50,600 --> 01:10:52,340
Let me zoom in on my terminal window.
1524
01:10:52,340 --> 01:10:54,710
We've not seen this one,
necessarily, before.
1525
01:10:54,710 --> 01:10:56,510
But error on line 9.
1526
01:10:56,510 --> 01:11:00,410
Format specifies type double,
which is a lot of precision,
1527
01:11:00,410 --> 01:11:02,180
but the argument has type int.
1528
01:11:02,180 --> 01:11:03,300
So what does this mean?
1529
01:11:03,300 --> 01:11:06,508
Well, it's showing me with these green
squiggles that something's bad between
1530
01:11:06,508 --> 01:11:09,060
the %f and this thing over here.
1531
01:11:09,060 --> 01:11:13,020
Well, on the left, I'm implying a
float, or a double for that matter.
1532
01:11:13,020 --> 01:11:16,835
On the right, though, what data
type are score1, score2, score3?
1533
01:11:16,835 --> 01:11:17,960
All right, so they're ints.
1534
01:11:17,960 --> 01:11:19,583
So clang does not like this.
1535
01:11:19,583 --> 01:11:22,250
The compiler just doesn't like
that I'm using ints on the right,
1536
01:11:22,250 --> 01:11:24,170
but I want floats on the left.
1537
01:11:24,170 --> 01:11:26,670
So there's going to be
different ways of solving this.
1538
01:11:26,670 --> 01:11:29,870
One way would be to just ignore
the problem like I originally did,
1539
01:11:29,870 --> 01:11:32,450
and just go back to %i.
1540
01:11:32,450 --> 01:11:38,330
Or as an aside, %d is often an
alternative to %i for a decimal number.
1541
01:11:38,330 --> 01:11:42,358
But we use %i because it sounds
like int, so %i is fine here too.
1542
01:11:42,358 --> 01:11:44,150
But I don't want to
just avoid the problem.
1543
01:11:44,150 --> 01:11:46,500
I want to actually display
a floating point value.
1544
01:11:46,500 --> 01:11:47,730
So how can I fix this?
1545
01:11:47,730 --> 01:11:50,272
Well, it turns out, I can solve
this in a few different ways.
1546
01:11:50,272 --> 01:11:53,990
The simplest is just to make sure
that at least one number on the right
1547
01:11:53,990 --> 01:11:59,330
is a floating point value,
like 3.0 instead of just 3.
1548
01:11:59,330 --> 01:12:01,700
Now I think clang will be happier.
1549
01:12:01,700 --> 01:12:03,320
Let me do make scores--
1550
01:12:03,320 --> 01:12:04,400
Enter.
1551
01:12:04,400 --> 01:12:05,330
And indeed, it's OK.
1552
01:12:05,330 --> 01:12:05,930
Why?
1553
01:12:05,930 --> 01:12:10,050
As soon as you have at least one
more precise data type on the right,
1554
01:12:10,050 --> 01:12:13,170
it just treats everything, at that
point, as a floating point value
1555
01:12:13,170 --> 01:12:14,330
so that the math works out.
1556
01:12:14,330 --> 01:12:17,720
So ./scores, Enter-- and
now, there we go, right?
1557
01:12:17,720 --> 01:12:20,390
Some of us might really
want that 1/3 of a point.
1558
01:12:20,390 --> 01:12:21,980
Our average was not 59.
1559
01:12:21,980 --> 01:12:25,010
It's 59 1/3, as in this case here.
1560
01:12:25,010 --> 01:12:26,750
All right, so we've solved that there.
1561
01:12:26,750 --> 01:12:30,890
As an aside, though, there's one
other technique to show here.
1562
01:12:30,890 --> 01:12:33,320
If you didn't want to change
it to 3.0 because that's
1563
01:12:33,320 --> 01:12:36,410
a little weird, because there
were literally three scores,
1564
01:12:36,410 --> 01:12:38,760
it's not like that needs
to have a decimal point,
1565
01:12:38,760 --> 01:12:43,970
you could also explicitly
convert the 3 to a float
1566
01:12:43,970 --> 01:12:46,230
by saying, in parentheses, float.
1567
01:12:46,230 --> 01:12:48,050
This is what's called typecasting.
1568
01:12:48,050 --> 01:12:51,840
And this will just convert the thing
right after it to that data type,
1569
01:12:51,840 --> 01:12:52,560
if it's possible.
1570
01:12:52,560 --> 01:12:56,970
So if I do this again, make scores,
no errors now. ./scores, and I get,
1571
01:12:56,970 --> 01:12:59,960
in fact, the same result. There's
a bit of a rounding issue here,
1572
01:12:59,960 --> 01:13:03,650
but we know the rounding relates
to the imprecision from last week.
1573
01:13:03,650 --> 01:13:06,980
For now, let me just be
happy with my 59.3 something.
1574
01:13:06,980 --> 01:13:08,360
I'll take that for now.
1575
01:13:08,360 --> 01:13:14,660
But this is close enough
to a correct answer for me now.
1576
01:13:14,660 --> 01:13:15,942
But how do I--
1577
01:13:15,942 --> 01:13:18,650
think about now, what's going on
inside of the computer's memory?
1578
01:13:18,650 --> 01:13:19,310
Well, let's consider.
1579
01:13:19,310 --> 01:13:20,643
Here's that same grid of memory.
1580
01:13:20,643 --> 01:13:22,490
Each box represents a byte.
1581
01:13:22,490 --> 01:13:25,790
Where are score1, score2,
and score3 in my memory?
1582
01:13:25,790 --> 01:13:28,790
Well, score1, let me just
propose, is at the top left.
1583
01:13:28,790 --> 01:13:32,060
But it's taking up
four boxes for 4 bytes.
1584
01:13:32,060 --> 01:13:34,842
Score2 probably ends up
right next to it in memory,
1585
01:13:34,842 --> 01:13:36,800
though, this isn't always
going to be the case,
1586
01:13:36,800 --> 01:13:38,180
but I've chosen simple examples.
1587
01:13:38,180 --> 01:13:40,910
73 is next to it, also
taking up 4 bytes.
1588
01:13:40,910 --> 01:13:45,320
And then lastly, 33 is in
score3, down there underneath.
1589
01:13:45,320 --> 01:13:48,343
Now, if we really look
at the computer's memory,
1590
01:13:48,343 --> 01:13:50,510
look at it with some kind
of microscope or the like,
1591
01:13:50,510 --> 01:13:54,110
there's actually 32
bits, 32 bits, 32 bits
1592
01:13:54,110 --> 01:13:59,308
in each of those four groups of four
bytes representing those values.
1593
01:13:59,308 --> 01:14:01,100
But again, for today's
purposes onwards, we
1594
01:14:01,100 --> 01:14:03,308
don't really need to think
again and again in binary.
1595
01:14:03,308 --> 01:14:05,940
It's just, indeed, these decimal
numbers being stored there.
1596
01:14:05,940 --> 01:14:08,240
But I claim now, this
isn't the best design.
1597
01:14:08,240 --> 01:14:11,300
Even if you have never
programmed before CS50,
1598
01:14:11,300 --> 01:14:13,220
what you're looking
at here on the screen,
1599
01:14:13,220 --> 01:14:16,970
as an excerpt, in what sense is this
perhaps bad design, even though it's
1600
01:14:16,970 --> 01:14:19,960
a correct way of storing
three test scores?
1601
01:14:19,960 --> 01:14:20,960
What's kind of bad here?
1602
01:14:20,960 --> 01:14:21,882
Yeah?
1603
01:14:21,882 --> 01:14:26,220
AUDIENCE: The more scores you
have, the more you [INAUDIBLE].
1604
01:14:26,220 --> 01:14:28,950
DAVID MALAN: Yeah, always do
exactly what you did-- extrapolate
1605
01:14:28,950 --> 01:14:31,740
to 4 scores, 5 scores, 50 scores.
1606
01:14:31,740 --> 01:14:34,020
This can't be that
well-designed because now you're
1607
01:14:34,020 --> 01:14:36,300
going to have 4 lines of
code, 5 lines of code,
1608
01:14:36,300 --> 01:14:38,550
50 lines of code that
are almost identical,
1609
01:14:38,550 --> 01:14:40,770
except for this like
arbitrary number that we're
1610
01:14:40,770 --> 01:14:42,430
updating at the end of the variable.
1611
01:14:42,430 --> 01:14:44,940
So indeed, there's probably
going to be a better
1612
01:14:44,940 --> 01:14:48,690
way, even though, at least in C,
we haven't yet seen that technique.
1613
01:14:48,690 --> 01:14:52,440
But the solution, today onward, is
going to be something called an array.
1614
01:14:52,440 --> 01:14:57,180
An array is a way of
storing your data back
1615
01:14:57,180 --> 01:15:00,630
to back to back in the
computer's memory in such a way
1616
01:15:00,630 --> 01:15:03,960
that you can access each
individual member easily.
1617
01:15:03,960 --> 01:15:08,530
Put another way, with an array, you
can instead do something like this.
1618
01:15:08,530 --> 01:15:12,300
Instead of saying int score1,
int score2, int score3,
1619
01:15:12,300 --> 01:15:15,790
giving each a value, you
can first tell the computer,
1620
01:15:15,790 --> 01:15:18,330
please give me a
variable called scores--
1621
01:15:18,330 --> 01:15:20,700
plural, though you can
call it anything you want--
1622
01:15:20,700 --> 01:15:24,090
of size three, each of
which will be an integer.
1623
01:15:24,090 --> 01:15:28,680
That is to say, this is how you
declare an array in C that will have
1624
01:15:28,680 --> 01:15:30,930
enough room to store three integers.
1625
01:15:30,930 --> 01:15:34,540
Put another way, this is the
technical way of telling the computer,
1626
01:15:34,540 --> 01:15:38,880
please give me 12 bytes in total--
1627
01:15:38,880 --> 01:15:42,660
3 times 4 each for an int,
so give me 12 bytes in total.
1628
01:15:42,660 --> 01:15:44,640
And what the computer
will do is guarantee
1629
01:15:44,640 --> 01:15:47,350
that they're back to back to
back in the computer's memory.
1630
01:15:47,350 --> 01:15:49,360
And that'll be useful in just a moment.
1631
01:15:49,360 --> 01:15:51,820
So let me go ahead and do
something useful with this.
1632
01:15:51,820 --> 01:15:53,640
Let me store three actual scores.
1633
01:15:53,640 --> 01:15:58,500
Here's how I could now store those
same numeric scores in this array.
1634
01:15:58,500 --> 01:16:03,040
Syntax is a little different, but
there's one variable called scores.
1635
01:16:03,040 --> 01:16:05,010
But if you want to go
to its first location,
1636
01:16:05,010 --> 01:16:08,520
starting today, you use square
brackets and go to location 0
1637
01:16:08,520 --> 01:16:13,080
first, because things in
C are 0-indexed, so to speak:
1638
01:16:13,080 --> 01:16:14,280
you start counting at 0.
1639
01:16:14,280 --> 01:16:16,410
The first int is at [0].
1640
01:16:16,410 --> 01:16:18,030
Second int is at [1].
1641
01:16:18,030 --> 01:16:19,530
Third int is at [2].
1642
01:16:19,530 --> 01:16:20,730
So it's not one, two, three.
1643
01:16:20,730 --> 01:16:22,090
It's literally 0, 1, 2.
1644
01:16:22,090 --> 01:16:24,090
And this is not something
you have control over.
1645
01:16:24,090 --> 01:16:26,250
You must start at 0.
1646
01:16:26,250 --> 01:16:29,940
So these lines now create
an array of size three,
1647
01:16:29,940 --> 01:16:33,510
and then insert one, two,
three values into that array.
1648
01:16:33,510 --> 01:16:37,770
But the upside now is that you only have
one name of the variable to remember.
1649
01:16:37,770 --> 01:16:39,240
It's just called scores.
1650
01:16:39,240 --> 01:16:43,380
Yes, you need to go into the
array to get individual values.
1651
01:16:43,380 --> 01:16:46,618
You need to index into it
using those square brackets.
1652
01:16:46,618 --> 01:16:48,660
But at least you don't
have this hackish approach
1653
01:16:48,660 --> 01:16:53,050
of declaring a separate variable for
each and every one of these values.
1654
01:16:53,050 --> 01:16:56,070
So let me go back to scores.c here.
1655
01:16:56,070 --> 01:16:57,580
And let me propose that I do this.
1656
01:16:57,580 --> 01:17:00,580
Let me just use that same
idea to do the following.
1657
01:17:00,580 --> 01:17:02,580
Let me get rid of these
three separate integers.
1658
01:17:02,580 --> 01:17:06,210
Let me give myself an int
scores array of size 3.
1659
01:17:06,210 --> 01:17:10,470
And then scores[0]
will, as before, be 72.
1660
01:17:10,470 --> 01:17:14,070
Scores[1] will be 73.
1661
01:17:14,070 --> 01:17:16,830
And scores[2] will be 33.
1662
01:17:16,830 --> 01:17:18,780
And let me get rid of
the little dot there.
1663
01:17:18,780 --> 01:17:23,490
All right, so now, if I go ahead and
run this again with make scores--
1664
01:17:23,490 --> 01:17:24,642
Enter.
1665
01:17:24,642 --> 01:17:29,060
Huh, what did I do wrong here?
1666
01:17:29,060 --> 01:17:31,680
I think I got a little
too ahead of myself.
1667
01:17:31,680 --> 01:17:36,100
Let me increase my terminal window.
1668
01:17:36,100 --> 01:17:38,830
Let's focus on line 10 here, first.
1669
01:17:38,830 --> 01:17:42,310
Error, use of undeclared
identifier, score1.
1670
01:17:42,310 --> 01:17:44,170
What did I do here that was dumb?
1671
01:17:44,170 --> 01:17:45,430
Yeah?
1672
01:17:45,430 --> 01:17:47,440
AUDIENCE: You didn't
declare it a variable.
1673
01:17:47,440 --> 01:17:49,420
DAVID MALAN: Right, so
I didn't declare score1.
1674
01:17:49,420 --> 01:17:50,530
I've got old code.
1675
01:17:50,530 --> 01:17:53,798
So I just kind of, honestly, got ahead
of myself here, not even intentionally.
1676
01:17:53,798 --> 01:17:56,090
So let me go ahead and shrink
my terminal window again.
1677
01:17:56,090 --> 01:17:57,740
I need to finish my thought here.
1678
01:17:57,740 --> 01:17:58,960
So let me clear my terminal.
1679
01:17:58,960 --> 01:18:04,960
And let me change this now to
be scores[0] plus scores[1] plus
1680
01:18:04,960 --> 01:18:05,610
scores[2].
1681
01:18:05,610 --> 01:18:07,360
So it's a little more
verbose because I've
1682
01:18:07,360 --> 01:18:10,040
got these square brackets, so to speak.
1683
01:18:10,040 --> 01:18:12,220
But I think now my code is consistent.
1684
01:18:12,220 --> 01:18:13,870
So let me make scores now.
1685
01:18:13,870 --> 01:18:14,950
It now compiles.
1686
01:18:14,950 --> 01:18:19,870
./scores gives me, indeed, the same
rough average with those same values.
1687
01:18:19,870 --> 01:18:24,280
All right, so let me go ahead and
maybe enhance this a little bit.
1688
01:18:24,280 --> 01:18:26,920
It's a little silly to have to
write a special program just
1689
01:18:26,920 --> 01:18:31,610
to check your average of three
test scores like 72, 73, 33.
1690
01:18:31,610 --> 01:18:33,550
Why don't I actually
make the program dynamic
1691
01:18:33,550 --> 01:18:37,250
and ask the human for those scores?
1692
01:18:37,250 --> 01:18:39,140
So instead, let me do this.
1693
01:18:39,140 --> 01:18:43,480
How about we get rid of the
72, and change this to get_int.
1694
01:18:43,480 --> 01:18:46,300
And I'll just prompt
the user for a score.
1695
01:18:46,300 --> 01:18:52,510
Let me get rid of the 73 and get this
to be get_int score, quote unquote.
1696
01:18:52,510 --> 01:18:56,560
And then lastly, get rid of the 33, and
replace it with get_int, quote unquote,
1697
01:18:56,560 --> 01:18:57,670
score.
1698
01:18:57,670 --> 01:19:03,680
get_int is a CS50 thing for now, so
I need to include cs50.h, as always.
1699
01:19:03,680 --> 01:19:05,650
But I think now, it's
sort of a better program
1700
01:19:05,650 --> 01:19:08,680
because now I can compile it once,
I can even share it with my friends.
1701
01:19:08,680 --> 01:19:12,490
And now any of us can average
three scores on some class's test.
1702
01:19:12,490 --> 01:19:15,190
They don't need to know the
code or rewrite the code just
1703
01:19:15,190 --> 01:19:16,910
to type in their scores.
1704
01:19:16,910 --> 01:19:19,150
So make scores worked.
1705
01:19:19,150 --> 01:19:25,120
./scores, now I can type anything
I want-- maybe it's a 72, 73, 33,
1706
01:19:25,120 --> 01:19:26,320
still get the same answer.
1707
01:19:26,320 --> 01:19:31,210
Or maybe I'm having a better
semester, 100, 100, maybe 99,
1708
01:19:31,210 --> 01:19:33,520
and now we get still a
pretty high score there.
1709
01:19:33,520 --> 01:19:34,600
But now it's dynamic.
1710
01:19:34,600 --> 01:19:36,080
Now you don't need the source code.
1711
01:19:36,080 --> 01:19:37,747
You don't need to recompile the program.
1712
01:19:37,747 --> 01:19:39,670
It's just going to work again and again.
1713
01:19:39,670 --> 01:19:41,090
But consider this, too.
1714
01:19:41,090 --> 01:19:43,660
Let me propose that this
code is correct if I
1715
01:19:43,660 --> 01:19:45,910
want to get three scores from the user.
1716
01:19:45,910 --> 01:19:50,950
But these highlighted lines now, 6
through 9, are they well-designed,
1717
01:19:50,950 --> 01:19:53,170
would you say?
1718
01:19:53,170 --> 01:19:53,680
Yeah?
1719
01:19:53,680 --> 01:19:54,898
AUDIENCE: Can you loop?
1720
01:19:54,898 --> 01:19:55,940
DAVID MALAN: Yeah, right?
1721
01:19:55,940 --> 01:19:58,220
This is-- we can use a
loop, is the spoiler here.
1722
01:19:58,220 --> 01:19:58,820
Why?
1723
01:19:58,820 --> 01:20:01,590
I mean, my God, it's like the same
code again and again and again.
1724
01:20:01,590 --> 01:20:03,465
The only thing that's
changing is the number.
1725
01:20:03,465 --> 01:20:06,170
And this should have kind of
had some code smell again,
1726
01:20:06,170 --> 01:20:09,080
because if I keep typing the
same thing again and again,
1727
01:20:09,080 --> 01:20:11,810
that's clearly an opportunity
to better design something.
1728
01:20:11,810 --> 01:20:13,650
So let me do this.
1729
01:20:13,650 --> 01:20:18,590
Let me go ahead and still
create my array of size three.
1730
01:20:18,590 --> 01:20:23,270
But let me use our old friend,
the for loop, for int i equals 0,
1731
01:20:23,270 --> 01:20:26,610
i less than 3, i++.
1732
01:20:26,610 --> 01:20:29,510
And then in here, let
me do scores bracket--
1733
01:20:29,510 --> 01:20:32,920
we haven't seen this
before, but any intuition?
1734
01:20:32,920 --> 01:20:34,220
Scores bracket--
1735
01:20:34,220 --> 01:20:34,720
AUDIENCE: i.
1736
01:20:34,720 --> 01:20:39,730
DAVID MALAN: i, because that will
use whatever i is, be it 0 or 1 or 2
1737
01:20:39,730 --> 01:20:40,720
on each iteration.
1738
01:20:40,720 --> 01:20:43,780
And then I can get an int,
asking the user for score,
1739
01:20:43,780 --> 01:20:47,000
without having to repeat
myself again and again.
1740
01:20:47,000 --> 01:20:50,560
So hopefully, if I didn't make
any typos, make scores, all good.
1741
01:20:50,560 --> 01:20:54,665
./scores, 72, 73, 33, and
we're back in business.
1742
01:20:54,665 --> 01:20:56,540
But the code is arguably
now better designed,
1743
01:20:56,540 --> 01:21:01,240
because now, I haven't
actually hardcoded the scores,
1744
01:21:01,240 --> 01:21:04,940
and I haven't actually copied
and pasted any of that code.
1745
01:21:04,940 --> 01:21:08,230
Well, if we consider now what's going
on inside of the computer's memory,
1746
01:21:08,230 --> 01:21:10,510
it's pretty much the same
in terms of the values.
1747
01:21:10,510 --> 01:21:15,490
But instead of the variables being,
literally, score1, score2, score3,
1748
01:21:15,490 --> 01:21:17,210
there's just one variable.
1749
01:21:17,210 --> 01:21:19,030
It's an array called scores.
1750
01:21:19,030 --> 01:21:24,550
But you can index into its three
locations by using scores[0] to get
1751
01:21:24,550 --> 01:21:28,810
the first, scores[1] to get the
second, scores[2] to get the third.
1752
01:21:28,810 --> 01:21:29,990
But this is key.
1753
01:21:29,990 --> 01:21:33,040
The memory is contiguous.
1754
01:21:33,040 --> 01:21:35,380
The screen is only so
large, so it wraps around.
1755
01:21:35,380 --> 01:21:38,950
But physically, digitally,
the memory is contiguous-- top
1756
01:21:38,950 --> 01:21:40,270
to bottom, left to right.
1757
01:21:40,270 --> 01:21:41,530
And that's important, why?
1758
01:21:41,530 --> 01:21:46,060
Because the indices 0, 1, 2 imply
that each of these integers
1759
01:21:46,060 --> 01:21:48,790
is just one integer away from the next.
1760
01:21:48,790 --> 01:21:51,220
It can't be randomly down
here all of a sudden.
1761
01:21:51,220 --> 01:21:54,070
It's got to be back to back to back.
1762
01:21:54,070 --> 01:21:57,130
All right, now equipped
with that paradigm,
1763
01:21:57,130 --> 01:22:00,710
what more could we actually do here?
1764
01:22:00,710 --> 01:22:04,270
Well, it turns out, it's worth
knowing that it's possible in code
1765
01:22:04,270 --> 01:22:06,850
to even pass arrays around as arguments.
1766
01:22:06,850 --> 01:22:09,100
And let me just whip this
program up somewhat quickly,
1767
01:22:09,100 --> 01:22:11,320
just so you've seen it before long.
1768
01:22:11,320 --> 01:22:13,190
But let me go ahead and do this.
1769
01:22:13,190 --> 01:22:18,130
Let me propose that I create a function
that does this averaging for me.
1770
01:22:18,130 --> 01:22:22,510
So I'm going to create a function
called average that returns a float.
1771
01:22:22,510 --> 01:22:26,860
And the arguments this
thing is going to take--
1772
01:22:26,860 --> 01:22:28,640
let's see, it's going to be the array.
1773
01:22:28,640 --> 01:22:31,480
So it turns out, if you want to
take in an array of numbers--
1774
01:22:31,480 --> 01:22:33,050
you can call it anything you want.
1775
01:22:33,050 --> 01:22:36,970
This is how you tell C
that a function takes, not
1776
01:22:36,970 --> 01:22:39,790
an integer, but an array of integers.
1777
01:22:39,790 --> 01:22:41,290
And you don't have to call it array.
1778
01:22:41,290 --> 01:22:42,790
I'm doing that just for
the sake of discussion.
1779
01:22:42,790 --> 01:22:43,660
It can be called x.
1780
01:22:43,660 --> 01:22:44,490
It can be numbers.
1781
01:22:44,490 --> 01:22:45,490
It can be anything else.
1782
01:22:45,490 --> 01:22:49,060
I'm just calling it array to be super
explicit as to what it is there.
1783
01:22:49,060 --> 01:22:51,730
Now, how do I change my code down here?
1784
01:22:51,730 --> 01:22:55,130
What I think I'm going to do
for the moment is just this.
1785
01:22:55,130 --> 01:22:59,110
I'm going to get rid of this code here,
where I manually computed the average.
1786
01:22:59,110 --> 01:23:01,480
And let me just call the
average function here
1787
01:23:01,480 --> 01:23:05,000
by passing in the whole array of scores.
1788
01:23:05,000 --> 01:23:07,030
So this is just an
example of abstraction,
1789
01:23:07,030 --> 01:23:08,890
like now I have a
function called average.
1790
01:23:08,890 --> 01:23:09,670
I don't care.
1791
01:23:09,670 --> 01:23:12,490
I don't have to remember how
it works once I implement it.
1792
01:23:12,490 --> 01:23:15,010
It just kind of tightens up
my main code a little bit.
1793
01:23:15,010 --> 01:23:17,030
But I do still have to implement this.
1794
01:23:17,030 --> 01:23:19,360
So later in my file-- let
me repeat myself from before,
1795
01:23:19,360 --> 01:23:22,270
the only time it's OK in C to
repeat yourself again and again,
1796
01:23:22,270 --> 01:23:27,010
by typing out again, average,
and then int array open bracket--
1797
01:23:27,010 --> 01:23:28,580
but now not a semicolon.
1798
01:23:28,580 --> 01:23:30,250
Now I have to implement this thing.
1799
01:23:30,250 --> 01:23:33,400
And I can implement this in
a bunch of different ways,
1800
01:23:33,400 --> 01:23:37,630
but I don't know in advance--
1801
01:23:37,630 --> 01:23:39,040
I can't just do this.
1802
01:23:39,040 --> 01:23:48,400
I can't just do array[0]
plus array[1] plus array[2],
1803
01:23:48,400 --> 01:23:52,130
unless this program's only ever
going to work on three numbers.
1804
01:23:52,130 --> 01:23:55,460
So let me go ahead and do this.
1805
01:23:55,460 --> 01:23:58,570
Let me first propose that
there's a poor design here.
1806
01:23:58,570 --> 01:24:01,930
In my main function, what
value have I repeated twice?
1807
01:24:01,930 --> 01:24:05,050
1808
01:24:05,050 --> 01:24:07,550
Among the highlighted lines,
what jumps out at you as twice?
1809
01:24:07,550 --> 01:24:09,020
AUDIENCE: The length of the array?
1810
01:24:09,020 --> 01:24:11,520
DAVID MALAN: Yeah, the length
of the array, it's just three.
1811
01:24:11,520 --> 01:24:14,720
Now it's not a huge deal that I typed
the number three on line 8 and line 9,
1812
01:24:14,720 --> 01:24:17,120
but this is exactly the
kind of shortcut
1813
01:24:17,120 --> 01:24:18,440
that's going to get you
in trouble eventually.
1814
01:24:18,440 --> 01:24:18,860
Why?
1815
01:24:18,860 --> 01:24:20,240
Because, eventually,
you or someone else is
1816
01:24:20,240 --> 01:24:22,407
going to go in and make the
array bigger or smaller,
1817
01:24:22,407 --> 01:24:24,410
and you're not going to
realize that magically,
1818
01:24:24,410 --> 01:24:26,270
that same number is in two places.
1819
01:24:26,270 --> 01:24:29,270
And indeed, this is what a programmer
would often call a magic number.
1820
01:24:29,270 --> 01:24:31,940
A magic number is one that
just kind of appears magically.
1821
01:24:31,940 --> 01:24:35,210
And you're on the honor system to
change it here, if you change it here,
1822
01:24:35,210 --> 01:24:36,688
and then you change it over here.
1823
01:24:36,688 --> 01:24:39,230
That's not going to end well if
the onus is on the programmer
1824
01:24:39,230 --> 01:24:43,190
to remember where they hardcoded--
that is, wrote out three explicitly.
1825
01:24:43,190 --> 01:24:46,250
So any time you reuse a value
like this, you know what?
1826
01:24:46,250 --> 01:24:50,690
We should probably do what we did last
week, which was to declare a variable,
1827
01:24:50,690 --> 01:24:53,510
perhaps at the very top of my
program, so it's super obvious
1828
01:24:53,510 --> 01:24:56,990
what it is, called, maybe
N, and set that equal to 3.
1829
01:24:56,990 --> 01:24:59,030
Better yet, what did I
do last week to make sure
1830
01:24:59,030 --> 01:25:02,390
that I can't screw up and
accidentally change that value?
1831
01:25:02,390 --> 01:25:03,440
Yeah, constant.
1832
01:25:03,440 --> 01:25:05,810
And the keyword there
was just const for short.
1833
01:25:05,810 --> 01:25:09,110
And now I have a global variable--
global in the sense that I can
1834
01:25:09,110 --> 01:25:11,870
access it anywhere-- that is called N.
1835
01:25:11,870 --> 01:25:12,680
It's an int.
1836
01:25:12,680 --> 01:25:14,450
And it's always going to be 3.
1837
01:25:14,450 --> 01:25:18,500
And now I can improve my main
function a little bit by just changing
1838
01:25:18,500 --> 01:25:22,662
the 3's to N, so now if a
colleague realized, oh, wait a minute,
1839
01:25:22,662 --> 01:25:23,870
there's four tests this year.
1840
01:25:23,870 --> 01:25:25,610
You change N to 4,
recompile the code,
1841
01:25:25,610 --> 01:25:31,190
and it just works everywhere else,
except in my average function.
1842
01:25:31,190 --> 01:25:33,830
Let me change it back to
3, just for consistency.
1843
01:25:33,830 --> 01:25:39,770
This is not going to fly now, to just
sum up things like this, for instance,
1844
01:25:39,770 --> 01:25:43,610
and then return this divided by 3.
1845
01:25:43,610 --> 01:25:51,130
Why will this not work
now as I've defined it?
1846
01:25:51,130 --> 01:25:52,159
Yeah?
1847
01:25:52,159 --> 01:25:58,030
AUDIENCE: [INAUDIBLE]
1848
01:25:58,030 --> 01:26:00,980
DAVID MALAN: OK, I might be
returning an integer value when
1849
01:26:00,980 --> 01:26:02,870
I intend to return a float per this.
1850
01:26:02,870 --> 01:26:05,870
But I think I'm OK because I used
that little trick where I made sure
1851
01:26:05,870 --> 01:26:08,810
that at least one of the numbers
in my arithmetic expression
1852
01:26:08,810 --> 01:26:11,010
is, in fact, a floating point value.
1853
01:26:11,010 --> 01:26:14,180
And just adding the point
0 makes sure that everything
1854
01:26:14,180 --> 01:26:15,650
gets treated as a float.
1855
01:26:15,650 --> 01:26:17,864
So I think that's OK.
1856
01:26:17,864 --> 01:26:19,034
AUDIENCE: [INAUDIBLE]
1857
01:26:19,034 --> 01:26:20,701
DAVID MALAN: I'm sorry, a little louder.
1858
01:26:20,701 --> 01:26:24,385
AUDIENCE: It just seems
like you're [INAUDIBLE].
1859
01:26:24,385 --> 01:26:25,260
DAVID MALAN: Exactly.
1860
01:26:25,260 --> 01:26:27,093
So left hand's not
talking to the right hand
1861
01:26:27,093 --> 01:26:30,210
here, in that my current
implementation of average
1862
01:26:30,210 --> 01:26:33,510
is still assuming that there's only
going to be three tests or whatever.
1863
01:26:33,510 --> 01:26:35,670
But wait a minute, I just
went through the trouble
1864
01:26:35,670 --> 01:26:39,480
of modifying this to be N, generically.
1865
01:26:39,480 --> 01:26:43,205
And if I change this to 4, I'm
not going to be happy, perhaps,
1866
01:26:43,205 --> 01:26:46,080
with my average because now I'm
going to ignore one of my test scores
1867
01:26:46,080 --> 01:26:46,690
altogether.
1868
01:26:46,690 --> 01:26:48,450
So let me change this back to 3.
1869
01:26:48,450 --> 01:26:51,180
And unfortunately, if
it's a variable now,
1870
01:26:51,180 --> 01:26:55,500
N, and therefore, I have literally
a variable number of scores,
1871
01:26:55,500 --> 01:27:00,920
how do I take the average of
a variable number of things?
1872
01:27:00,920 --> 01:27:02,630
I mean, what's my building block there?
1873
01:27:02,630 --> 01:27:03,170
Yeah?
1874
01:27:03,170 --> 01:27:10,100
AUDIENCE: [INAUDIBLE]
1875
01:27:10,100 --> 01:27:10,850
DAVID MALAN: Yeah.
1876
01:27:10,850 --> 01:27:14,880
Why don't I use a loop that goes through
the array and adds things up as you go?
1877
01:27:14,880 --> 01:27:17,360
I mean, kind of like grade school, as
you take the average on your calculator
1878
01:27:17,360 --> 01:27:19,730
or paper and pencil, you just
keep adding the numbers together,
1879
01:27:19,730 --> 01:27:22,380
and then you divide at the end
by the total number of things.
1880
01:27:22,380 --> 01:27:23,520
So how can I do this?
1881
01:27:23,520 --> 01:27:25,730
Well, let me change my
implementation of average
1882
01:27:25,730 --> 01:27:30,515
to first declare a variable called
sum, or whatever, set it equal to 0.
1883
01:27:30,515 --> 01:27:33,140
So this is like me on my piece
of paper getting ready to count,
1884
01:27:33,140 --> 01:27:36,590
or my calculator, of course, when you
turn it on, typically defaults to zero.
1885
01:27:36,590 --> 01:27:41,570
And now, let me do for, int i
equals 0. i is less than a--
1886
01:27:41,570 --> 01:27:43,700
well, no, I didn't do that.
1887
01:27:43,700 --> 01:27:46,730
i is less than N, i++.
1888
01:27:46,730 --> 01:27:52,640
And now in here, let me go ahead
and add to the current sum, whatever
1889
01:27:52,640 --> 01:27:55,910
is in the array's location, i.
1890
01:27:55,910 --> 01:28:00,740
And then down here, I think I can
just return sum divided by 3.0--
1891
01:28:00,740 --> 01:28:04,560
not 3.0, N, perhaps here.
1892
01:28:04,560 --> 01:28:08,492
And actually, I think I'm going to
get-- let's make sure it's a float.
1893
01:28:08,492 --> 01:28:11,450
Let's use the typecasting trick just
to make sure I don't accidentally
1894
01:28:11,450 --> 01:28:15,540
shortchange someone and throw away
everything after the decimal point.
1895
01:28:15,540 --> 01:28:17,300
So it just escalated quickly, right?
1896
01:28:17,300 --> 01:28:18,990
Average just got a lot more involved.
1897
01:28:18,990 --> 01:28:22,130
It's not just a single one line
of code, but now it's dynamic.
1898
01:28:22,130 --> 01:28:25,070
I initialize a variable called sum to 0.
1899
01:28:25,070 --> 01:28:30,920
In this loop, I go through and just keep
adding to sum, which is initially 0,
1900
01:28:30,920 --> 01:28:33,200
whatever's in array[i]--
1901
01:28:33,200 --> 01:28:36,740
or specifically array[0],
array[1], array[2].
1902
01:28:36,740 --> 01:28:40,970
That gives me a total sum that I return,
divided by the total number of things.
1903
01:28:40,970 --> 01:28:42,560
Now, this I can tighten slightly.
1904
01:28:42,560 --> 01:28:45,650
Recall that there's syntactic
sugar for just adding things.
1905
01:28:45,650 --> 01:28:48,620
I can't use plus plus because
that only literally adds one.
1906
01:28:48,620 --> 01:28:52,630
But I can use here, plus equals.
1907
01:28:52,630 --> 01:28:54,880
Questions on this implementation here?
1908
01:28:54,880 --> 01:28:58,000
Really the only takeaway-- or
the most important takeaway
1909
01:28:58,000 --> 01:29:00,730
is that this is the
syntax for how you tell
1910
01:29:00,730 --> 01:29:04,210
a function that it
expects a whole array, not
1911
01:29:04,210 --> 01:29:06,450
a single variable like
an int or the like.
1912
01:29:06,450 --> 01:29:08,200
You literally use
square brackets, but you
1913
01:29:08,200 --> 01:29:11,530
don't specify the length inside there.
1914
01:29:11,530 --> 01:29:12,748
Yeah?
1915
01:29:12,748 --> 01:29:16,410
AUDIENCE: What variable
[INAUDIBLE] at the top?
1916
01:29:16,410 --> 01:29:18,410
DAVID MALAN: What about
the variable at the top?
1917
01:29:18,410 --> 01:29:22,205
AUDIENCE: [INAUDIBLE]
1918
01:29:22,205 --> 01:29:23,330
DAVID MALAN: Good question.
1919
01:29:23,330 --> 01:29:25,220
What do I have it defined as at the top?
1920
01:29:25,220 --> 01:29:31,280
This variable, N, it must be an integer
if you're going to use it inside
1921
01:29:31,280 --> 01:29:33,840
of an array's square brackets here.
1922
01:29:33,840 --> 01:29:38,360
So this line 10, notice, no
longer says 3, it says N.
1923
01:29:38,360 --> 01:29:42,350
And so whatever N is, 3 or 4 or
something else, that's how many
1924
01:29:42,350 --> 01:29:43,970
integers I will get in that array.
1925
01:29:43,970 --> 01:29:47,070
And it must be, by definition
of an array, an integer that
1926
01:29:47,070 --> 01:29:48,320
goes in those square brackets.
1927
01:29:48,320 --> 01:29:50,000
And here's a common source of confusion.
1928
01:29:50,000 --> 01:29:52,350
When you create the
array, that is declare it,
1929
01:29:52,350 --> 01:29:54,350
you use square brackets
like this, where you put
1930
01:29:54,350 --> 01:29:56,210
the total number of elements you want.
1931
01:29:56,210 --> 01:29:59,820
When you subsequently use the
array, like I'm doing here,
1932
01:29:59,820 --> 01:30:02,690
you don't mention int again--
just like you don't mention int
1933
01:30:02,690 --> 01:30:04,610
again and again once a variable exists.
1934
01:30:04,610 --> 01:30:10,220
You use the square brackets still, but
you don't use N. You use 0 or 1 or 2
1935
01:30:10,220 --> 01:30:11,990
or, generically here, i.
1936
01:30:11,990 --> 01:30:14,810
So when C was designed, they
sometimes used the same syntax
1937
01:30:14,810 --> 01:30:17,060
for two different ideas or contexts.
1938
01:30:17,060 --> 01:30:17,984
Yeah?
1939
01:30:17,984 --> 01:30:22,645
AUDIENCE: Do you have to
include line 6 [INAUDIBLE]?
1940
01:30:22,645 --> 01:30:23,770
DAVID MALAN: Good question.
1941
01:30:23,770 --> 01:30:25,900
Do I have to include line 6?
1942
01:30:25,900 --> 01:30:29,290
Short answer, yes, because of
the reason we ran into last week.
1943
01:30:29,290 --> 01:30:32,750
C, or clang really, reads your
code top to bottom, left to right.
1944
01:30:32,750 --> 01:30:38,890
And so if the compiler sees some mention
of this function average on line 16,
1945
01:30:38,890 --> 01:30:41,800
but you haven't told the
compiler that average exists,
1946
01:30:41,800 --> 01:30:43,610
you're going to get an
error on the screen.
1947
01:30:43,610 --> 01:30:45,490
So the conventional
way to do that is you
1948
01:30:45,490 --> 01:30:48,670
just copy and paste the first line
of code from the function,
1949
01:30:48,670 --> 01:30:51,260
its so-called prototype or declaration.
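A minimal sketch of why the prototype line matters, assuming the design on screen at this point (a global constant N and an average function that takes just the array). The caller average_of_scores is hypothetical and sits above the definition on purpose; without the prototype, clang would complain that average is used before it is declared. Call it from your own main to try it.

```c
// Global constant used as a configuration option (value assumed to be 3).
const int N = 3;

// Prototype: the first line of average, copied up here with a semicolon,
// so the compiler knows average exists before it reaches the definition.
float average(const int array[]);

// Hypothetical caller, deliberately placed above average's definition.
float average_of_scores(void)
{
    int scores[N];   // N is a const int, so this is a variable-length array in C
    scores[0] = 72;
    scores[1] = 73;
    scores[2] = 33;
    return average(scores);
}

// Definition: its first line matches the prototype above.
float average(const int array[])
{
    int sum = 0;
    for (int i = 0; i < N; i++)
    {
        sum += array[i];
    }
    return sum / (float) N;   // cast so the division is not integer division
}
```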
1950
01:30:51,260 --> 01:30:51,760
Yeah?
1951
01:30:51,760 --> 01:30:55,662
AUDIENCE: Is there a library if you
don't know the size of the array?
1952
01:30:55,662 --> 01:30:58,120
DAVID MALAN: Really good
question, and a perfect segue.
1953
01:30:58,120 --> 01:31:01,078
Is there a library you can use if
you don't know the size of the array?
1954
01:31:01,078 --> 01:31:01,720
No.
1955
01:31:01,720 --> 01:31:07,660
And so if any of you have programmed
in Java or Python or other languages,
1956
01:31:07,660 --> 01:31:11,020
you can actually just ask
the array, how big is it?
1957
01:31:11,020 --> 01:31:13,778
In C, you and I, the
programmers, have to remember it.
1958
01:31:13,778 --> 01:31:15,820
And so short answer, no,
there's no function that
1959
01:31:15,820 --> 01:31:17,445
will just automatically do this for us.
1960
01:31:17,445 --> 01:31:20,230
And in fact, let me
make a more subtle claim
1961
01:31:20,230 --> 01:31:23,950
that it's fine to use global
variables like this if they're really
1962
01:31:23,950 --> 01:31:25,160
for configuration options.
1963
01:31:25,160 --> 01:31:25,660
Why?
1964
01:31:25,660 --> 01:31:28,160
It's just convenient to put
them at the very top of the file
1965
01:31:28,160 --> 01:31:30,565
because everyone, you,
your colleagues, your TAs
1966
01:31:30,565 --> 01:31:32,440
are going to see them
at the top of the code.
1967
01:31:32,440 --> 01:31:36,130
But you really shouldn't be using
them everywhere throughout your code.
1968
01:31:36,130 --> 01:31:38,380
It'd be better if the average
function, itself, were
1969
01:31:38,380 --> 01:31:40,610
independent of that special variable.
1970
01:31:40,610 --> 01:31:42,025
So by that, I mean this.
1971
01:31:42,025 --> 01:31:46,240
You know what I should really do, if
I really want to be well-designed?
1972
01:31:46,240 --> 01:31:51,400
I should pass in the length of
the array to the average function.
1973
01:31:51,400 --> 01:31:54,310
I should give the average
function a second argument--
1974
01:31:54,310 --> 01:31:57,800
I'll call it length, for instance,
but I could call it anything I want.
1975
01:31:57,800 --> 01:32:02,500
And so rather than putting N all the
way down here at the bottom of my file,
1976
01:32:02,500 --> 01:32:05,745
let me just dynamically
say length instead.
1977
01:32:05,745 --> 01:32:08,620
And this is a subtlety-- and no need
to get too tripped up over this.
1978
01:32:08,620 --> 01:32:11,830
But this, now, is just an example
of how the same function can
1979
01:32:11,830 --> 01:32:13,690
take not one, but two arguments.
1980
01:32:13,690 --> 01:32:19,400
But indeed, in C, you must remember,
yourself, what the length of an array
1981
01:32:19,400 --> 01:32:19,900
is.
1982
01:32:19,900 --> 01:32:22,810
You can't just ask the
array via some syntax
1983
01:32:22,810 --> 01:32:26,560
like you can, those of you who've
programmed before in Java or Python.
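The better-designed version described above can be sketched as follows: average no longer reads a global N, because the caller passes the length in. The parameter name length comes from the lecture; putting it first is an assumption, and either order works as long as prototype, definition, and calls agree.

```c
// average depends only on its arguments, not on any global variable.
float average(int length, const int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }
    return sum / (float) length;
}
```

Because the length travels with the call, the same function works for arrays of any size, e.g. average(3, scores) or average(2, other_scores).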
1984
01:32:26,560 --> 01:32:27,070
Yeah?
1985
01:32:27,070 --> 01:32:35,115
AUDIENCE: [INAUDIBLE]
1986
01:32:35,115 --> 01:32:36,240
DAVID MALAN: Good question.
1987
01:32:36,240 --> 01:32:39,198
Would it be better designed to write
a function that computes the size?
1988
01:32:39,198 --> 01:32:42,570
Short answer, can't do that in
C. As soon as you pass an array
1989
01:32:42,570 --> 01:32:47,263
into a function in C, you cannot figure
out its size if it's a generic array
1990
01:32:47,263 --> 01:32:48,180
like that of integers.
1991
01:32:48,180 --> 01:32:51,040
There are special cases
in which you can do that.
1992
01:32:51,040 --> 01:32:53,283
But in general, no, it's
just not possible in C.
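One well-known special case, offered as an illustration (whether it is the case meant here is an assumption): in the same scope where an array is declared, sizeof sees the whole array, so its element count can be computed. Once the array is passed to a function, the parameter is really a pointer, and sizeof would report the size of a pointer instead, which is why the length still has to be passed along.

```c
#include <stddef.h>

// Works only where the array itself is declared: sizeof sees all 3 ints.
size_t scores_count(void)
{
    int scores[] = {72, 73, 33};
    return sizeof(scores) / sizeof(scores[0]);
}

// Inside a function, array is a pointer, so sizeof(array) would NOT give the
// caller's array size. The length has to arrive as a separate argument.
int sum(int length, const int array[])
{
    int total = 0;
    for (int i = 0; i < length; i++)
    {
        total += array[i];
    }
    return total;
}
```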
1993
01:32:53,283 --> 01:32:55,200
And if that's some
frustration, honestly, this
1994
01:32:55,200 --> 01:32:57,180
is why more modern
languages add that feature.
1995
01:32:57,180 --> 01:32:57,680
Why?
1996
01:32:57,680 --> 01:32:59,910
Because it was really
annoying, as I'm alluding to here,
1997
01:32:59,910 --> 01:33:01,560
not having that information.
1998
01:33:01,560 --> 01:33:03,643
Now, just to make sure I
didn't screw up anywhere,
1999
01:33:03,643 --> 01:33:07,540
let me compile this
final version of scores.
2000
01:33:07,540 --> 01:33:08,620
Suspense.
2001
01:33:08,620 --> 01:33:14,030
All good. ./scores, 72, 73, 33,
and we're still back in business.
2002
01:33:14,030 --> 01:33:15,530
So this version is more complicated.
2003
01:33:15,530 --> 01:33:18,738
And as always, we'll have this version
on the course's website for reference.
2004
01:33:18,738 --> 01:33:20,740
But the point, really,
is that arrays, not only
2005
01:33:20,740 --> 01:33:23,290
can be used as containers
to store multiple values--
2006
01:33:23,290 --> 01:33:25,490
three or more in this case--
2007
01:33:25,490 --> 01:33:30,440
you can also even pass them
around as arguments, as such.
2008
01:33:30,440 --> 01:33:34,300
All right, now besides that,
let's simplify for just a moment,
2009
01:33:34,300 --> 01:33:36,100
and consider now the world of chars.
2010
01:33:36,100 --> 01:33:39,200
If we've just got single
bytes, where does this lead us?
2011
01:33:39,200 --> 01:33:41,200
And how does this get us,
ultimately, to strings
2012
01:33:41,200 --> 01:33:44,170
to solve problems like readability
and cryptography and the like?
2013
01:33:44,170 --> 01:33:46,390
Well here, for instance,
are three lines of code,
2014
01:33:46,390 --> 01:33:48,967
out of context, that
simply store three chars.
2015
01:33:48,967 --> 01:33:50,800
And you can already see
where this is going.
2016
01:33:50,800 --> 01:33:53,920
Having three variables
called c1, c2, c3 is clearly
2017
01:33:53,920 --> 01:33:57,470
going to end up being bad design because
of all the silly redundancy here.
2018
01:33:57,470 --> 01:33:59,650
But notice, I'm using
single quotes like last week
2019
01:33:59,650 --> 01:34:01,330
because these are single chars.
2020
01:34:01,330 --> 01:34:03,647
What does this look like
in the computer's memory?
2021
01:34:03,647 --> 01:34:05,480
Well, it looks a little
something like this.
2022
01:34:05,480 --> 01:34:09,730
If we clear out the old
memory, c1, c2, c3 probably
2023
01:34:09,730 --> 01:34:12,562
will end up here, maybe not literally
in the top left-hand corner.
2024
01:34:12,562 --> 01:34:14,020
This is just an artist's rendition.
2025
01:34:14,020 --> 01:34:18,440
But c1, c2, c3 will
probably end up like that.
2026
01:34:18,440 --> 01:34:20,020
Now, what's really there?
2027
01:34:20,020 --> 01:34:21,730
It's really those same three numbers--
2028
01:34:21,730 --> 01:34:23,350
72, 73, 33.
2029
01:34:23,350 --> 01:34:27,920
But how many bits does a byte have?
2030
01:34:27,920 --> 01:34:28,880
Just eight.
2031
01:34:28,880 --> 01:34:33,830
So if we were to look at the binary
representation of these characters,
2032
01:34:33,830 --> 01:34:35,330
it would only be eight bits each.
2033
01:34:35,330 --> 01:34:39,140
That's enough to store small
numbers like 72, 73, 33.
2034
01:34:39,140 --> 01:34:41,580
We're not dealing with Unicode
and emoji and the like.
2035
01:34:41,580 --> 01:34:42,837
But the point is the same.
2036
01:34:42,837 --> 01:34:45,170
You don't have to use four
bytes to store these numbers.
2037
01:34:45,170 --> 01:34:48,087
You can use a different data type
like chars, and underneath the hood,
2038
01:34:48,087 --> 01:34:51,420
it's, indeed, going to use
just single bytes for each.
2039
01:34:51,420 --> 01:34:55,850
But this is sort of like a-- this isn't
really how we implement strings, right?
2040
01:34:55,850 --> 01:34:59,270
When you wanted to say, hi, last
week, or this, we used double quotes.
2041
01:34:59,270 --> 01:35:02,400
And we wrote all of the things together
and used one variable, not three,
2042
01:35:02,400 --> 01:35:02,900
right?
2043
01:35:02,900 --> 01:35:06,260
When I typed in David, I didn't
have a variable for D-A-V-I-D.
2044
01:35:06,260 --> 01:35:09,750
I had one variable called name
that stored the whole thing.
2045
01:35:09,750 --> 01:35:13,310
So in C, we keep talking about
these things called strings.
2046
01:35:13,310 --> 01:35:17,427
We'll see, eventually, that strings are
not necessarily what they seem to be.
2047
01:35:17,427 --> 01:35:19,760
But for now, the key thing
about strings is that they're
2048
01:35:19,760 --> 01:35:22,070
variable length, so to speak, right?
2049
01:35:22,070 --> 01:35:25,250
They might be three characters,
Hi, or five characters, David,
2050
01:35:25,250 --> 01:35:28,250
or anything smaller or larger.
2051
01:35:28,250 --> 01:35:30,980
So how do we go about
implementing strings,
2052
01:35:30,980 --> 01:35:33,110
if all we have at the end
of the day is my memory?
2053
01:35:33,110 --> 01:35:36,290
Well, here is an example of
just creating, declaring,
2054
01:35:36,290 --> 01:35:39,650
and defining a string called s--
s because it's just a simple string--
2055
01:35:39,650 --> 01:35:41,900
and quote unquote,
HI!, in double quotes.
2056
01:35:41,900 --> 01:35:44,090
What does this look like
in the computer's memory?
2057
01:35:44,090 --> 01:35:45,230
Well, let's clear it again.
2058
01:35:45,230 --> 01:35:48,110
And here, now, because it's
technically stored in one variable,
2059
01:35:48,110 --> 01:35:50,960
s, here is how I might
draw it as an artist.
2060
01:35:50,960 --> 01:35:52,520
It's three bytes in total--
2061
01:35:52,520 --> 01:35:53,990
H-I exclamation point.
2062
01:35:53,990 --> 01:35:59,630
But there's no c1, c2, c3;
the whole thing is just s.
2063
01:35:59,630 --> 01:36:03,800
But it turns out that
a string, fun fact,
2064
01:36:03,800 --> 01:36:06,990
is really just what underneath the hood?
2065
01:36:06,990 --> 01:36:09,610
Kind of leading up to this--
2066
01:36:09,610 --> 01:36:12,090
what is a string, if this is
how it's laid out in memory?
2067
01:36:12,090 --> 01:36:13,190
AUDIENCE: An array.
2068
01:36:13,190 --> 01:36:15,830
DAVID MALAN: Literally, it's
just an array of characters.
2069
01:36:15,830 --> 01:36:18,590
And we didn't have to know about
arrays last week to use strings.
2070
01:36:18,590 --> 01:36:21,382
This is where, again, the training
wheels are starting to come off.
2071
01:36:21,382 --> 01:36:23,730
But a string is just
an array of characters.
2072
01:36:23,730 --> 01:36:26,040
H-I exclamation point, for instance.
2073
01:36:26,040 --> 01:36:28,370
So technically, an array--
2074
01:36:28,370 --> 01:36:33,890
or a string called s is really a
variable called s that allows you
2075
01:36:33,890 --> 01:36:38,150
to get at the first character with
s[0], if you want-- s[1], s[2].
2076
01:36:38,150 --> 01:36:40,340
You can literally get
individual characters
2077
01:36:40,340 --> 01:36:43,820
just by treating s as though
it's an array, which it really
2078
01:36:43,820 --> 01:36:47,000
is underneath the hood, in this case.
2079
01:36:47,000 --> 01:36:48,560
But there's a catch.
2080
01:36:48,560 --> 01:36:51,500
How do you know where strings end?
2081
01:36:51,500 --> 01:36:54,560
In the past, when I drew
some integers on the screen,
2082
01:36:54,560 --> 01:36:57,080
I claimed they
always take up 4 bytes.
2083
01:36:57,080 --> 01:37:00,200
If I had drawn a long, it
always takes up 8 bytes.
2084
01:37:00,200 --> 01:37:03,530
If I had drawn a character,
it always takes up 1 byte.
2085
01:37:03,530 --> 01:37:06,533
But how many bytes
does a string take up?
2086
01:37:06,533 --> 01:37:08,450
Yeah, I mean, that's
kind of the right answer.
2087
01:37:08,450 --> 01:37:10,490
In this case, three, it would seem.
2088
01:37:10,490 --> 01:37:13,490
But if it's David, that's
a good five characters.
2089
01:37:13,490 --> 01:37:16,173
But where do we put the number three?
2090
01:37:16,173 --> 01:37:17,840
Where do you put the number five, right?
2091
01:37:17,840 --> 01:37:20,190
This is literally all
that's inside your computer.
2092
01:37:20,190 --> 01:37:23,430
These are all our building
blocks in front of us.
2093
01:37:23,430 --> 01:37:25,490
So how can we-- where does the three go?
2094
01:37:25,490 --> 01:37:26,540
Where does the five go?
2095
01:37:26,540 --> 01:37:29,420
Well, it turns out you can solve
this in a couple of different ways.
2096
01:37:29,420 --> 01:37:34,160
But the way humans decided to implement
strings years ago is, indeed, an array,
2097
01:37:34,160 --> 01:37:38,960
but they added one extra byte at
the end of every such string array,
2098
01:37:38,960 --> 01:37:41,840
just to make clear, with a
so-called sentinel value,
2099
01:37:41,840 --> 01:37:44,480
that the string ends here.
2100
01:37:44,480 --> 01:37:45,050
Why?
2101
01:37:45,050 --> 01:37:47,930
So that if you have two strings
in the computer's memory like, HI!
2102
01:37:47,930 --> 01:37:52,760
and bye, you know where the barrier is
between the exclamation point of one
2103
01:37:52,760 --> 01:37:54,590
and the letter B in the next, right?
2104
01:37:54,590 --> 01:37:56,000
You need some kind of delimiter.
2105
01:37:56,000 --> 01:38:00,110
And so what really is
underneath the hood is this.
2106
01:38:00,110 --> 01:38:04,460
When you store a string in memory, when
you type in a string-- as the user,
2107
01:38:04,460 --> 01:38:07,040
if you type in 3 characters,
it's going to use
2108
01:38:07,040 --> 01:38:10,280
3 plus 1 equals 4 bytes in total.
2109
01:38:10,280 --> 01:38:14,130
If you type in David, it's going to
use 5 plus 1 equals 6 bytes in total.
2110
01:38:14,130 --> 01:38:14,630
Why?
2111
01:38:14,630 --> 01:38:20,210
Because C automatically adds this
special 0 at the end of the string.
2112
01:38:20,210 --> 01:38:24,710
I've drawn it with backslash 0 because
this is how you represent 0 as a char,
2113
01:38:24,710 --> 01:38:25,710
as a character.
2114
01:38:25,710 --> 01:38:28,230
But this is literally
just 0, as we'll soon see.
2115
01:38:28,230 --> 01:38:31,100
So any time there's a string
in memory, it always takes up
2116
01:38:31,100 --> 01:38:36,197
one more byte than you, yourself,
as the programmer or human typed in.
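A quick way to check the "one more byte" claim, as a sketch: sizeof on a char array initialized from a string literal counts the terminating '\0', while strlen from string.h stops just before it.

```c
#include <string.h>

// "HI!" is stored as 'H', 'I', '!', '\0', so the array holds 4 chars.
size_t stored_size(void)
{
    char s[] = "HI!";
    return sizeof(s);   // includes the terminating '\0'
}

// strlen counts only the characters before the '\0'.
size_t typed_length(void)
{
    char s[] = "HI!";
    return strlen(s);
}
```

By the same rule, sizeof("David") is 6: five letters plus the null character.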
2117
01:38:36,197 --> 01:38:38,780
In fact, if we convert this
again, just for discussion's sake,
2118
01:38:38,780 --> 01:38:41,572
to those integers, what's literally
stored in the computer's memory
2119
01:38:41,572 --> 01:38:45,170
is going to be 72, 73, 33, and now a 0.
2120
01:38:45,170 --> 01:38:48,240
And the computer, because of
C and how it was invented,
2121
01:38:48,240 --> 01:38:51,350
it's just smart enough to know
that when you print out a string,
2122
01:38:51,350 --> 01:38:54,530
it prints out every
character until it sees a 0,
2123
01:38:54,530 --> 01:38:56,150
and then it just stops printing.
2124
01:38:56,150 --> 01:38:58,470
In particular, printf
knows how this works.
2125
01:38:58,470 --> 01:39:02,050
And this is why printf
knows when to stop printing.
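The stopping rule can be sketched with a loop that emits one character at a time until it reaches the '\0' sentinel. This is not how printf is actually implemented; the function print_until_nul is hypothetical and only illustrates why the sentinel lets code find the end of a string without storing its length anywhere.

```c
#include <stdio.h>

// Print s character by character, stopping at the '\0' sentinel.
// Returns how many characters were printed.
int print_until_nul(const char s[])
{
    int i = 0;
    while (s[i] != '\0')
    {
        putchar(s[i]);
        i++;
    }
    return i;
}
```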
2126
01:39:02,050 --> 01:39:03,800
Decimal numbers are
not that enlightening.
2127
01:39:03,800 --> 01:39:05,940
We'll generally write
the characters like this.
2128
01:39:05,940 --> 01:39:09,350
And again, backslash 0 is
just special symbology.
2129
01:39:09,350 --> 01:39:13,190
It's what the programmer types to make
clear that you're not saying, HI!, 0.
2130
01:39:13,190 --> 01:39:15,980
You're saying HI!, and
then it's a special 0.
2131
01:39:15,980 --> 01:39:20,887
Specifically, it is eight
0 bits that indicate
2132
01:39:20,887 --> 01:39:22,220
that it's the end of the string.
2133
01:39:22,220 --> 01:39:26,330
Technically, that backslash zero, if
you want to be fancy, it's called null,
2134
01:39:26,330 --> 01:39:27,320
N-U-L.
2135
01:39:27,320 --> 01:39:30,320
And it turns out, you've seen this
before, though we didn't call it out.
2136
01:39:30,320 --> 01:39:33,230
Here's that same ASCII chart
from the past couple of weeks.
2137
01:39:33,230 --> 01:39:39,080
If I highlight this, what is
decimal number 0 mapping to?
2138
01:39:39,080 --> 01:39:42,830
NUL, which is just programmer speak
for the special null character.
2139
01:39:42,830 --> 01:39:46,550
All 0 bits, which means
the string ends here.
2140
01:39:46,550 --> 01:39:48,510
This all happens automatically for you.
2141
01:39:48,510 --> 01:39:53,420
You do not need to create these
null characters or these zeros.
2142
01:39:53,420 --> 01:40:00,030
Any questions then, on this
implementation thus far?
2143
01:40:00,030 --> 01:40:01,820
Any questions here?
2144
01:40:01,820 --> 01:40:02,320
No?
2145
01:40:02,320 --> 01:40:03,195
Well, let me do this.
2146
01:40:03,195 --> 01:40:05,310
Let me go back to VS Code in a second.
2147
01:40:05,310 --> 01:40:07,770
And let's actually corroborate
this with some code.
2148
01:40:07,770 --> 01:40:10,830
Let me go ahead and create
a small program called hi.c.
2149
01:40:10,830 --> 01:40:12,070
And how about we do this?
2150
01:40:12,070 --> 01:40:14,550
Let me include stdio.h.
2151
01:40:14,550 --> 01:40:18,670
Let me include-- let me type
out int main void, as always.
2152
01:40:18,670 --> 01:40:20,910
And now let me do something
simple and kind of bad,
2153
01:40:20,910 --> 01:40:24,960
but char c1 equals quote
unquote, H, in single quotes.
2154
01:40:24,960 --> 01:40:28,590
Char c2 equals quote
unquote, I, in single quotes.
2155
01:40:28,590 --> 01:40:32,830
And lastly, char c3 equals
exclamation point, in single quotes.
2156
01:40:32,830 --> 01:40:34,500
And now, let me just print this out.
2157
01:40:34,500 --> 01:40:36,960
I can't use %s because
that is not a string.
2158
01:40:36,960 --> 01:40:40,290
That's literally three chars, because
that's the design decision I made.
2159
01:40:40,290 --> 01:40:41,430
But I could do this--
2160
01:40:41,430 --> 01:40:48,600
%c, %c, %c, which we haven't seen
before, but %s is string, %i is int,
2161
01:40:48,600 --> 01:40:51,060
%c is, indeed, char.
2162
01:40:51,060 --> 01:40:54,150
So let me put a backslash n
at the end for cleanliness,
2163
01:40:54,150 --> 01:40:56,280
and now do, c1, c2, c3.
2164
01:40:56,280 --> 01:41:00,430
So this is like a char-based
version of printing string.
2165
01:41:00,430 --> 01:41:01,650
So let me make hi.
2166
01:41:01,650 --> 01:41:05,880
And then let me do ./hi, and it
looks like I used printf with %s.
2167
01:41:05,880 --> 01:41:09,750
But I did things very manually by
printing out each individual character.
2168
01:41:09,750 --> 01:41:11,700
What's cool now,
though, is that once you
2169
01:41:11,700 --> 01:41:15,270
know that characters are just numbers
and strings are just characters,
2170
01:41:15,270 --> 01:41:16,560
you can kind of poke around.
2171
01:41:16,560 --> 01:41:21,970
Let me change all three
placeholders to %i instead.
2172
01:41:21,970 --> 01:41:23,860
And this is totally fine, too.
2173
01:41:23,860 --> 01:41:26,310
Let me rerun this, make hi.
2174
01:41:26,310 --> 01:41:31,570
Actually, let me make one
change, just so we can see this.
2175
01:41:31,570 --> 01:41:37,710
Let me add spaces, just for aesthetics'
sake. Let me do make hi, ./hi, Enter,
2176
01:41:37,710 --> 01:41:40,350
and voila, like now, you can
actually see the numbers,
2177
01:41:40,350 --> 01:41:44,085
that I claimed back in week zero, were
in fact happening underneath the hood.
2178
01:41:44,085 --> 01:41:45,960
Well, this is not how
you would make strings.
2179
01:41:45,960 --> 01:41:49,457
It'd be incredibly tedious to have three
variables for three-letter words, five
2180
01:41:49,457 --> 01:41:50,790
variables for five-letter words.
2181
01:41:50,790 --> 01:41:52,998
We've been using, of course,
strings since last week,
2182
01:41:52,998 --> 01:41:54,450
so let's do that instead.
2183
01:41:54,450 --> 01:41:59,370
String s equals quote
unquote, double quotes "HI!"
2184
01:41:59,370 --> 01:42:02,520
For this, no, because of
these training wheels,
2185
01:42:02,520 --> 01:42:04,560
I need to include the CS50 library.
2186
01:42:04,560 --> 01:42:06,580
But we'll come back to
that in the coming weeks.
2187
01:42:06,580 --> 01:42:10,530
But for now, I'm going to go ahead and
create a string s called quote unquote,
2188
01:42:10,530 --> 01:42:11,580
"HI!"
2189
01:42:11,580 --> 01:42:14,760
And now I'm going to change
this to be my familiar %s,
2190
01:42:14,760 --> 01:42:17,610
and now just print out s itself.
2191
01:42:17,610 --> 01:42:20,430
This, of course, is the same
thing as last week, ./hi,
2192
01:42:20,430 --> 01:42:24,750
gives me the exact same thing, but now,
we're dealing, of course, with strings.
2193
01:42:24,750 --> 01:42:27,610
But how can we see a little beyond that?
2194
01:42:27,610 --> 01:42:28,810
Well, how about this?
2195
01:42:28,810 --> 01:42:31,530
Let's poke around further
with today's primitives.
2196
01:42:31,530 --> 01:42:35,580
Even though s is a string, I could
technically print out its first
2197
01:42:35,580 --> 01:42:39,000
character with %c by doing s[0].
2198
01:42:39,000 --> 01:42:43,110
I could technically print out its
second character with %c by doing s[1].
2199
01:42:43,110 --> 01:42:47,820
I could print out its third character
with %c and printing out s[2].
2200
01:42:47,820 --> 01:42:50,430
So again, this just derives
logically from my understanding
2201
01:42:50,430 --> 01:42:52,770
now that strings are
arrays, as you note.
2202
01:42:52,770 --> 01:42:54,540
Let me do make--
2203
01:42:54,540 --> 01:42:57,300
let me do make hi, ./hi.
2204
01:42:57,300 --> 01:43:00,760
And no visual change, but I'm
just kind of now tinkering around.
2205
01:43:00,760 --> 01:43:03,400
And in fact, if you're really
curious, let me do this.
2206
01:43:03,400 --> 01:43:06,870
Let me change these
back to i, back to i--
2207
01:43:06,870 --> 01:43:08,250
oops, back to i.
2208
01:43:08,250 --> 01:43:11,310
And let me add a fourth one
because if I'm really curious now,
2209
01:43:11,310 --> 01:43:14,490
let's see what's in s[3].
2210
01:43:14,490 --> 01:43:16,020
This is the fourth byte.
2211
01:43:16,020 --> 01:43:18,990
And even though the
string itself is just HI!,
2212
01:43:18,990 --> 01:43:21,840
I think we can corroborate
this whole null thing.
2213
01:43:21,840 --> 01:43:26,248
Make hi, ./hi, Enter, and there it is.
2214
01:43:26,248 --> 01:43:28,290
You could have done this
last week, if you really
2215
01:43:28,290 --> 01:43:29,580
wanted to geek out on strings.
2216
01:43:29,580 --> 01:43:33,060
But for now, it's just revealing
what's going on underneath the hood.
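The same experiment as the %i demo can be written as a helper that collects each character's numeric code, including the final 0, instead of printing it. The function char_codes and its max parameter are hypothetical; the expected values assume ASCII, where 'H', 'I', and '!' are 72, 73, and 33.

```c
// Copy the numeric code of each char in s into codes[], including the
// terminating 0, writing at most max entries. Returns how many were written.
int char_codes(const char s[], int codes[], int max)
{
    for (int n = 0; n < max; n++)
    {
        codes[n] = s[n];          // a char is just a small integer
        if (s[n] == '\0')
        {
            return n + 1;         // count the null character too
        }
    }
    return max;
}
```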
2217
01:43:33,060 --> 01:43:36,480
Questions then, on
what these strings are?
2218
01:43:36,480 --> 01:43:37,498
Yeah?
2219
01:43:37,498 --> 01:43:41,293
AUDIENCE: [INAUDIBLE]
2220
01:43:41,293 --> 01:43:42,960
DAVID MALAN: Why do we need the bracket?
2221
01:43:42,960 --> 01:43:45,430
AUDIENCE: [INAUDIBLE]
2222
01:43:45,430 --> 01:43:47,180
DAVID MALAN: Why do
you not need brackets?
2223
01:43:47,180 --> 01:43:47,780
Good question.
2224
01:43:47,780 --> 01:43:51,620
Why do I not need brackets on line 6?
2225
01:43:51,620 --> 01:43:53,300
Because s is a string.
2226
01:43:53,300 --> 01:43:56,930
We'll see in a couple of
weeks that s is, essentially,
2227
01:43:56,930 --> 01:44:00,200
implemented underneath the
hood, indeed, as an array,
2228
01:44:00,200 --> 01:44:02,240
but that happens automatically for you.
2229
01:44:02,240 --> 01:44:06,800
You can treat s as just a variable
name without square brackets.
2230
01:44:06,800 --> 01:44:09,500
You will use square brackets
when you have arrays of ints
2231
01:44:09,500 --> 01:44:13,730
or you manually create arrays of chars
or doubles or floats or anything else.
2232
01:44:13,730 --> 01:44:14,900
But strings are special.
2233
01:44:14,900 --> 01:44:15,440
Why?
2234
01:44:15,440 --> 01:44:19,190
I mean, every program you write seems
to use strings, text in some form.
2235
01:44:19,190 --> 01:44:21,930
We're humans; we like text,
not just numbers and such.
2236
01:44:21,930 --> 01:44:25,910
So this is just treated a little
specially in C and many other languages
2237
01:44:25,910 --> 01:44:28,580
as well.
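When you create an array of chars manually, the brackets are visible and so is the sentinel: a hand-built array only behaves like the string "HI!" if you supply the '\0' yourself, whereas the compiler appends it automatically to a double-quoted literal. A small sketch:

```c
#include <string.h>

// Returns 1 if a hand-built char array matches the literal "HI!" exactly.
int same_as_literal(void)
{
    char manual[] = {'H', 'I', '!', '\0'};   // '\0' written by hand
    char literal[] = "HI!";                  // '\0' added by the compiler
    return strcmp(manual, literal) == 0 && sizeof(manual) == sizeof(literal);
}
```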
2238
01:44:28,580 --> 01:44:31,170
Other questions on this here?
2239
01:44:31,170 --> 01:44:31,670
No?
2240
01:44:31,670 --> 01:44:33,530
Let's add then, one
other string to the mix.
2241
01:44:33,530 --> 01:44:36,290
So instead of just saying, HI!,
why don't we consider a version
2242
01:44:36,290 --> 01:44:38,660
of the program that
says both, HI! and BYE!.
2243
01:44:38,660 --> 01:44:41,420
And I claim now that
that backslash zero,
2244
01:44:41,420 --> 01:44:44,270
that null character is going
to be ever more important now
2245
01:44:44,270 --> 01:44:46,820
if we've got two strings
in memory, so that C knows
2246
01:44:46,820 --> 01:44:48,570
how to distinguish one from the other.
2247
01:44:48,570 --> 01:44:51,487
So let me go ahead and just get rid
of these two lines for the moment.
2248
01:44:51,487 --> 01:44:55,430
Let me recreate string s equals,
quote unquote double quotes, "HI!"
2249
01:44:55,430 --> 01:44:56,780
Let me give myself another one.
2250
01:44:56,780 --> 01:44:59,905
And because I'm just playing around,
I'll choose very short variable names.
2251
01:44:59,905 --> 01:45:04,410
String t equals quote unquote, "BYE!"
2252
01:45:04,410 --> 01:45:06,470
And then let me just
print them both out.
2253
01:45:06,470 --> 01:45:11,300
Let me go ahead and print
out %s, backslash n, comma s,
2254
01:45:11,300 --> 01:45:16,910
and then printf %s
backslash n, and then t.
2255
01:45:16,910 --> 01:45:19,970
So very simple demonstration
of just these two variables.
2256
01:45:19,970 --> 01:45:26,090
Make hi, ./hi, and of course, it prints
out two lines, one after the other.
2257
01:45:26,090 --> 01:45:27,980
What's actually going
on underneath the hood?
2258
01:45:27,980 --> 01:45:29,510
Well, let's go back to
the computer's memory.
2259
01:45:29,510 --> 01:45:32,160
HI!, I think, is going to be,
I claim, pretty much the same.
2260
01:45:32,160 --> 01:45:36,170
So s, I'll claim, is in the top
left, followed by the backslash zero.
2261
01:45:36,170 --> 01:45:40,035
And that's important now because BYE!
probably is going to end up there.
2262
01:45:40,035 --> 01:45:43,160
And visually, it wraps just by nature
of how I've drawn this grid of bytes,
2263
01:45:43,160 --> 01:45:44,330
but it's contiguous.
2264
01:45:44,330 --> 01:45:46,340
B-Y-E-!
2265
01:45:46,340 --> 01:45:51,470
null, A.K.A. backslash zero,
this is now helpful to printf
2266
01:45:51,470 --> 01:45:55,550
because now printf knows
where one begins and ends
2267
01:45:55,550 --> 01:45:58,580
by way of that special null character.
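The back-to-back layout on the slide can be simulated by packing both strings into one buffer. This is an assumption-laden sketch: separate variables s and t are not guaranteed to end up adjacent in memory, so the single buffer memory (a hypothetical name) just makes the layout explicit. Reading from index 0 stops at the first '\0'; reading from index 4 starts right after it.

```c
#include <string.h>

// "HI!", '\0', "BYE!", and the compiler's final '\0': 9 bytes in total.
static const char memory[] = "HI!\0BYE!";

// printf("%s\n", first()) would print HI! and stop at the first '\0'.
const char *first(void)
{
    return &memory[0];
}

// The second string begins immediately after that '\0'.
const char *second(void)
{
    return &memory[4];
}
```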
2268
01:45:58,580 --> 01:46:00,230
But we can poke around now, too.
2269
01:46:00,230 --> 01:46:01,620
What else can I do here?
2270
01:46:01,620 --> 01:46:02,840
How about this?
2271
01:46:02,840 --> 01:46:08,870
How about I go into my code here,
back to VS Code, and let me go ahead
2272
01:46:08,870 --> 01:46:13,790
and say something like, well, if
I've got two of these strings,
2273
01:46:13,790 --> 01:46:15,410
you know, let's put them in an array.
2274
01:46:15,410 --> 01:46:20,520
Let's kind of do this sort of arrays in
arrays, sort of inception-style here.
2275
01:46:20,520 --> 01:46:23,060
So string words[2].
2276
01:46:23,060 --> 01:46:25,100
So give me an array
of two strings is what
2277
01:46:25,100 --> 01:46:28,100
I'm saying here in code, even though
we've not done it with strings yet.
2278
01:46:28,100 --> 01:46:29,270
We only did it with ints.
2279
01:46:29,270 --> 01:46:30,770
And now let me do this.
2280
01:46:30,770 --> 01:46:35,480
The first word A.K.A. words[0]
will equal, as before, HI!
2281
01:46:35,480 --> 01:46:40,940
And now words[1] will
equal quote unquote, "BYE!"
2282
01:46:40,940 --> 01:46:43,760
And now I've done the exact
same thing, but again, I'm
2283
01:46:43,760 --> 01:46:48,650
just avoiding having s, t, q, r, and all
these different variables in my code.
2284
01:46:48,650 --> 01:46:52,790
I just now am treating them as
one single array of strings.
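The array-of-strings version can be sketched without the CS50 library. CS50's string type is a typedef for char *, so const char * stands in for it here; the function pick is hypothetical and simply returns one element of the array.

```c
// Return words[i] from a two-element array of strings.
const char *pick(int i)
{
    const char *words[2];
    words[0] = "HI!";    // string literals live for the whole program,
    words[1] = "BYE!";   // so returning a pointer to one is safe
    return words[i];
}
```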
2285
01:46:52,790 --> 01:46:54,750
How do I change my code down here?
2286
01:46:54,750 --> 01:46:57,380
Well, if I want to print the
first word, I do words[0].
2287
01:46:57,380 --> 01:46:59,900
And if I want to print the
second word, I do words[1].
2288
01:46:59,900 --> 01:47:02,088
This is not a useful
exercise at the moment
2289
01:47:02,088 --> 01:47:04,130
because I'm just making
my code more complicated.
2290
01:47:04,130 --> 01:47:06,830
But again, it allows us to
poke around and see what's
2291
01:47:06,830 --> 01:47:08,690
going on because there is that HI!
2292
01:47:08,690 --> 01:47:09,530
and BYE!.
2293
01:47:09,530 --> 01:47:10,700
But watch this.
2294
01:47:10,700 --> 01:47:14,670
If I really want to be
cool, I can do this.
2295
01:47:14,670 --> 01:47:24,380
Let's print out %c, %c, %c, backslash
n, and then here, %c, %c, %c, %c,
2296
01:47:24,380 --> 01:47:25,700
so four of those.
2297
01:47:25,700 --> 01:47:28,430
And now here's where
things get interesting.
2298
01:47:28,430 --> 01:47:30,620
Words is an array of strings.
2299
01:47:30,620 --> 01:47:33,400
Again, if I may, what's a string?
2300
01:47:33,400 --> 01:47:35,060
An array of characters.
2301
01:47:35,060 --> 01:47:36,790
So just use the same logic.
2302
01:47:36,790 --> 01:47:41,110
If words is an array of strings, you
get at the first string with words[0].
2303
01:47:41,110 --> 01:47:44,530
How do you get at the first
character in the first string?
2304
01:47:44,530 --> 01:47:52,150
Bracket 0: words[0][0], then words[0][1],
and lastly, words[0][2].
2305
01:47:52,150 --> 01:47:57,460
And now down here, words[1][0],
the first character is there.
2306
01:47:57,460 --> 01:48:00,400
words[1][1], the second character is here.
2307
01:48:00,400 --> 01:48:03,190
words[1][2], the third character is here--
2308
01:48:03,190 --> 01:48:04,720
whoops-- third character's here.
2309
01:48:04,720 --> 01:48:07,898
And words[1][3], the fourth
character is here.
2310
01:48:07,898 --> 01:48:09,190
This is not how people program.
2311
01:48:09,190 --> 01:48:10,840
This is only for demonstration's sake.
2312
01:48:10,840 --> 01:48:13,060
My God, it's so tedious
and verbose already.
2313
01:48:13,060 --> 01:48:20,410
But if I make hi now, ./hi, now,
I'm manually reinventing %s,
2314
01:48:20,410 --> 01:48:22,990
as if I'd forgotten it existed, using %c alone.
2315
01:48:22,990 --> 01:48:25,900
But you can indeed manipulate
arrays in this way.
2316
01:48:25,900 --> 01:48:28,300
But because strings are
arrays of characters,
2317
01:48:28,300 --> 01:48:32,200
you can manipulate
strings in this way too.
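Rather than typing out every words[i][j] by hand, the double indexing can be driven by two loops: i picks the string, j picks the character within it, and the inner loop stops at each string's '\0'. The function print_words is a hypothetical sketch, again using const char * in place of CS50's string.

```c
#include <stdio.h>

// Print each string in words on its own line, one %c at a time.
// Returns the total number of characters printed (newlines excluded).
int print_words(const char *words[], int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
    {
        for (int j = 0; words[i][j] != '\0'; j++)
        {
            printf("%c", words[i][j]);
            total++;
        }
        printf("\n");
    }
    return total;
}
```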
2318
01:48:32,200 --> 01:48:34,675
Any question now on this syntax?
2319
01:48:34,675 --> 01:48:37,210
2320
01:48:37,210 --> 01:48:38,800
Any questions here?
2321
01:48:38,800 --> 01:48:39,460
No?
2322
01:48:39,460 --> 01:48:39,970
No?
2323
01:48:39,970 --> 01:48:42,070
All right, well, let's
go ahead and propose
2324
01:48:42,070 --> 01:48:45,830
that we solve a couple of other
problems we might not have as before.
2325
01:48:45,830 --> 01:48:49,150
But first, a quick visual of what's
been going on underneath the hood here.
2326
01:48:49,150 --> 01:48:52,420
If here, again, is where we left
off on the screen, HI! and BYE!
2327
01:48:52,420 --> 01:48:56,470
back to back, here is really
how I just treated these things.
2328
01:48:56,470 --> 01:49:00,880
s bracket 0, 1, 2, 3 and
then t 0, 1, 2, 3, 4.
2329
01:49:00,880 --> 01:49:04,840
But really, once I put them in an
array, the picture becomes this.
2330
01:49:04,840 --> 01:49:07,030
Words[0] is the whole HI!.
2331
01:49:07,030 --> 01:49:08,680
Words[1] is the whole BYE!.
2332
01:49:08,680 --> 01:49:11,470
But if I really get into
the weeds and start indexing
2333
01:49:11,470 --> 01:49:14,980
into individual characters in
those strings, all I'm using
2334
01:49:14,980 --> 01:49:20,710
is new syntax in order to
represent these same values here.
2335
01:49:20,710 --> 01:49:28,710
Questions then, on these
representations before we forge ahead?
2336
01:49:28,710 --> 01:49:29,430
No?
2337
01:49:29,430 --> 01:49:30,030
Yeah?
2338
01:49:30,030 --> 01:49:33,390
AUDIENCE: Does the new line
character not [INAUDIBLE]?
2339
01:49:33,390 --> 01:49:36,030
DAVID MALAN: Does the new line
character-- say that once more?
2340
01:49:36,030 --> 01:49:38,597
AUDIENCE: Does the new line
character take up any space?
2341
01:49:38,597 --> 01:49:40,180
DAVID MALAN: Ah, really good question.
2342
01:49:40,180 --> 01:49:42,730
Does the new line character
take up any space?
2343
01:49:42,730 --> 01:49:45,340
It does, so far as printf is concerned.
2344
01:49:45,340 --> 01:49:48,790
But I'm not storing the
backslash n in my strings;
2345
01:49:48,790 --> 01:49:53,460
printf is being manually
handed that thing instead.
2346
01:49:53,460 --> 01:49:55,520
All right, so let's go
ahead then and consider
2347
01:49:55,520 --> 01:49:58,970
how we might solve some problems that
have arisen now with these strings,
2348
01:49:58,970 --> 01:50:00,680
as follows here.
2349
01:50:00,680 --> 01:50:02,760
Suppose I-- let's do this.
2350
01:50:02,760 --> 01:50:04,400
Let me go back to VS Code here.
2351
01:50:04,400 --> 01:50:09,980
And let me go ahead and open up a
new file called, how about, length.c.
2352
01:50:09,980 --> 01:50:12,680
And let's consider for a moment
how I might actually figure out
2353
01:50:12,680 --> 01:50:16,130
what the length of a string is, which
is distinct from the length of an array.
2354
01:50:16,130 --> 01:50:19,680
I claimed earlier, you cannot figure out
dynamically what the length of an array
2355
01:50:19,680 --> 01:50:20,180
is.
2356
01:50:20,180 --> 01:50:24,020
But I can figure out the length
of a string, specifically, because
2357
01:50:24,020 --> 01:50:26,960
of this implementation detail
of that null character.
2358
01:50:26,960 --> 01:50:28,500
So let me go ahead and do this.
2359
01:50:28,500 --> 01:50:31,940
Let me include cs50.h in
this second program here.
2360
01:50:31,940 --> 01:50:35,090
Let me include stdio.h, as before.
2361
01:50:35,090 --> 01:50:38,120
And let me do this, int main void--
2362
01:50:38,120 --> 01:50:40,970
and the first thing I'll do is
just get a string from the user.
2363
01:50:40,970 --> 01:50:43,250
I'll ask the user, as
always, for their name.
2364
01:50:43,250 --> 01:50:48,170
So I'll call get_string, and say, what's
your name, question mark, as always.
2365
01:50:48,170 --> 01:50:51,620
And then down here, if I want to
figure out the length of this string
2366
01:50:51,620 --> 01:50:56,210
and print the length out
on the screen, well, I
2367
01:50:56,210 --> 01:50:58,465
can kind of do this similar
in spirit to the average,
2368
01:50:58,465 --> 01:50:59,840
where I'm accumulating something.
2369
01:50:59,840 --> 01:51:02,600
Let me go ahead and initialize n to 0.
2370
01:51:02,600 --> 01:51:05,120
Let me give myself--
2371
01:51:05,120 --> 01:51:07,035
it's not a for loop
because I don't have a--
2372
01:51:07,035 --> 01:51:08,660
I don't know in advance how long it is.
2373
01:51:08,660 --> 01:51:09,980
But what if I do this?
2374
01:51:09,980 --> 01:51:20,600
While the value at name[n]
does not equal '\0'--
2375
01:51:20,600 --> 01:51:23,390
crazy syntax at the moment,
but it's just the culmination
2376
01:51:23,390 --> 01:51:25,590
of these various building blocks.
2377
01:51:25,590 --> 01:51:28,970
Let me just finish
the thought here, n++.
2378
01:51:28,970 --> 01:51:33,656
And then down here, let's just
print out, with printf and %i,
2379
01:51:33,656 --> 01:51:38,930
that value of n. So I claim this is
going to show me the length of any
2380
01:51:38,930 --> 01:51:43,220
string I type in, whether it's hi
or bye or David or anything else.
2381
01:51:43,220 --> 01:51:45,410
I initialize a variable
to zero, and that's good
2382
01:51:45,410 --> 01:51:47,535
because that's where you
start counting in general.
2383
01:51:47,535 --> 01:51:50,990
While name[n] does not
equal backslash zero.
2384
01:51:50,990 --> 01:51:51,930
What is this saying?
2385
01:51:51,930 --> 01:51:55,580
Well, if name is the string the user
typed in-- and name is just an array,
2386
01:51:55,580 --> 01:51:56,460
as you noted--
2387
01:51:56,460 --> 01:51:59,390
the name[0] is going to
be the first character.
2388
01:51:59,390 --> 01:52:02,750
And I'm asking the question, well,
does the first character not equal
2389
01:52:02,750 --> 01:52:03,680
backslash zero?
2390
01:52:03,680 --> 01:52:08,750
And if I type in David, D, it's not,
so I keep going and I add 1 to n.
2391
01:52:08,750 --> 01:52:10,750
Then I'm going to check name[1].
2392
01:52:10,750 --> 01:52:13,895
Well, if I typed in David,
name[1] is going to be A.
2393
01:52:13,895 --> 01:52:18,020
A does not equal backslash zero, and
so it's going to go again and again
2394
01:52:18,020 --> 01:52:18,740
and again.
2395
01:52:18,740 --> 01:52:23,090
But five steps in total later,
it's going to get to the byte after
2396
01:52:23,090 --> 01:52:26,480
D-A-V-I-D, realize, wait a
minute, that is a backslash zero.
2397
01:52:26,480 --> 01:52:29,750
The loop finishes, and I
print out the total length.
2398
01:52:29,750 --> 01:52:33,050
Arrays, in general, do not
have this null character.
2399
01:52:33,050 --> 01:52:34,910
However, strings do.
2400
01:52:34,910 --> 01:52:38,150
Again, strings are special versus
all of the other data types
2401
01:52:38,150 --> 01:52:39,590
we've talked about thus far.
2402
01:52:39,590 --> 01:52:43,220
But how could I, for
instance, do this differently?
2403
01:52:43,220 --> 01:52:47,220
Well, let's actually factor this out
as a function, as I've commonly done.
2404
01:52:47,220 --> 01:52:50,540
But rather than implement
it myself, you know what?
2405
01:52:50,540 --> 01:52:54,140
It turns out what's nice
about strings being so common,
2406
01:52:54,140 --> 01:52:57,260
there are many other people who
have solved these problems before.
2407
01:52:57,260 --> 01:53:00,290
And in fact, there's a
whole string library in C.
2408
01:53:00,290 --> 01:53:04,190
It is used by way of a
header file called string.h.
2409
01:53:04,190 --> 01:53:08,400
And what string.h is, is a library
of string-related functions.
2410
01:53:08,400 --> 01:53:10,760
In fact, you can see
in CS50's manual pages
2411
01:53:10,760 --> 01:53:16,217
for C, the string.h functions, at least
those that we recommend as most useful,
2412
01:53:16,217 --> 01:53:18,050
and in particular, if
you poke around there,
2413
01:53:18,050 --> 01:53:20,290
you'll see that there's
a function called strlen.
2414
01:53:20,290 --> 01:53:22,055
It means string length.
2415
01:53:22,055 --> 01:53:24,680
It was named very succinctly,
just because it's a little easier
2416
01:53:24,680 --> 01:53:25,850
to type than string length.
2417
01:53:25,850 --> 01:53:28,800
But strlen tells you
the length of a string.
2418
01:53:28,800 --> 01:53:30,990
So how might I use this in my code here?
2419
01:53:30,990 --> 01:53:34,020
Well, it turns out, I can
simplify this quite a bit.
2420
01:53:34,020 --> 01:53:37,700
Let me get rid of my loop,
get rid of my counting
2421
01:53:37,700 --> 01:53:40,880
manually, and do something
like this-- int n
2422
01:53:40,880 --> 01:53:45,630
equals strlen of the human's name, name.
2423
01:53:45,630 --> 01:53:49,430
And now I'll just use printf,
as before, with %i backslash n,
2424
01:53:49,430 --> 01:53:51,290
and output the value of n.
2425
01:53:51,290 --> 01:53:54,380
But there's a bug at the moment.
2426
01:53:54,380 --> 01:53:58,480
What have I forgotten to do?
2427
01:53:58,480 --> 01:54:01,670
Yeah, I have to include the header
file at the top of the screen,
2428
01:54:01,670 --> 01:54:03,260
so let me-- at the top of the code.
2429
01:54:03,260 --> 01:54:07,640
So let me also include
string.h at the top of my file,
2430
01:54:07,640 --> 01:54:10,970
so that C knows that,
in fact, strlen exists.
2431
01:54:10,970 --> 01:54:14,170
Let me go ahead and
make length, as before.
2432
01:54:14,170 --> 01:54:18,670
./length-- or actually, really for
the first time, what's your name?
2433
01:54:18,670 --> 01:54:22,360
D-A-V-I-D. And hopefully,
I'm going to see, in fact, 5.
2434
01:54:22,360 --> 01:54:26,950
By contrast, if I run it again
and type in HI!, now I see three.
2435
01:54:26,950 --> 01:54:29,785
So strlen is just one of the
functions in that library.
2436
01:54:29,785 --> 01:54:30,910
And there are so many more.
2437
01:54:30,910 --> 01:54:33,700
In fact, yet another library that
might be useful moving forward
2438
01:54:33,700 --> 01:54:37,570
is this one, ctype.h,
which relates to character
2439
01:54:37,570 --> 01:54:40,580
types, with lots of functions
therein that can be useful.
2440
01:54:40,580 --> 01:54:43,690
For instance, if you review its
documentation in the manual pages
2441
01:54:43,690 --> 01:54:46,930
online, you'll see that
there are functions via which
2442
01:54:46,930 --> 01:54:49,460
we can solve problems like this.
2443
01:54:49,460 --> 01:54:52,480
Let me go ahead and propose here--
2444
01:54:52,480 --> 01:54:53,680
let me see.
2445
01:54:53,680 --> 01:54:59,080
Let's do an example here involving--
2446
01:54:59,080 --> 01:55:03,250
how about checking if something
is uppercase or lowercase,
2447
01:55:03,250 --> 01:55:06,700
and converting it to uppercase only.
2448
01:55:06,700 --> 01:55:10,810
Let me go back to VS Code, and
code a program called uppercase.c.
2449
01:55:10,810 --> 01:55:15,220
In this file, I'm going to start by
including now, as always, cs50.h.
2450
01:55:15,220 --> 01:55:17,710
I'm going to include stdio.h.
2451
01:55:17,710 --> 01:55:21,670
And I'm going to add one
other to the mix, which
2452
01:55:21,670 --> 01:55:26,230
is string.h now too, so I can access
the length of things as needed.
2453
01:55:26,230 --> 01:55:28,570
Int main void comes next.
2454
01:55:28,570 --> 01:55:30,460
And then within my main
function, I'm going
2455
01:55:30,460 --> 01:55:32,230
to go ahead and declare
a string called s.
2456
01:55:32,230 --> 01:55:34,240
I'm going to call get_string, as before.
2457
01:55:34,240 --> 01:55:38,170
And I'm going to go ahead and just ask
the user for a string called before.
2458
01:55:38,170 --> 01:55:39,670
I want to do a before and after.
2459
01:55:39,670 --> 01:55:41,350
Whatever the user types in is before.
2460
01:55:41,350 --> 01:55:44,770
But I want to force everything
to uppercase, thereafter.
2461
01:55:44,770 --> 01:55:48,740
Let me now, in this loop here, do this.
2462
01:55:48,740 --> 01:55:53,800
Let me printf quote unquote, "After,"
just so we can see this on the screen.
2463
01:55:53,800 --> 01:56:02,440
And let me do for int i gets 0,
i is less than strlen of s, i++.
2464
01:56:02,440 --> 01:56:03,610
What am I about to do?
2465
01:56:03,610 --> 01:56:06,190
I'm about to iterate over
every character in the string
2466
01:56:06,190 --> 01:56:11,230
from left to right, from 0 on up to,
but not through, the length of s.
2467
01:56:11,230 --> 01:56:13,990
And how do I check if
something is lowercase,
2468
01:56:13,990 --> 01:56:16,990
so that I can actually
force it to uppercase?
2469
01:56:16,990 --> 01:56:19,630
Well, it turns out, I
could do this literally.
2470
01:56:19,630 --> 01:56:27,436
If the character in s at location i
is greater than or equal to lowercase a,
2471
01:56:27,436 --> 01:56:31,780
ampersand, ampersand, which means
and instead of or, which we saw
2472
01:56:31,780 --> 01:56:37,930
in the past, s[i] is less than
or equal to little z, that means,
2473
01:56:37,930 --> 01:56:41,800
logically in English, that
this is indeed lowercase.
2474
01:56:41,800 --> 01:56:44,830
How do I now convert it to
uppercase, this character?
2475
01:56:44,830 --> 01:56:48,160
Well, I could just literally
print out the same character.
2476
01:56:48,160 --> 01:56:52,280
But that would not be the answer here
because that's not changing the value.
2477
01:56:52,280 --> 01:56:54,470
But what could I do instead?
2478
01:56:54,470 --> 01:56:59,890
Well, let me actually pull up here
real fast the ASCII chart as before,
2479
01:56:59,890 --> 01:57:03,220
and let's see if we
can't glean some insight.
2480
01:57:03,220 --> 01:57:05,710
If I pull up the same
ASCII chart, and suppose
2481
01:57:05,710 --> 01:57:09,790
the human has typed in a
lowercase a, that's 97.
2482
01:57:09,790 --> 01:57:13,240
What letter-- I want to
convert it to uppercase
2483
01:57:13,240 --> 01:57:18,660
A, so what number do I want to
convert the 97 to, per week zero?
2484
01:57:18,660 --> 01:57:21,000
So 65, we keep coming back to that one.
2485
01:57:21,000 --> 01:57:23,010
What if the user types in lowercase b?
2486
01:57:23,010 --> 01:57:27,550
I want to change the 98
value to 66, and so forth.
2487
01:57:27,550 --> 01:57:30,130
And any quick math, how
far apart are those?
2488
01:57:30,130 --> 01:57:33,120
So it's always 32, like
uppercase to lowercase
2489
01:57:33,120 --> 01:57:37,990
is always, wonderfully, good
design, 32 away, one from the other.
2490
01:57:37,990 --> 01:57:39,100
So what does this mean?
2491
01:57:39,100 --> 01:57:41,350
Well, I think we saw earlier
that underneath the hood,
2492
01:57:41,350 --> 01:57:42,600
a char is just a number.
2493
01:57:42,600 --> 01:57:44,340
You can certainly do arithmetic on it.
2494
01:57:44,340 --> 01:57:46,507
And here, again, if you
understand these lower level
2495
01:57:46,507 --> 01:57:48,180
primitives, what if I do this?
2496
01:57:48,180 --> 01:57:53,940
Whatever s[i] is, if I know on
line 13 that it's lowercase,
2497
01:57:53,940 --> 01:57:57,048
do I want to add or subtract 32?
2498
01:57:57,048 --> 01:57:57,840
AUDIENCE: Subtract.
2499
01:57:57,840 --> 01:58:01,910
DAVID MALAN: So I want to subtract
because I want to go from like 97 to 65
2500
01:58:01,910 --> 01:58:06,560
or 98 to 66, so indeed, if you do
some quick math, that gives you 32.
2501
01:58:06,560 --> 01:58:10,970
So it suffices to just treat
chars as numbers, subtract the 32,
2502
01:58:10,970 --> 01:58:16,370
and printing it with %c, I think, will
just convert lowercase to uppercase.
2503
01:58:16,370 --> 01:58:19,795
If you now fast forward to the real
world, Microsoft Word or Google Docs,
2504
01:58:19,795 --> 01:58:22,670
if you've ever chosen the menu option
that forces things to uppercase
2505
01:58:22,670 --> 01:58:24,980
or lowercase on occasion,
literally, that's
2506
01:58:24,980 --> 01:58:26,480
what Microsoft and Google have done.
2507
01:58:26,480 --> 01:58:29,605
They iterate over every character in
the document, check if it's lowercase,
2508
01:58:29,605 --> 01:58:33,810
and if so, they subtract 32 from
it and show you the new value.
2509
01:58:33,810 --> 01:58:36,650
What if, though, it is
not a lowercase letter?
2510
01:58:36,650 --> 01:58:40,520
I think I can keep it easy and just
print out the current letter unchanged,
2511
01:58:40,520 --> 01:58:44,850
if my goal is to simply force things
to all uppercase, and that letter,
2512
01:58:44,850 --> 01:58:46,490
then would be s[i].
2513
01:58:46,490 --> 01:58:50,750
So let me go ahead now and make
uppercase, hopefully, no errors.
2514
01:58:50,750 --> 01:58:55,670
./uppercase, and I'll now type
in David with an uppercase D,
2515
01:58:55,670 --> 01:58:57,120
but lowercase everything else.
2516
01:58:57,120 --> 01:59:00,020
But now the after version is DAVID--
2517
01:59:00,020 --> 01:59:01,190
an aesthetic bug.
2518
01:59:01,190 --> 01:59:04,400
Notice here, I forgot to include,
just for prettiness' sake,
2519
01:59:04,400 --> 01:59:05,930
a backslash n at the end.
2520
01:59:05,930 --> 01:59:07,640
No problem, I'll add that.
2521
01:59:07,640 --> 01:59:08,870
Let me fix my mistake.
2522
01:59:08,870 --> 01:59:12,050
Make uppercase, ./uppercase, Enter.
2523
01:59:12,050 --> 01:59:14,240
D-A-V-I-D, Enter, and voila.
2524
01:59:14,240 --> 01:59:16,820
And I deliberately added
another space after,
2525
01:59:16,820 --> 01:59:19,130
just so they would line up
pretty, even though before
2526
01:59:19,130 --> 01:59:22,070
and after have different
numbers of letters.
2527
01:59:22,070 --> 01:59:25,630
Questions then, on this
implementation of forcing something
2528
01:59:25,630 --> 01:59:28,380
to uppercase, which in and of
itself is not all that enlightening,
2529
01:59:28,380 --> 01:59:33,990
but is representative now of how you
can leverage these low level primitives.
2530
01:59:33,990 --> 01:59:35,880
Question?
2531
01:59:35,880 --> 01:59:36,380
No?
2532
01:59:36,380 --> 01:59:38,633
All right, well, this
honestly is tedious.
2533
01:59:38,633 --> 01:59:40,550
My God, like does
Microsoft, Google, everyone,
2534
01:59:40,550 --> 01:59:43,550
you have to literally write out this
code just to do something simple?
2535
01:59:43,550 --> 01:59:46,310
Well, no, that's, again, why
we have things like libraries.
2536
01:59:46,310 --> 01:59:49,220
And increasingly now, for problem
sets, projects, and beyond,
2537
01:59:49,220 --> 01:59:52,040
well, you just use libraries
more often off-the-shelf
2538
01:59:52,040 --> 01:59:55,940
so as to solve problems that, surely,
other people have had before you.
2539
01:59:55,940 --> 01:59:59,570
So how can I now use
this library, ctype.h?
2540
01:59:59,570 --> 02:00:01,320
Well, let me go back into my code.
2541
02:00:01,320 --> 02:00:05,090
Let me include this among
my header files here.
2542
02:00:05,090 --> 02:00:08,030
Just so I can skim things easily,
I tend to alphabetize my headers.
2543
02:00:08,030 --> 02:00:11,238
But that's not strictly necessary, but
it allows me, at a glance, to realize,
2544
02:00:11,238 --> 02:00:13,400
did I or did I not
include something I need?
2545
02:00:13,400 --> 02:00:15,570
Now, let me go ahead and do this.
2546
02:00:15,570 --> 02:00:20,390
It turns out if you read the
documentation for the ctype library,
2547
02:00:20,390 --> 02:00:24,710
there's a function,
wonderfully called islower,
2548
02:00:24,710 --> 02:00:28,910
that takes in a character as its
argument, essentially, so s[i].
2549
02:00:28,910 --> 02:00:32,182
And if that returns true, a
Boolean value, if you will,
2550
02:00:32,182 --> 02:00:33,890
well, I'm going to
force it to uppercase.
2551
02:00:33,890 --> 02:00:36,560
But I don't have to
do this math anymore.
2552
02:00:36,560 --> 02:00:40,610
Turns out, in the ctype library,
there's also a function called toupper
2553
02:00:40,610 --> 02:00:43,130
that takes a character
as input, like s[i],
2554
02:00:43,130 --> 02:00:45,060
and it just does the math for you.
2555
02:00:45,060 --> 02:00:47,270
So that you can abstract
away the 32 thing,
2556
02:00:47,270 --> 02:00:50,400
and just know that someone else
has solved that problem for you.
2557
02:00:50,400 --> 02:00:53,030
Otherwise, I can leave my
code unchanged down below
2558
02:00:53,030 --> 02:00:55,200
because I'm not changing anything else.
2559
02:00:55,200 --> 02:01:00,410
So if I do make uppercase now,
and then ./uppercase, D-a-v-i-d,
2560
02:01:00,410 --> 02:01:03,710
with just a capital D,
and now it still works.
2561
02:01:03,710 --> 02:01:06,890
But if you read the documentation
further, it turns out that toupper
2562
02:01:06,890 --> 02:01:07,520
is smart.
2563
02:01:07,520 --> 02:01:10,220
If you pass in a character to
toupper that's lowercase,
2564
02:01:10,220 --> 02:01:13,040
it obviously converts it to
uppercase by doing that math.
2565
02:01:13,040 --> 02:01:17,240
But if you pass in a character to
toupper that's already uppercase,
2566
02:01:17,240 --> 02:01:21,540
the documentation you would see tells
you that it leaves it unchanged.
2567
02:01:21,540 --> 02:01:23,910
So I can tighten all of this up.
2568
02:01:23,910 --> 02:01:25,880
I can get rid of the whole else.
2569
02:01:25,880 --> 02:01:29,150
I can get rid of the whole
if, and arguably now,
2570
02:01:29,150 --> 02:01:33,620
implement a program that's just
as correct, but better designed.
2571
02:01:33,620 --> 02:01:34,250
Why?
2572
02:01:34,250 --> 02:01:38,000
Fewer lines of code, easier to read,
lower probability of mistakes,
2573
02:01:38,000 --> 02:01:39,740
assuming the library is correct.
2574
02:01:39,740 --> 02:01:43,160
It just makes it easier and
faster for me, now, to write code.
2575
02:01:43,160 --> 02:01:47,960
So if I now do, one last time,
make uppercase, Enter, ./uppercase,
2576
02:01:47,960 --> 02:01:50,190
and type in my name, still working.
2577
02:01:50,190 --> 02:01:53,810
But now notice, we've whittled this
down to far fewer lines of code,
2578
02:01:53,810 --> 02:01:57,740
albeit, using now this
additional library.
2579
02:01:57,740 --> 02:02:00,140
Questions then on how we did this?
2580
02:02:00,140 --> 02:02:03,930
2581
02:02:03,930 --> 02:02:06,230
Well, even though this
code, I daresay, is correct,
2582
02:02:06,230 --> 02:02:09,120
it's not necessarily
well-designed just yet.
2583
02:02:09,120 --> 02:02:12,590
In fact, there's one line
of code, one function
2584
02:02:12,590 --> 02:02:14,690
call in this current
implementation that's
2585
02:02:14,690 --> 02:02:17,900
more inefficient than it needs to be.
2586
02:02:17,900 --> 02:02:20,630
And allow me to draw your
attention to this here,
2587
02:02:20,630 --> 02:02:24,320
line 10, wherein we're calling strlen.
2588
02:02:24,320 --> 02:02:27,350
But we're calling it inside of
this for loop, specifically,
2589
02:02:27,350 --> 02:02:29,000
inside of the condition.
2590
02:02:29,000 --> 02:02:33,720
And why might that not
necessarily be the best idea?
2591
02:02:33,720 --> 02:02:36,810
Well, is the length of the
string s changing, ever?
2592
02:02:36,810 --> 02:02:38,950
I mean, certainly not within
the span of this loop.
2593
02:02:38,950 --> 02:02:42,840
And so here we are within our for
loop on line 10, 11, 12, and 13,
2594
02:02:42,840 --> 02:02:45,242
asking on every iteration
that same question.
2595
02:02:45,242 --> 02:02:46,200
What's the length of s?
2596
02:02:46,200 --> 02:02:47,190
What's the length of s?
2597
02:02:47,190 --> 02:02:48,330
What's the length of s?
2598
02:02:48,330 --> 02:02:50,702
And in turn, we're
calling strlen every time,
2599
02:02:50,702 --> 02:02:52,660
even though we're getting
back the same answer.
2600
02:02:52,660 --> 02:02:54,960
So I daresay a better
solution here would
2601
02:02:54,960 --> 02:02:58,230
be to maybe figure out the length
of s earlier on in my code,
2602
02:02:58,230 --> 02:02:59,490
and maybe declare a variable.
2603
02:02:59,490 --> 02:03:02,580
Or perhaps do something that's
syntactically a little more elegant,
2604
02:03:02,580 --> 02:03:05,070
and in fact, a very common
design in a loop like this,
2605
02:03:05,070 --> 02:03:07,860
would be to declare not
just one variable like i,
2606
02:03:07,860 --> 02:03:12,060
but to actually declare a second
variable called n, for instance, where
2607
02:03:12,060 --> 02:03:16,530
n is just some number, set
n equal to the length of s.
2608
02:03:16,530 --> 02:03:18,900
But thereafter, inside
of this condition,
2609
02:03:18,900 --> 02:03:24,540
instead of calling strlen of s again and
again and again, what might I now do?
2610
02:03:24,540 --> 02:03:28,110
I could instead just
compare i against n itself,
2611
02:03:28,110 --> 02:03:31,080
because n now will only be calculated
once when it's initialized,
2612
02:03:31,080 --> 02:03:32,730
just as i is initialized to zero.
2613
02:03:32,730 --> 02:03:36,000
And thereafter, we're going to be
comparing i, which is changing,
2614
02:03:36,000 --> 02:03:37,350
against n, which will not be.
2615
02:03:37,350 --> 02:03:40,330
So it's going to be marginally
more efficient by design.
2616
02:03:40,330 --> 02:03:42,900
Now with that said, a
good compiler could also
2617
02:03:42,900 --> 02:03:46,080
recognize that there is this
optimization possibility,
2618
02:03:46,080 --> 02:03:47,100
and maybe do it for us.
2619
02:03:47,100 --> 02:03:49,080
But for now, best to
get into the habit, best
2620
02:03:49,080 --> 02:03:52,260
to develop the muscle memory for
making those better design decisions
2621
02:03:52,260 --> 02:03:54,010
yourselves.
2622
02:03:54,010 --> 02:03:56,380
Questions, then, on how we did this?
2623
02:03:56,380 --> 02:03:58,900
2624
02:03:58,900 --> 02:03:59,650
No?
2625
02:03:59,650 --> 02:04:03,050
All right, a few final
building blocks for the day.
2626
02:04:03,050 --> 02:04:07,870
So we started by talking about those
command line arguments that clang uses,
2627
02:04:07,870 --> 02:04:13,090
whereby, anything after the command
that you type at a prompt, be it make
2628
02:04:13,090 --> 02:04:18,160
or clang or even cd in Linux,
any word thereafter, or something
2629
02:04:18,160 --> 02:04:21,350
cryptic like -o is a
command line argument.
2630
02:04:21,350 --> 02:04:22,840
It's an input to the command.
2631
02:04:22,840 --> 02:04:26,132
It's different from a function argument
because a function argument, of course,
2632
02:04:26,132 --> 02:04:27,280
is an input to a function.
2633
02:04:27,280 --> 02:04:28,345
But it's the same idea.
2634
02:04:28,345 --> 02:04:30,970
It's just different syntax after
the dollar sign at the prompt.
2635
02:04:30,970 --> 02:04:33,880
Well, it turns out that
command line arguments
2636
02:04:33,880 --> 02:04:37,660
are something you can now
use in your own programs
2637
02:04:37,660 --> 02:04:41,800
by accessing words after the prompt.
2638
02:04:41,800 --> 02:04:45,410
And let me propose that
we invent this as follows.
2639
02:04:45,410 --> 02:04:49,540
Let me propose that we
switch back to VS Code here,
2640
02:04:49,540 --> 02:04:53,560
and I'll open a new file
here called greet.c.
2641
02:04:53,560 --> 02:04:56,410
So in greet.c, it's going to be
a program that very simply greets
2642
02:04:56,410 --> 02:04:57,070
the user.
2643
02:04:57,070 --> 02:04:59,440
Had we written this last
week, we would have done this.
2644
02:04:59,440 --> 02:05:08,200
Include cs50.h, and then include
stdio.h, and then int main void,
2645
02:05:08,200 --> 02:05:13,060
and then we might do something simple
like string name equals get_string,
2646
02:05:13,060 --> 02:05:15,980
quote unquote, "What's your name?"
2647
02:05:15,980 --> 02:05:20,020
And then we would have printed
out, as always, Hello, %s,
2648
02:05:20,020 --> 02:05:21,490
and then plugging in that name.
2649
02:05:21,490 --> 02:05:25,300
So this is the same program we've
implemented many times, just
2650
02:05:25,300 --> 02:05:26,590
to make sure it works--
2651
02:05:26,590 --> 02:05:29,140
although, nope, that's not
quite the same program.
2652
02:05:29,140 --> 02:05:30,940
Semicolon's in the wrong place.
2653
02:05:30,940 --> 02:05:32,960
This now is the same program.
2654
02:05:32,960 --> 02:05:37,610
So make greet, ./greet, and I'll
type in my own name: hello, David.
2655
02:05:37,610 --> 02:05:38,770
So we're back there.
2656
02:05:38,770 --> 02:05:41,770
Now, what's arguably a little
annoying about this program,
2657
02:05:41,770 --> 02:05:44,110
if I type in something
else like, Carter,
2658
02:05:44,110 --> 02:05:48,130
Enter, I have to run the program,
wait for the prompt, type in my name,
2659
02:05:48,130 --> 02:05:48,910
hit Enter.
2660
02:05:48,910 --> 02:05:52,360
And that's fine, but imagine if
every program worked like this.
2661
02:05:52,360 --> 02:05:55,415
Like make, suppose you could only
type make, then you wait for a prompt,
2662
02:05:55,415 --> 02:05:58,540
then you type the name of the program
you want to make, then you hit Enter.
2663
02:05:58,540 --> 02:06:01,720
Or worse, in Linux when you
have to change directories,
2664
02:06:01,720 --> 02:06:05,263
as you might have for problem set one,
what if you had to type cd, Enter,
2665
02:06:05,263 --> 02:06:07,930
now type the name of the folder
you want to change into, Enter--
2666
02:06:07,930 --> 02:06:09,710
I mean, it just slows life down.
2667
02:06:09,710 --> 02:06:11,470
And so it just gets annoying quickly.
2668
02:06:11,470 --> 02:06:16,070
So command line arguments just let you
express your whole thought all at once.
2669
02:06:16,070 --> 02:06:18,200
So how can I do this?
2670
02:06:18,200 --> 02:06:22,450
Well, if I want to express the notion
of command line arguments in my code,
2671
02:06:22,450 --> 02:06:25,640
I could do something like this.
2672
02:06:25,640 --> 02:06:28,750
I could, for the very
first time, go up and get
2673
02:06:28,750 --> 02:06:33,730
rid of this void, which as of today
means, this program takes no command
2674
02:06:33,730 --> 02:06:34,780
line arguments.
2675
02:06:34,780 --> 02:06:37,540
And I can change it to exactly this.
2676
02:06:37,540 --> 02:06:43,490
Int argc, string argv, with brackets.
2677
02:06:43,490 --> 02:06:44,950
Now it's cryptic, admittedly.
2678
02:06:44,950 --> 02:06:46,150
And let me zoom in.
2679
02:06:46,150 --> 02:06:49,300
But I think we can perhaps
infer now, what's going on.
2680
02:06:49,300 --> 02:06:52,750
If main now does not have
void as its input, which
2681
02:06:52,750 --> 02:06:55,600
means it takes no arguments,
surely, the spoiler
2682
02:06:55,600 --> 02:06:59,230
here is that now main will take
command line arguments somehow.
2683
02:06:59,230 --> 02:07:05,180
Any guesses as to what
argv is or will be?
2684
02:07:05,180 --> 02:07:08,330
What might this represent?
2685
02:07:08,330 --> 02:07:11,390
It's an array of strings,
right, by way of the syntax.
2686
02:07:11,390 --> 02:07:13,223
Yeah?
2687
02:07:13,223 --> 02:07:15,480
AUDIENCE: All the characters
will be typed out.
2688
02:07:15,480 --> 02:07:16,050
DAVID MALAN: Exactly.
2689
02:07:16,050 --> 02:07:18,550
It will be all of the characters,
or really all of the words
2690
02:07:18,550 --> 02:07:19,830
that you type at the prompt.
2691
02:07:19,830 --> 02:07:21,765
Argc, as an int, any guess?
2692
02:07:21,765 --> 02:07:24,360
2693
02:07:24,360 --> 02:07:28,700
Argument count is what it generally
stands for, though technically,
2694
02:07:28,700 --> 02:07:30,290
you could call these things anything.
2695
02:07:30,290 --> 02:07:31,520
But this is the convention.
2696
02:07:31,520 --> 02:07:35,780
Because I claimed earlier that arrays
don't keep track of their own length,
2697
02:07:35,780 --> 02:07:38,930
if you want to know how many words
the human typed at the prompt
2698
02:07:38,930 --> 02:07:41,420
after your program's
name, you have to be told,
2699
02:07:41,420 --> 02:07:45,650
not just the array of the words,
but the length of that array.
2700
02:07:45,650 --> 02:07:48,530
The strings, you can figure
out the length of using strlen,
2701
02:07:48,530 --> 02:07:53,360
but you can't figure out the length of
the array of strings, the collection
2702
02:07:53,360 --> 02:07:55,020
of words that the human typed in.
2703
02:07:55,020 --> 02:07:56,760
So how can I now use this?
2704
02:07:56,760 --> 02:07:59,190
Well, let me go ahead and do this.
2705
02:07:59,190 --> 02:08:04,190
Let me go ahead and change this program
now just to be printf, quote unquote,
2706
02:08:04,190 --> 02:08:11,630
"hello, %s\n", then argv[1].
2707
02:08:11,630 --> 02:08:14,780
So this is not the best version
of my code yet, but it's my first.
2708
02:08:14,780 --> 02:08:21,020
Make greet, and now let me do
./greet, David all at once.
2709
02:08:21,020 --> 02:08:23,210
Enter, hello, David.
2710
02:08:23,210 --> 02:08:25,820
Now let me run it
again, ./greet, Carter.
2711
02:08:25,820 --> 02:08:27,620
Enter, hello, Carter.
2712
02:08:27,620 --> 02:08:29,840
It's a marginal improvement,
but I don't have
2713
02:08:29,840 --> 02:08:32,330
to wait for get_string to
prompt me and then hit Enter.
2714
02:08:32,330 --> 02:08:34,370
It's just speeding
things up, twice as fast.
2715
02:08:34,370 --> 02:08:36,890
One less command to type in.
2716
02:08:36,890 --> 02:08:41,390
But I deliberately did [1], but
what's the beginning of argv?
2717
02:08:41,390 --> 02:08:42,170
It would be [0].
2718
02:08:42,170 --> 02:08:44,730
2719
02:08:44,730 --> 02:08:45,780
Well, what's that?
2720
02:08:45,780 --> 02:08:48,840
This is sometimes useful,
though for now, it's not.
2721
02:08:48,840 --> 02:08:54,110
Suppose I recompile my code and
run this program now, greet David.
2722
02:08:54,110 --> 02:08:58,598
Anyone want to guess what's in argv[0]?
2723
02:08:58,598 --> 02:08:59,530
AUDIENCE: [INAUDIBLE]
2724
02:08:59,530 --> 02:09:00,220
DAVID MALAN: Say again?
2725
02:09:00,220 --> 02:09:01,230
AUDIENCE: Greet, hello.
2726
02:09:01,230 --> 02:09:04,530
DAVID MALAN: Greet,
Enter, hello, ./greet.
2727
02:09:04,530 --> 02:09:08,280
So if you want to sort of inception
style your program to figure out what
2728
02:09:08,280 --> 02:09:11,910
its own name is, or at least how it
was executed at the command line,
2729
02:09:11,910 --> 02:09:14,460
at the terminal, you
can look at argv[0].
2730
02:09:14,460 --> 02:09:17,160
In general, probably not
that useful, probably better
2731
02:09:17,160 --> 02:09:21,900
to start looking at [1], which was
the first word after the program name.
2732
02:09:21,900 --> 02:09:25,320
And if there were more, I could
do this. How about argv[2]?
2733
02:09:25,320 --> 02:09:27,690
Let me add in a second %s.
2734
02:09:27,690 --> 02:09:29,550
Let me recompile greet.
2735
02:09:29,550 --> 02:09:35,490
Let me do ./greet David Malan,
Enter, and that, too, now works,
2736
02:09:35,490 --> 02:09:37,112
taking in two words at the prompt.
2737
02:09:37,112 --> 02:09:38,820
If I really want to
be smart at this now,
2738
02:09:38,820 --> 02:09:40,445
I could do something like this, though.
2739
02:09:40,445 --> 02:09:44,700
How about if the count of
arguments, A.K.A. argc,
2740
02:09:44,700 --> 02:09:49,890
equals equals 2, then assume that the
human typed in only their first name,
2741
02:09:49,890 --> 02:09:58,440
and do printf hello comma
%s\n, and then argv[1].
2742
02:09:58,440 --> 02:10:01,470
Else, if the human did
not provide exactly two
2743
02:10:01,470 --> 02:10:04,920
arguments, the name of the
program and their own name,
2744
02:10:04,920 --> 02:10:07,890
let's just print out a default
value, lest they forgot their name
2745
02:10:07,890 --> 02:10:09,990
or they typed in two
names or three names.
2746
02:10:09,990 --> 02:10:13,110
Let's just do, hello
comma world as a default.
2747
02:10:13,110 --> 02:10:15,270
And we'll just ignore
what the human typed in.
2748
02:10:15,270 --> 02:10:20,850
If I recompile this, make greet, I
can do ./greet and David again, Enter.
2749
02:10:20,850 --> 02:10:24,840
Oops-- sorry, what am I missing?
2750
02:10:24,840 --> 02:10:26,640
Yeah, so newbie mistake.
2751
02:10:26,640 --> 02:10:30,090
Else, all right, make greet again.
2752
02:10:30,090 --> 02:10:34,050
./greet, David, Enter,
there's my hello, David.
2753
02:10:34,050 --> 02:10:37,870
But if I omit my name, I just get
the generic, like a default value.
2754
02:10:37,870 --> 02:10:41,590
And if I get a little curious and I type
in both names, then I get ignored too.
2755
02:10:41,590 --> 02:10:42,090
Why?
2756
02:10:42,090 --> 02:10:44,880
Because I just haven't built
in support for argc of three.
2757
02:10:44,880 --> 02:10:47,610
I could do anything I want,
but now we have access
2758
02:10:47,610 --> 02:10:50,730
to these kinds of building blocks.
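The branching just described can be sketched as two small functions (hypothetical names `name_to_greet` and `greet`, with plain `char *` in place of CS50's `string`). This is an illustration of the argc check, not the exact code typed in lecture.

```c
#include <stdio.h>

// Returns the name to greet: argv[1] if the human typed exactly one word
// after the program's name (argc == 2), otherwise the default "world".
const char *name_to_greet(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];
    }
    return "world";
}

// Prints the greeting, e.g. "hello, David" or "hello, world".
void greet(int argc, char *argv[])
{
    printf("hello, %s\n", name_to_greet(argc, argv));
}
```

So `./greet David` greets David, while both `./greet` and `./greet David Malan` fall back to the default.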
2759
02:10:50,730 --> 02:10:52,780
All right, what else might I do here?
2760
02:10:52,780 --> 02:10:57,660
Well, it turns out there might be some
final features for us to now execute.
2761
02:10:57,660 --> 02:11:00,090
Notice, though, that
in C, despite what you
2762
02:11:00,090 --> 02:11:02,820
might see in books or
online tutorials, nowadays,
2763
02:11:02,820 --> 02:11:06,180
the two official formats
for defining a main function
2764
02:11:06,180 --> 02:11:11,130
are either this, which we've been using
now for two plus weeks or now this,
2765
02:11:11,130 --> 02:11:14,250
whereby, you change
the void to int argc,
2766
02:11:14,250 --> 02:11:17,880
and then for now, string
argv, and then empty brackets.
2767
02:11:17,880 --> 02:11:20,608
And we'll see that this, too, is
a simplification, some training
2768
02:11:20,608 --> 02:11:21,400
wheels if you will.
2769
02:11:21,400 --> 02:11:23,550
But for now, those are
the two forms, even
2770
02:11:23,550 --> 02:11:26,550
though you will see in online
tutorials and even books, some people
2771
02:11:26,550 --> 02:11:27,840
use main in different ways.
2772
02:11:27,840 --> 02:11:30,142
These are the two now to keep in mind.
2773
02:11:30,142 --> 02:11:32,100
And I'll note that these
command line arguments
2774
02:11:32,100 --> 02:11:33,360
are kind of all over the place.
2775
02:11:33,360 --> 02:11:35,590
Didn't probably expect to see
this word on the screen here.
2776
02:11:35,590 --> 02:11:36,490
And what does it mean?
2777
02:11:36,490 --> 02:11:37,920
Well, it turns out that
for decades-- there's
2778
02:11:37,920 --> 02:11:40,080
actually this program that
comes with Linux systems
2779
02:11:40,080 --> 02:11:41,880
in particular called cowsay.
2780
02:11:41,880 --> 02:11:42,510
Why?
2781
02:11:42,510 --> 02:11:45,300
Probably because someone had too
much free time once and decided
2782
02:11:45,300 --> 02:11:49,920
to write a program that creates ASCII
art out of a cow saying something
2783
02:11:49,920 --> 02:11:51,520
textually on the screen.
2784
02:11:51,520 --> 02:11:55,780
But you use cowsay, just for fun,
by way of command line arguments.
2785
02:11:55,780 --> 02:12:00,660
So for instance, let me propose
that I go back to VS Code
2786
02:12:00,660 --> 02:12:03,020
here, not because I
want to write any code,
2787
02:12:03,020 --> 02:12:04,770
but I just want to use
my terminal window.
2788
02:12:04,770 --> 02:12:07,320
And let me maximize my
terminal window here.
2789
02:12:07,320 --> 02:12:11,880
And let me go ahead and type in
something like, how about cowsay,
2790
02:12:11,880 --> 02:12:13,170
space moo?
2791
02:12:13,170 --> 02:12:14,822
So cowsay is not a program I wrote.
2792
02:12:14,822 --> 02:12:16,030
It's been around for decades.
2793
02:12:16,030 --> 02:12:18,870
But we installed it in VS
Code for you in the cloud.
2794
02:12:18,870 --> 02:12:21,330
It takes at least one
command line argument.
2795
02:12:21,330 --> 02:12:23,070
What do you want the cow to say?
2796
02:12:23,070 --> 02:12:26,190
I can say, cowsay moo, and
hit Enter, and voila, there
2797
02:12:26,190 --> 02:12:29,490
is my ASCII art of a cow
saying moo on the screen.
2798
02:12:29,490 --> 02:12:31,090
It can say multiple words.
2799
02:12:31,090 --> 02:12:33,960
So I can say, Hello, world, Enter.
2800
02:12:33,960 --> 02:12:35,800
And now it says, Hello, world.
2801
02:12:35,800 --> 02:12:38,730
So this is just an example of a
silly program that uses command line
2802
02:12:38,730 --> 02:12:40,470
arguments, but it takes others too.
2803
02:12:40,470 --> 02:12:43,650
Just like clang, it uses this
convention of hyphens
2804
02:12:43,650 --> 02:12:45,750
to change the output of the program.
2805
02:12:45,750 --> 02:12:49,350
Dash something is just a super common
convention with command line arguments
2806
02:12:49,350 --> 02:12:53,520
when you want a very terse notation
for some option like output.
2807
02:12:53,520 --> 02:12:56,460
In cowsay, I read the
documentation, and it turns out
2808
02:12:56,460 --> 02:12:59,040
there's a dash f command
line argument that
2809
02:12:59,040 --> 02:13:03,460
allows you to change the
appearance of the cow, if you will.
2810
02:13:03,460 --> 02:13:10,170
So if I do cowsay dash f, duck, and
then some other word like quack,
2811
02:13:10,170 --> 02:13:11,640
it's no longer a cow.
2812
02:13:11,640 --> 02:13:15,850
That command line argument turns it
into a tiny, adorable duck instead.
2813
02:13:15,850 --> 02:13:19,020
And then lastly, just for fun,
because I spent way too much time
2814
02:13:19,020 --> 02:13:20,790
playing with command line arguments.
2815
02:13:20,790 --> 02:13:25,260
Cowsay dash f, dragon, and
then how about, rawr, Enter,
2816
02:13:25,260 --> 02:13:27,910
you can even get this
on the screen here.
2817
02:13:27,910 --> 02:13:30,150
So this, too, is just
an example of what you
2818
02:13:30,150 --> 02:13:34,230
can do with these command line arguments
now that we have this building block.
2819
02:13:34,230 --> 02:13:36,960
And there's one final thing
we can now do with code.
2820
02:13:36,960 --> 02:13:39,150
There's one last
feature today that we'll
2821
02:13:39,150 --> 02:13:41,610
introduce before we now
connect all of these dots
2822
02:13:41,610 --> 02:13:47,520
to readability and encryption by
talking, lastly, about something called
2823
02:13:47,520 --> 02:13:48,450
exit status.
2824
02:13:48,450 --> 02:13:52,380
It turns out that whenever
your main function exits,
2825
02:13:52,380 --> 02:13:55,590
it returns a secret integer,
and you can figure out,
2826
02:13:55,590 --> 02:13:58,260
as the programmer or an
advanced user, what it was.
2827
02:13:58,260 --> 02:14:02,398
And these exit codes, exit statuses,
are typically used to indicate errors.
2828
02:14:02,398 --> 02:14:05,190
So for instance, over the past
couple of years, if you've used Zoom
2829
02:14:05,190 --> 02:14:08,560
and you ever got some kind of error,
you might have seen a screen like this.
2830
02:14:08,560 --> 02:14:11,040
It's usually not that helpful,
maybe tells you to click
2831
02:14:11,040 --> 02:14:13,050
Report Problem or Contact Support.
2832
02:14:13,050 --> 02:14:16,980
But very often in our human
world on Macs, PCs, and phones,
2833
02:14:16,980 --> 02:14:20,010
you see cryptic error codes,
like literally numbers
2834
02:14:20,010 --> 02:14:23,640
that probably only Zoom knows, or
Microsoft or Google or whatever company
2835
02:14:23,640 --> 02:14:25,050
wrote the software you're using.
2836
02:14:25,050 --> 02:14:28,260
But that number corresponds
to a specific error
2837
02:14:28,260 --> 02:14:32,070
that some human somewhere
knows might very well happen.
2838
02:14:32,070 --> 02:14:34,950
These are used similarly,
although under a different name
2839
02:14:34,950 --> 02:14:38,260
that we'll talk about later in
the term, on the web as well.
2840
02:14:38,260 --> 02:14:41,350
Have you ever seen this-- maybe
not character, but number?
2841
02:14:41,350 --> 02:14:43,485
So, 404 means what?
2842
02:14:43,485 --> 02:14:44,880
AUDIENCE: Error.
2843
02:14:44,880 --> 02:14:47,790
DAVID MALAN: So error,
yes, but really, not found.
2844
02:14:47,790 --> 02:14:48,410
So, why?
2845
02:14:48,410 --> 02:14:49,993
I mean, this is the most arcane thing.
2846
02:14:49,993 --> 02:14:53,000
And we'll talk in a few weeks about
what this and other numbers mean,
2847
02:14:53,000 --> 02:14:54,917
but numbers are all
around us in technology,
2848
02:14:54,917 --> 02:14:57,500
and they very often mean something
to the technical people who
2849
02:14:57,500 --> 02:15:00,270
wrote the software, less so
to humans like you and me.
2850
02:15:00,270 --> 02:15:03,230
Why so many of us recognize
404 is kind of weird,
2851
02:15:03,230 --> 02:15:05,900
that it's been around
long enough that we all know it.
2852
02:15:05,900 --> 02:15:10,250
But it really is just a special number
that represents an error of some sort.
2853
02:15:10,250 --> 02:15:13,100
So it turns out, the last
thing we'll reveal today
2854
02:15:13,100 --> 02:15:15,530
about what we've been taking
for granted for two weeks,
2855
02:15:15,530 --> 02:15:18,200
is what the int is in main.
2856
02:15:18,200 --> 02:15:21,650
We've seen, just a moment ago, that
the thing in the parentheses, which
2857
02:15:21,650 --> 02:15:24,680
up until now has been void, meaning
no command line arguments, can now be
2858
02:15:24,680 --> 02:15:29,690
int argc, string argv[], which just
means, yes, command line arguments.
2859
02:15:29,690 --> 02:15:31,290
And we've seen how to access them.
2860
02:15:31,290 --> 02:15:33,620
So the last piece of
the puzzle, honestly,
2861
02:15:33,620 --> 02:15:37,460
of all the cryptic syntax the past
two weeks, is just what int means.
2862
02:15:37,460 --> 02:15:40,610
Int is always there for
main, and it indicates
2863
02:15:40,610 --> 02:15:44,300
that main will always return an integer,
even though you and I have never
2864
02:15:44,300 --> 02:15:46,010
done so explicitly.
2865
02:15:46,010 --> 02:15:50,450
Usually, main returns
0, by default. But it
2866
02:15:50,450 --> 02:15:53,928
would be weird if you saw an error
message saying 0, so 0 is just hidden.
2867
02:15:53,928 --> 02:15:55,470
You would never see it on the screen.
2868
02:15:55,470 --> 02:15:58,670
But it's happening automatically
by way of how C is designed.
2869
02:15:58,670 --> 02:16:01,550
So let me write one final program here.
2870
02:16:01,550 --> 02:16:05,750
I'll call it, for instance, status.c
to show you these exit statuses.
2871
02:16:05,750 --> 02:16:10,790
Code status.c, and then up here,
let me do something simple like include
2872
02:16:10,790 --> 02:16:18,020
cs50.h, then include
stdio.h, and then int main--
2873
02:16:18,020 --> 02:16:21,350
actually, let's use a command line
argument. int argc, string argv[],
2874
02:16:21,350 --> 02:16:23,180
so that's copy, paste.
2875
02:16:23,180 --> 02:16:26,000
But now let's do this.
2876
02:16:26,000 --> 02:16:29,280
If argc does not equal 2--
2877
02:16:29,280 --> 02:16:30,780
why don't we do something like this?
2878
02:16:30,780 --> 02:16:33,740
Let's not just default to
hello, world like last time.
2879
02:16:33,740 --> 02:16:34,770
Let's yell at the user.
2880
02:16:34,770 --> 02:16:38,802
So let's say something like printf
missing command line argument,
2881
02:16:38,802 --> 02:16:40,760
so that they know they
screwed up and they need
2882
02:16:40,760 --> 02:16:43,160
to run the program again correctly.
2883
02:16:43,160 --> 02:16:51,320
Else, let's go ahead and say, print
out, as before, Hello, comma %s,
2884
02:16:51,320 --> 02:16:56,730
and then plug in argv[1], so the
human's name from the prompt.
2885
02:16:56,730 --> 02:17:01,910
Now at this point, let me go
ahead and run status, ./status,
2886
02:17:01,910 --> 02:17:03,590
and I'll type nothing first.
2887
02:17:03,590 --> 02:17:04,700
I get yelled at.
2888
02:17:04,700 --> 02:17:10,170
This time, I'll type it again.
./status David, and it works properly.
2889
02:17:10,170 --> 02:17:14,090
But now let me show you a
somewhat secret, cryptic command.
2890
02:17:14,090 --> 02:17:17,330
You can type this at your prompt,
and it's just a coincidence
2891
02:17:17,330 --> 02:17:18,740
that there's another dollar sign.
2892
02:17:18,740 --> 02:17:22,400
Echo $?, totally arcane,
but it allows you
2893
02:17:22,400 --> 02:17:25,490
to see what exit status
your program has ended with.
2894
02:17:25,490 --> 02:17:27,559
So let me run this again the wrong way.
2895
02:17:27,559 --> 02:17:31,040
./status, I get the error message.
2896
02:17:31,040 --> 02:17:32,780
What was secretly returned?
2897
02:17:32,780 --> 02:17:33,440
I can't see it.
2898
02:17:33,440 --> 02:17:37,280
There's obviously no error
screen, but by typing echo $?,
2899
02:17:37,280 --> 02:17:41,420
I can see that, oh, my program
automatically, by default, returns
2900
02:17:41,420 --> 02:17:42,170
zero.
2901
02:17:42,170 --> 02:17:46,879
However, if I run it again
correctly, ./status David, Enter,
2902
02:17:46,879 --> 02:17:48,690
this is the correct version.
2903
02:17:48,690 --> 02:17:50,629
But if I run echo $?
2904
02:17:50,629 --> 02:17:52,879
to check the status again, it still exited with 0.
2905
02:17:52,879 --> 02:17:55,879
And long story short, this
is just a missed opportunity.
2906
02:17:55,879 --> 02:17:59,570
When something goes wrong, why
don't I return a value other than 0?
2907
02:17:59,570 --> 02:18:01,070
0, by default, means success.
2908
02:18:01,070 --> 02:18:02,690
And it's always there automatically.
2909
02:18:02,690 --> 02:18:04,940
But you can control this.
2910
02:18:04,940 --> 02:18:11,160
I can go into my code here and return
1, else, if something works fine,
2911
02:18:11,160 --> 02:18:14,870
I can return 0, by default. And
honestly, if I omit the return zero,
2912
02:18:14,870 --> 02:18:17,129
again, zero automatically is returned.
2913
02:18:17,129 --> 02:18:20,719
So let me go ahead and be explicit,
just so I know what's going on.
2914
02:18:20,719 --> 02:18:26,360
Make status again, ./status, and
let's do this correctly with David.
2915
02:18:26,360 --> 02:18:28,520
Enter, hello, David.
2916
02:18:28,520 --> 02:18:32,059
Echo $?, zero.
2917
02:18:32,059 --> 02:18:33,270
So all is well.
2918
02:18:33,270 --> 02:18:38,240
But now if I do ./status and nothing,
or multiple things, but not just David,
2919
02:18:38,240 --> 02:18:40,530
Enter, I get the error message.
2920
02:18:40,530 --> 02:18:45,230
But now if I do echo $?,
voila, there now is the one.
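Put together, the status program demonstrated above might look roughly like this sketch. To keep it testable, the logic sits in a hypothetical `run` function that `main` would simply return from, and `char *` again stands in for CS50's `string`.

```c
#include <stdio.h>

// Mirrors status.c: complain and return 1 unless exactly one word follows
// the program's name; otherwise greet and return 0 (success).
int run(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Missing command-line argument\n");
        return 1; // nonzero exit status signals that something went wrong
    }
    printf("hello, %s\n", argv[1]);
    return 0; // 0 signals success
}

// In the real program, main would just forward the status:
// int main(int argc, char *argv[])
// {
//     return run(argc, argv);
// }
```

After compiling, running `./status` and then `echo $?` should print 1, while `./status David` followed by `echo $?` should print 0.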
2921
02:18:45,230 --> 02:18:47,330
So what does this now mean?
2922
02:18:47,330 --> 02:18:49,490
In the graphical
world, we would just
2923
02:18:49,490 --> 02:18:51,020
show something like this
on the screen, which is
2924
02:18:51,020 --> 02:18:52,459
a little more informative to the user.
2925
02:18:52,459 --> 02:18:54,469
But even in the Linux world
where you don't have a GUI,
2926
02:18:54,469 --> 02:18:56,690
necessarily, even for the
programs we've written,
2927
02:18:56,690 --> 02:18:58,549
you can check these exit statuses.
2928
02:18:58,549 --> 02:19:01,070
And in fact, more comfortable,
more advanced programmers,
2929
02:19:01,070 --> 02:19:03,889
when they write code
that calls programs,
2930
02:19:03,889 --> 02:19:07,340
be it cowsay or anything
else, can, in code,
2931
02:19:07,340 --> 02:19:11,030
check what the exit status is
of a program, and then decide,
2932
02:19:11,030 --> 02:19:13,170
did my program work or did it not?
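As a sketch of checking another program's exit status from C: the standard `system` function runs a shell command, and on POSIX systems (such as the Linux environment behind VS Code here) its return value is a wait status that the `WIFEXITED` and `WEXITSTATUS` macros from `<sys/wait.h>` decode. The helper name `exit_status_of` is hypothetical, and this approach assumes a POSIX platform.

```c
#include <stdlib.h>
#include <sys/wait.h>

// Runs a shell command and returns its exit status, or -1 if the command
// could not be run or did not exit normally (e.g. it was killed).
// POSIX-specific: system() returns a wait status, decoded by the macros.
int exit_status_of(const char *command)
{
    int status = system(command);
    if (status == -1 || !WIFEXITED(status))
    {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Where cowsay is installed, `exit_status_of("cowsay moo")` would be expected to return 0; a shell typically reports 127 for a command it cannot find.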
2933
02:19:13,170 --> 02:19:16,219
And now let's connect
the final dots before we
2934
02:19:16,219 --> 02:19:19,070
adjourn for some fruit snacks.
2935
02:19:19,070 --> 02:19:22,100
Cryptography, namely one of
the applications this week
2936
02:19:22,100 --> 02:19:24,770
via which you'll be able
to send, if you will,
2937
02:19:24,770 --> 02:19:27,650
secret messages, and better
yet, decrypt secret messages.
2938
02:19:27,650 --> 02:19:29,780
This will be in addition
to perhaps analyzing
2939
02:19:29,780 --> 02:19:32,120
the readability of text
using heuristics, like we
2940
02:19:32,120 --> 02:19:34,040
identified at the start of class two.
2941
02:19:34,040 --> 02:19:38,299
So cryptography is just the art, the
science of encrypting information,
2942
02:19:38,299 --> 02:19:41,330
scrambling information so that
if you have a secret message
2943
02:19:41,330 --> 02:19:45,980
to send in so-called plaintext, you
can run it through some algorithm
2944
02:19:45,980 --> 02:19:49,910
and turn it into what's called
ciphertext, thereby, encrypting it.
2945
02:19:49,910 --> 02:19:53,150
And only someone who knows
what algorithm you've used
2946
02:19:53,150 --> 02:19:55,880
and what input you've used to
the algorithm, theoretically,
2947
02:19:55,880 --> 02:19:59,880
can decrypt that process and convert
it back to the original message.
2948
02:19:59,880 --> 02:20:03,030
So if we use our mental model
from last week, here is a problem.
2949
02:20:03,030 --> 02:20:04,910
Here is an input and output.
2950
02:20:04,910 --> 02:20:08,120
The goal I claim here is to take
some plaintext, like the message
2951
02:20:08,120 --> 02:20:10,250
you want to send, think
back to grade school
2952
02:20:10,250 --> 02:20:13,640
if you ever passed a note to a friend
or to your crush saying, I love you,
2953
02:20:13,640 --> 02:20:16,910
it's a little awkward if the teacher
or someone else intercepts the paper.
2954
02:20:16,910 --> 02:20:19,490
And in English, it just says,
I love you, or whatever it is.
2955
02:20:19,490 --> 02:20:22,350
It'd be nice if you had at
least encrypted it in some way.
2956
02:20:22,350 --> 02:20:25,220
But the other person needs to
know what algorithm you used
2957
02:20:25,220 --> 02:20:27,230
and what inputs you
use to that algorithm
2958
02:20:27,230 --> 02:20:31,100
so that, ultimately, they can decode
the so-called ciphertext, which
2959
02:20:31,100 --> 02:20:32,040
is the output.
2960
02:20:32,040 --> 02:20:34,190
So what goes inside of the box today?
2961
02:20:34,190 --> 02:20:37,970
Well, an algorithm, as it relates
to cryptography, is called a cipher.
2962
02:20:37,970 --> 02:20:41,390
And a cipher is a fancy name for
an algorithm that encrypts text
2963
02:20:41,390 --> 02:20:43,250
from plaintext to ciphertext.
2964
02:20:43,250 --> 02:20:46,760
The catch is, there needs to
be not just the algorithm,
2965
02:20:46,760 --> 02:20:48,750
there needs to be an input to it.
2966
02:20:48,750 --> 02:20:52,590
And so, for instance, you might draw
the picture like this for the first time
2967
02:20:52,590 --> 02:20:53,090
today.
2968
02:20:53,090 --> 02:20:54,257
And we've seen this in code.
2969
02:20:54,257 --> 02:20:57,180
You can give multiple inputs
or arguments to functions.
2970
02:20:57,180 --> 02:20:59,960
So in this black box, can you
imagine passing in the message
2971
02:20:59,960 --> 02:21:02,510
you want to send, and then some secret.
2972
02:21:02,510 --> 02:21:05,300
So for instance, suppose
that, the simplest
2973
02:21:05,300 --> 02:21:08,750
thing I could think of as a kid was
instead of sending the letter A,
2974
02:21:08,750 --> 02:21:10,310
why don't I write the letter B?
2975
02:21:10,310 --> 02:21:13,070
Instead of the letter B, why
don't I write the letter C?
2976
02:21:13,070 --> 02:21:16,280
So I can kind of shift the
English alphabet by one space.
2977
02:21:16,280 --> 02:21:18,740
So A becomes B, B
becomes C, dot, dot, dot,
2978
02:21:18,740 --> 02:21:21,690
Z becomes A. You can
wrap around at the end.
2979
02:21:21,690 --> 02:21:24,120
And let's assume no punctuation
in this part of the story.
2980
02:21:24,120 --> 02:21:29,420
So that's a very simple algorithm--
add a value to each letter
2981
02:21:29,420 --> 02:21:32,090
and send the value as the ciphertext.
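A minimal C sketch of this rotational idea, under the assumptions stated in the lecture: only the letters A through Z (either case) are shifted, wrapping around from Z back to A, and every other character passes through unchanged. The function name `caesar` is hypothetical, and it modifies the string in place.

```c
#include <ctype.h>

// Shifts each letter of text by key positions, wrapping Z -> A and
// preserving case; digits, punctuation, and spaces are left unchanged.
void caesar(char *text, int key)
{
    // Normalize key into 0..25 so negative keys (decryption) also work.
    int shift = ((key % 26) + 26) % 26;
    for (char *p = text; *p != '\0'; p++)
    {
        if (isupper((unsigned char) *p))
        {
            *p = (char) ('A' + (*p - 'A' + shift) % 26);
        }
        else if (islower((unsigned char) *p))
        {
            *p = (char) ('a' + (*p - 'a' + shift) % 26);
        }
    }
}
```

Decryption is the same call with the key negated, which is why the key is normalized into the range 0 to 25 first.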
2982
02:21:32,090 --> 02:21:35,540
And now the teacher, the classmate,
they have to know that you use,
2983
02:21:35,540 --> 02:21:39,410
not only this rotational algorithm,
also known as a Caesar cipher,
2984
02:21:39,410 --> 02:21:41,300
they also need to know
what number you use.
2985
02:21:41,300 --> 02:21:45,200
Did you add 1 to every letter, 2 to
every letter, 25 to every letter?
2986
02:21:45,200 --> 02:21:49,310
Now if they're super smart and probably
not the young age in this story,
2987
02:21:49,310 --> 02:21:51,165
they could also just
try all possibilities.
2988
02:21:51,165 --> 02:21:53,040
And that would be an
attack on the algorithm.
2989
02:21:53,040 --> 02:21:55,310
This is not a sophisticated
algorithm, but it's
2990
02:21:55,310 --> 02:21:56,970
enough to send a message in class.
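Trying all possibilities is cheap here because there are only 25 nontrivial keys. A sketch of that brute-force attack, with hypothetical helper names: decrypt under every candidate key and print each result, leaving it to a human to spot the one that reads as English.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Writes into out the decryption of ciphertext under one key: each letter
// is shifted back by key positions (case preserved, non-letters unchanged).
// out must have room for strlen(ciphertext) + 1 characters.
void decrypt_with(const char *ciphertext, int key, char *out)
{
    // Shifting back by key is the same as shifting forward by 26 - key.
    int forward = (26 - key % 26) % 26;
    size_t i = 0;
    for (; ciphertext[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) ciphertext[i];
        if (isupper(c))
        {
            out[i] = (char) ('A' + (c - 'A' + forward) % 26);
        }
        else if (islower(c))
        {
            out[i] = (char) ('a' + (c - 'a' + forward) % 26);
        }
        else
        {
            out[i] = (char) c;
        }
    }
    out[i] = '\0';
}

// Brute-force attack: print the candidate plaintext for every key 1..25.
void try_all_keys(const char *ciphertext)
{
    char *candidate = malloc(strlen(ciphertext) + 1);
    if (candidate == NULL)
    {
        return;
    }
    for (int key = 1; key < 26; key++)
    {
        decrypt_with(ciphertext, key, candidate);
        printf("key %2d: %s\n", key, candidate);
    }
    free(candidate);
}
```

For the message at the end of this lecture, `try_all_keys("UIJT XBT DT50")` would show THIS WAS CS50 on the line for key 1.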
2991
02:21:56,970 --> 02:21:58,940
So if the two inputs now are HI!
2992
02:21:58,940 --> 02:22:04,280
as the plaintext message, and 1 as
the so-called key, the secret number
2993
02:22:04,280 --> 02:22:06,950
that only you and the
other person know, you
2994
02:22:06,950 --> 02:22:11,040
might be able to encrypt a
message from one way to the other.
2995
02:22:11,040 --> 02:22:13,400
And so in this case, for instance, HI!
2996
02:22:13,400 --> 02:22:16,198
would become I-J-!.
2997
02:22:16,198 --> 02:22:17,990
In this version of the
algorithm, we're not
2998
02:22:17,990 --> 02:22:19,823
going to bother with
numbers or punctuation.
2999
02:22:19,823 --> 02:22:23,090
We'll only operate on A through
Z, be it uppercase or lowercase.
3000
02:22:23,090 --> 02:22:28,250
So now if you were to receive a slip
of paper in class with I-J on it,
3001
02:22:28,250 --> 02:22:31,290
you, the recipient,
would know what it is
3002
02:22:31,290 --> 02:22:33,440
so long as you know that
the sender used one,
3003
02:22:33,440 --> 02:22:36,500
because you just reverse the algorithm
and you subtract one instead.
3004
02:22:36,500 --> 02:22:39,110
The teacher, they probably
don't know what this means,
3005
02:22:39,110 --> 02:22:41,443
and they're not going to spend
time hacking the message,
3006
02:22:41,443 --> 02:22:42,975
so it just looks scrambled to them.
3007
02:22:42,975 --> 02:22:44,600
And that's what we get from encryption.
3008
02:22:44,600 --> 02:22:47,430
Someone who intercepts it, be it
in class or in the real world,
3009
02:22:47,430 --> 02:22:51,080
on the internet or anywhere else,
can't actually figure out, ideally,
3010
02:22:51,080 --> 02:22:52,700
what it is you have sent.
3011
02:22:52,700 --> 02:22:55,130
The opposite, of course, is
indeed called decryption,
3012
02:22:55,130 --> 02:22:56,300
but the process is the same.
3013
02:22:56,300 --> 02:22:58,370
We now pass in negative 1.
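One way to write the rule down, assuming each letter is numbered A = 0 through Z = 25 (case handled separately, non-letters left alone), with plaintext letter $p_i$, ciphertext letter $c_i$, key $k$, and mod taken as the nonnegative remainder:

$$c_i = (p_i + k) \bmod 26 \qquad\qquad p_i = (c_i - k) \bmod 26$$

For HI! with $k = 1$: H = 7 becomes 8 = I and I = 8 becomes 9 = J, giving IJ!. Passing in $-1$ to undo it is the same as adding 25, since $-1 \equiv 25 \pmod{26}$. The wraparound falls out of the same formula: Z = 25 with $k = 1$ gives $26 \bmod 26 = 0$ = A, and decrypting A gives $(0 - 1) \bmod 26 = 25$ = Z.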
3014
02:22:58,370 --> 02:23:00,300
And so how about this?
3015
02:23:00,300 --> 02:23:02,840
Why don't we end with
a demonstration here?
3016
02:23:02,840 --> 02:23:08,360
UIJT XBT DT50-- there's
a bit of a tell there.
3017
02:23:08,360 --> 02:23:11,060
If we pass that in and
do negative 1, well,
3018
02:23:11,060 --> 02:23:14,180
how do we get out the
plaintext originally?
3019
02:23:14,180 --> 02:23:18,200
Well, if this is the ciphertext,
and we subtract 1 from each letter,
3020
02:23:18,200 --> 02:23:28,010
I think U becomes T, I becomes H, J
becomes I, T becomes S, X becomes W,
3021
02:23:28,010 --> 02:23:37,580
B becomes A, T becomes S, D becomes C,
T becomes S, and this was, indeed, CS50.
3022
02:23:37,580 --> 02:23:40,250
Have a duck on your way out,
and some snacks in the lobby.
3023
02:23:40,250 --> 02:23:42,350
[APPLAUSE]
3024
02:23:42,350 --> 02:23:43,850
[FILM ROLLING]
3025
02:23:43,850 --> 02:23:47,500
[MUSIC PLAYING]
3026
02:23:47,500 --> 02:24:19,000