0
00:00:00,000 --> 00:01:17,581
[MUSIC PLAYING]
1
00:01:17,581 --> 00:01:18,631
2
00:01:18,631 --> 00:01:22,651
DAVID J. MALAN: Well, this is CS50, and already this is week four,
3
00:01:22,651 --> 00:01:24,631
and recall that last week, week three, we
4
00:01:24,631 --> 00:01:27,571
began to explore the inside of a computer's memory a bit more.
5
00:01:27,571 --> 00:01:30,631
We talked about arrays, which were just chunks of memory
6
00:01:30,631 --> 00:01:33,451
back to back to back that really lay things out left to right, top
7
00:01:33,451 --> 00:01:36,721
to bottom, and this is actually a pretty common paradigm, even if you're
8
00:01:36,721 --> 00:01:38,761
new to programming, and certainly new to C.
9
00:01:38,761 --> 00:01:43,771
You've seen this approach of just using memory in some way to lay things out,
10
00:01:43,771 --> 00:01:45,161
like images, for instance.
11
00:01:45,161 --> 00:01:50,371
So here is a photo taken of last week's front row, for instance,
12
00:01:50,371 --> 00:01:53,791
and this is an opportunity to explore exactly what happens
13
00:01:53,791 --> 00:01:56,911
if we start to zoom in and zoom in and zoom in, because it seems like most
14
00:01:56,911 --> 00:02:00,661
any TV show like CSI, or whatever, or any movie that
15
00:02:00,661 --> 00:02:06,601
explores forensic information might have the investigators zoom in
16
00:02:06,601 --> 00:02:09,994
on an image like this to see what the glint in someone's eye
17
00:02:09,994 --> 00:02:12,661
is because that reveals the license plate number of someone that
18
00:02:12,661 --> 00:02:13,556
just drove past.
19
00:02:13,556 --> 00:02:15,431
Something that's a little over the top there,
20
00:02:15,431 --> 00:02:18,661
but there's an opportunity here to speak to why that is so unrealistic.
21
00:02:18,661 --> 00:02:21,661
For instance, let's zoom in on this puppet's eye and let's
22
00:02:21,661 --> 00:02:23,971
zoom in a little more to see what might be reflected.
23
00:02:23,971 --> 00:02:26,581
Let's zoom in a little more, and that's it.
24
00:02:26,581 --> 00:02:29,051
There's only a finite amount of information
25
00:02:29,051 --> 00:02:31,171
if you have an image represented in this way.
26
00:02:31,171 --> 00:02:34,321
We're using pixels-- these dots on the screen as rows and columns--
27
00:02:34,321 --> 00:02:36,781
because if you're only using a finite amount of memory
28
00:02:36,781 --> 00:02:40,111
then at the end of the day, you can only store a finite amount of information.
29
00:02:40,111 --> 00:02:43,921
At least I don't really see in this grid here any glint of a license plate
30
00:02:43,921 --> 00:02:46,651
or something like that that you might otherwise see in Hollywood.
31
00:02:46,651 --> 00:02:49,681
So today we'll explore these kinds of representations
32
00:02:49,681 --> 00:02:52,501
of how you might use memory in new and interesting ways
33
00:02:52,501 --> 00:02:55,861
to represent now, very familiar things, but also
34
00:02:55,861 --> 00:02:59,071
start to explore what some of the limitations are of this representation.
35
00:02:59,071 --> 00:03:02,851
But consider after all that this doesn't need to be even as high resolution,
36
00:03:02,851 --> 00:03:05,161
as many pixels as something like this other image,
37
00:03:05,161 --> 00:03:09,131
you can imagine just doing something silly with Post-It notes, like this.
38
00:03:09,131 --> 00:03:11,821
And if you think of an image as just having rows and columns,
39
00:03:11,821 --> 00:03:14,131
these rows otherwise known as scan lines-- something
40
00:03:14,131 --> 00:03:17,701
we'll explore in the coming week-- you could make this fun smiley face
41
00:03:17,701 --> 00:03:22,111
by just using two different values, maybe a zero and a one.
42
00:03:22,111 --> 00:03:26,141
Or yellow and purple, or vice versa, just to make something come to life.
43
00:03:26,141 --> 00:03:30,331
Now in practice, recall we talked about storing not just a zero or one,
44
00:03:30,331 --> 00:03:37,414
but maybe an R, a G, and a B value-- like 24 bits, or three bytes in total--
45
00:03:37,414 --> 00:03:38,581
but we'll come back to that.
46
00:03:38,581 --> 00:03:40,289
That would just be a more involved image.
47
00:03:40,289 --> 00:03:46,111
But for fun, if today you want to tackle something passively in the background,
48
00:03:46,111 --> 00:03:49,531
if you go to this URL here, we've put together an opportunity
49
00:03:49,531 --> 00:03:52,201
to do a bit of pixel art.
50
00:03:52,201 --> 00:03:55,801
If you go to this URL here, that'll redirect you to a Google Spreadsheet.
51
00:03:55,801 --> 00:03:58,141
If you have a laptop with you today that'll
52
00:03:58,141 --> 00:04:01,541
look a little something like this, which we've organized in rows and columns.
53
00:04:01,541 --> 00:04:05,881
So if you'd like to go ahead and use Google Spreadsheet's colorization
54
00:04:05,881 --> 00:04:09,331
feature to color in those individual squares if you'd like,
55
00:04:09,331 --> 00:04:12,751
see if you can't make something a little creative and then email it to Carter
56
00:04:12,751 --> 00:04:16,841
and we'll exhibit some of the best or favorites on the website thereafter.
57
00:04:16,841 --> 00:04:20,064
So let's transition then to something a little more familiar-- images.
58
00:04:20,064 --> 00:04:22,231
And not all of you have used, presumably, Photoshop,
59
00:04:22,231 --> 00:04:25,481
but you're probably generally familiar with Photoshop as a program for editing
60
00:04:25,481 --> 00:04:27,701
and creating images or photos or the like.
61
00:04:27,701 --> 00:04:30,631
And here is a screenshot of Photoshop's color picker,
62
00:04:30,631 --> 00:04:32,618
via which you can change what color you're
63
00:04:32,618 --> 00:04:34,951
going to draw with the paint brush, or what color you're
64
00:04:34,951 --> 00:04:36,931
going to fill in with the paint bucket.
65
00:04:36,931 --> 00:04:39,031
It's representative of any kind of graphical tool.
66
00:04:39,031 --> 00:04:41,441
And there's a lot of information in here,
67
00:04:41,441 --> 00:04:43,921
but there's perhaps some familiar terms now--
68
00:04:43,921 --> 00:04:47,791
R, G, and B. In fact, right now this is Photoshop's way
69
00:04:47,791 --> 00:04:50,491
of saying you're about to fill in your background or foreground
70
00:04:50,491 --> 00:04:52,681
with the color black, and that appears to be
71
00:04:52,681 --> 00:04:56,131
represented with an R, a G, and a B value of zero, zero, zero.
72
00:04:56,131 --> 00:05:01,981
Or alternatively, using a hash symbol and then 000000.
73
00:05:01,981 --> 00:05:04,441
And if some of you have already made web pages before
74
00:05:04,441 --> 00:05:06,331
and you know a little bit of HTML and CSS,
75
00:05:06,331 --> 00:05:08,671
you probably are familiar with this kind of syntax--
76
00:05:08,671 --> 00:05:12,531
a hash symbol and then six, or sometimes three digits thereafter.
77
00:05:12,531 --> 00:05:15,031
And if we look at a few different colors here, for instance,
78
00:05:15,031 --> 00:05:17,131
here might be the representation of white.
79
00:05:17,131 --> 00:05:23,311
Now the R, the G, and the B values went way up from 0 to 255, 255, 255.
80
00:05:23,311 --> 00:05:28,111
Or alternatively, it looks like Photoshop, and in turn web browsers,
81
00:05:28,111 --> 00:05:31,589
could represent that same color white with FFFFFF.
82
00:05:31,589 --> 00:05:32,881
And let's just do a few others.
83
00:05:32,881 --> 00:05:37,621
Here is red, and it turns out that red is a whole lot of red, 255,
84
00:05:37,621 --> 00:05:39,181
but no green, no blue.
85
00:05:39,181 --> 00:05:40,326
Or, a.k.a.
86
00:05:40,326 --> 00:05:42,549
FF0000.
87
00:05:42,549 --> 00:05:44,341
So there's perhaps a pattern here emerging.
88
00:05:44,341 --> 00:05:48,421
Here is green, zero, 255, zero, a.k.a.
89
00:05:48,421 --> 00:05:52,661
00FF00, or lastly, here blue, which is no red,
90
00:05:52,661 --> 00:05:56,371
no green but apparently a lot of blue, 255 again, a.k.a.
91
00:05:56,371 --> 00:05:58,471
0000FF.
92
00:05:58,471 --> 00:06:01,861
Now some of you, again, might have seen this notation before,
93
00:06:01,861 --> 00:06:05,071
these zeros and these F's and all of the numbers and letters in between,
94
00:06:05,071 --> 00:06:06,844
but this is another form of notation.
95
00:06:06,844 --> 00:06:08,761
And in fact, we'll explore this today-- it really
96
00:06:08,761 --> 00:06:11,491
is just a precondition for talking about some other concepts.
97
00:06:11,491 --> 00:06:14,641
But the ideas, ultimately, are really no different.
98
00:06:14,641 --> 00:06:17,821
What we're about to see is a different base system--
99
00:06:17,821 --> 00:06:19,951
not just binary, not just decimal, but something
100
00:06:19,951 --> 00:06:21,871
we're about to call hexadecimal.
101
00:06:21,871 --> 00:06:25,831
But first, recall that with RGB we previously did the following.
102
00:06:25,831 --> 00:06:28,231
Any RGB value-- red, green, blue-- just combines
103
00:06:28,231 --> 00:06:30,761
some amount of red or green or blue.
104
00:06:30,761 --> 00:06:35,341
So here we have 72, 73, 33, which in the context of an email or text, of course,
105
00:06:35,341 --> 00:06:36,901
said what--
106
00:06:36,901 --> 00:06:38,401
a couple of weeks back?
107
00:06:38,401 --> 00:06:40,891
Just hi with an exclamation point, but in the context
108
00:06:40,891 --> 00:06:45,121
of a Photoshop-like program, this might instead be representing,
109
00:06:45,121 --> 00:06:47,558
collectively, this shade of yellow, for instance,
110
00:06:47,558 --> 00:06:50,141
when you combine that much red, that much green, that much blue.
111
00:06:50,141 --> 00:06:51,451
So here is the same idea.
112
00:06:51,451 --> 00:06:53,701
If you've got a lot of red, no green, no blue,
113
00:06:53,701 --> 00:06:55,291
together that's going to give us red.
114
00:06:55,291 --> 00:06:58,081
If you've got no red, a lot of green, no blue,
115
00:06:58,081 --> 00:06:59,851
that's going to give us, of course, green.
116
00:06:59,851 --> 00:07:03,169
If you've got no red, no green, a lot of blue, that of course,
117
00:07:03,169 --> 00:07:04,211
is going to give us blue.
118
00:07:04,211 --> 00:07:08,401
So there's a pattern emerging here where apparently 00 is none, as always,
119
00:07:08,401 --> 00:07:10,591
and FF is apparently a lot.
120
00:07:10,591 --> 00:07:17,281
And it's maybe somehow equated with 255, at least per that Photoshop screenshot.
121
00:07:17,281 --> 00:07:20,551
Meanwhile, if we combine one last one, a lot of red, a lot of green,
122
00:07:20,551 --> 00:07:21,631
a lot of blue--
123
00:07:21,631 --> 00:07:25,359
that's actually going to give us a single white pixel like this.
124
00:07:25,359 --> 00:07:26,401
All right, so think back.
125
00:07:26,401 --> 00:07:30,119
Here was binary-- in the world of binary you had just two digits, zero and one.
126
00:07:30,119 --> 00:07:31,411
Could have been anything else--
127
00:07:31,411 --> 00:07:36,541
A or B, X or Y, but the world standardized on these numerals
128
00:07:36,541 --> 00:07:37,381
zero and one.
129
00:07:37,381 --> 00:07:40,591
In our world's decimal system, of course, you have zero through nine.
130
00:07:40,591 --> 00:07:44,101
As of today though, we're going to start using hexadecimal sometimes
131
00:07:44,101 --> 00:07:47,986
in the context of images and also files just because it's a convention
132
00:07:47,986 --> 00:07:49,834
and there's some conveniences to it.
133
00:07:49,834 --> 00:07:51,751
Where now, you're going to be able to count up
134
00:07:51,751 --> 00:07:54,601
to F in a notation called hexadecimal.
135
00:07:54,601 --> 00:07:59,671
From zero through nine, then you keep going to A to B to C to D to E to F,
136
00:07:59,671 --> 00:08:02,641
the idea being each of these, even though it's weirdly
137
00:08:02,641 --> 00:08:06,781
a letter of the English alphabet, it's still just a single symbol.
138
00:08:06,781 --> 00:08:12,241
It's not one zero for 10, or one one for 11-- all 16 of these values,
139
00:08:12,241 --> 00:08:15,601
these digits, so to speak, are indeed still just single symbols,
140
00:08:15,601 --> 00:08:19,211
and that's a characteristic of just using this other notational system.
141
00:08:19,211 --> 00:08:24,751
So how do we get from 00 and FF to something like 0 and 255, respectively?
142
00:08:24,751 --> 00:08:26,761
Well, this hexadecimal system, a.k.a.
143
00:08:26,761 --> 00:08:30,186
Base 16, just does the math from week zero and really,
144
00:08:30,186 --> 00:08:31,811
grade school, a little bit differently.
145
00:08:31,811 --> 00:08:34,981
For instance, if you have a number that's got two digits,
146
00:08:34,981 --> 00:08:38,921
or hexadecimal digits as of today, the columns are just a little different.
147
00:08:38,921 --> 00:08:42,511
Instead of powers of two or powers of 10, which we saw for binary and decimal
148
00:08:42,511 --> 00:08:45,271
respectively, it's powers of 16.
149
00:08:45,271 --> 00:08:48,001
So if we just do the math out, that's the ones column,
150
00:08:48,001 --> 00:08:50,731
this is the 16s column, and so forth.
151
00:08:50,731 --> 00:08:53,741
Things get actually pretty big pretty quickly in this system.
152
00:08:53,741 --> 00:08:56,746
But now let's just consider how we would represent familiar numbers.
153
00:08:56,746 --> 00:08:59,371
If you've got two hexadecimal digits for which these hashes are
154
00:08:59,371 --> 00:09:02,431
just placeholders, zero, zero is going to mathematically
155
00:09:02,431 --> 00:09:04,931
equal the decimal number you and I know, of course, as zero.
156
00:09:04,931 --> 00:09:05,431
Why?
157
00:09:05,431 --> 00:09:06,721
Same thing as week zero--
158
00:09:06,721 --> 00:09:11,041
16 times zero plus one times zero is the number you and I know as zero.
159
00:09:11,041 --> 00:09:12,521
And we can count up from here.
160
00:09:12,521 --> 00:09:15,031
This, in hexadecimal, would be how a computer
161
00:09:15,031 --> 00:09:16,831
represents the number we know as one.
162
00:09:16,831 --> 00:09:18,821
It would be zero one in this case.
163
00:09:18,821 --> 00:09:24,181
This would be two, three, four, five, six, seven, eight, nine--
164
00:09:24,181 --> 00:09:26,141
in decimal, we're about to go to 10.
165
00:09:26,141 --> 00:09:29,211
But in hexadecimal, to be clear, what comes next?
166
00:09:29,211 --> 00:09:38,021
So, apparently A, so 0A, which is now 10, then 0B, or 11, and then 12, 13, 14, 15.
167
00:09:38,021 --> 00:09:41,111
So using hexadecimal is just an interesting way
168
00:09:41,111 --> 00:09:44,951
of using single symbols now, zero through F,
169
00:09:44,951 --> 00:09:47,901
to count from zero through 15.
170
00:09:47,901 --> 00:09:50,651
And we'll see why it's 15 in a moment, but as soon as we get to F,
171
00:09:50,651 --> 00:09:54,821
anyone want to conjecture how in hexadecimal, a.k.a. hex,
172
00:09:54,821 --> 00:09:57,731
do we now count up one position higher?
173
00:09:57,731 --> 00:10:01,431
What comes after 0F in hexadecimal?
174
00:10:01,431 --> 00:10:03,701
So, one zero-- it's the same kind of thing--
175
00:10:03,701 --> 00:10:05,866
once you're at the highest digit possible, F--
176
00:10:05,866 --> 00:10:07,991
or in our decimal world that would have been nine--
177
00:10:07,991 --> 00:10:11,111
you add one more, nine wraps around to zero, or in this case,
178
00:10:11,111 --> 00:10:12,821
F wraps around to zero.
179
00:10:12,821 --> 00:10:15,791
You carry the one and voila-- now we're representing
180
00:10:15,791 --> 00:10:17,511
the number you and I know as 16.
181
00:10:17,511 --> 00:10:19,451
And we could keep going forever, literally.
182
00:10:19,451 --> 00:10:23,186
This could be 17, 18, 19, 20, in decimal--
183
00:10:23,186 --> 00:10:25,061
but let's just wave our hands at it and count
184
00:10:25,061 --> 00:10:27,821
as high as we can-- dot, dot, dot-- the highest
185
00:10:27,821 --> 00:10:31,181
we could count in hexadecimal with two digits, just logically,
186
00:10:31,181 --> 00:10:32,981
would be what, in hexadecimal?
187
00:10:32,981 --> 00:10:35,091
Something, something.
188
00:10:35,091 --> 00:10:35,951
FF, I heard.
189
00:10:35,951 --> 00:10:39,531
So yes, F is the biggest digit possible, so FF is what we have.
190
00:10:39,531 --> 00:10:43,163
So how high can you count in hexadecimal if you've got just two of these digits?
191
00:10:43,163 --> 00:10:44,621
Well, it's the same math as always.
192
00:10:44,621 --> 00:10:46,571
16 times F, a.k.a.
193
00:10:46,571 --> 00:10:52,941
15, so that's 16 times 15 plus one times F, or one times 15--
194
00:10:52,941 --> 00:10:57,341
that gives us 240 plus 15 in decimal, the result of which, of course, now
195
00:10:57,341 --> 00:10:59,421
is 255.
196
00:10:59,421 --> 00:11:02,511
So this hexadecimal system-- you may have seen in the world of web pages,
197
00:11:02,511 --> 00:11:05,261
and if you haven't we'll get to that in this class in a few weeks,
198
00:11:05,261 --> 00:11:07,991
or we just saw in the context of Photoshop-- just
199
00:11:07,991 --> 00:11:14,141
has this shorthand notation of counting as high as 255 but just calling it FF.
200
00:11:14,141 --> 00:11:17,771
Now it's marginal, but that's like a one-third savings of how many digits
201
00:11:17,771 --> 00:11:21,491
you need in order to count as high as 255 because in decimal, of course,
202
00:11:21,491 --> 00:11:23,321
255 is three digits.
203
00:11:23,321 --> 00:11:27,131
In hexadecimal you can count as high using just two,
204
00:11:27,131 --> 00:11:30,489
and that difference is going to get magnified the bigger our numbers get.
205
00:11:30,489 --> 00:11:33,281
Let me stipulate for now, you're going to get more and more savings
206
00:11:33,281 --> 00:11:36,431
in terms of just how many symbols you need on the screen to represent
207
00:11:36,431 --> 00:11:39,881
bigger and bigger numbers than that.
208
00:11:39,881 --> 00:11:43,301
All right, let me pause here just to see if there's any questions thus far
209
00:11:43,301 --> 00:11:46,721
on what we've called hexadecimal, which again, just gives us zero through nine
210
00:11:46,721 --> 00:11:53,408
as well as A through F. Any questions or confusion?
211
00:11:53,408 --> 00:11:55,991
And if it feels like we're lingering a bit much on arithmetic,
212
00:11:55,991 --> 00:11:59,331
we're not really going to see other notations besides this moving forward.
213
00:11:59,331 --> 00:12:03,461
These are the go-to three in a programmer's world, typically.
214
00:12:03,461 --> 00:12:04,671
But there are some others.
215
00:12:04,671 --> 00:12:06,240
Yeah.
216
00:12:06,240 --> 00:12:08,532
AUDIENCE: Does the hexadecimal symbol take more storage
217
00:12:08,532 --> 00:12:11,251
than the decimal system?
218
00:12:11,251 --> 00:12:12,501
DAVID J. MALAN: Good question.
219
00:12:12,501 --> 00:12:16,611
Does hexadecimal require more storage or less storage than the decimal system?
220
00:12:16,611 --> 00:12:20,841
Theoretically no, because this is just a way of representing information
221
00:12:20,841 --> 00:12:23,721
and we'll see a concrete example in a moment.
222
00:12:23,721 --> 00:12:27,111
But inside of the computer, at the end of the day, you're still storing bits.
223
00:12:27,111 --> 00:12:30,228
And using hexadecimal is not using more or fewer bits;
224
00:12:30,228 --> 00:12:32,061
think of this as how you might write it down
225
00:12:32,061 --> 00:12:34,971
on a piece of paper, just how many digits you're going to write
226
00:12:34,971 --> 00:12:37,941
or on a computer screen, how many digits you're going to see at once,
227
00:12:37,941 --> 00:12:41,211
but it doesn't change how the computer is representing information
228
00:12:41,211 --> 00:12:44,331
because all they're representing at the end of the day is zeros and ones.
229
00:12:44,331 --> 00:12:45,621
So in fact, let's go there.
230
00:12:45,621 --> 00:12:49,851
If this-- a moment ago FF I claimed was 255--
231
00:12:49,851 --> 00:12:51,891
let's just rewind to week zero and if we wanted
232
00:12:51,891 --> 00:12:56,391
to count to 255 in binary, that's as high as you can count, recall,
233
00:12:56,391 --> 00:12:57,411
with eight bits.
234
00:12:57,411 --> 00:12:59,244
And there's only a few of these numbers that
235
00:12:59,244 --> 00:13:03,081
are useful to memorize, like 255 is as high as you can count with eight bits
236
00:13:03,081 --> 00:13:06,981
if you start at zero, because two to the eighth is 256, but if you start at zero
237
00:13:06,981 --> 00:13:09,471
it's zero through 255.
238
00:13:09,471 --> 00:13:13,671
So in binary, recall if you have eight bits, all of which were ones,
239
00:13:13,671 --> 00:13:15,991
and I won't do out the math pedantically here,
240
00:13:15,991 --> 00:13:18,366
but if I do do this plus this plus this, dot, dot,
241
00:13:18,366 --> 00:13:21,391
dot-- that's also going to give me 255.
242
00:13:21,391 --> 00:13:24,441
So this is what's interesting here about hexadecimal.
243
00:13:24,441 --> 00:13:28,851
It turns out that an upside of storing values in hexadecimal
244
00:13:28,851 --> 00:13:32,571
is that we're going to see the first F represents
245
00:13:32,571 --> 00:13:35,901
the left half of all these bits, and the second F in this case
246
00:13:35,901 --> 00:13:38,431
represents the rightmost four of these bits.
247
00:13:38,431 --> 00:13:41,061
So it turns out hexadecimal is very useful when you
248
00:13:41,061 --> 00:13:44,031
want to treat data in units of four.
249
00:13:44,031 --> 00:13:47,181
It's not quite eight, but units of four, and that's not bad.
250
00:13:47,181 --> 00:13:50,271
Which is why-- if you use two digits like I have thus far,
251
00:13:50,271 --> 00:13:53,061
00 or FF or anything in between--
252
00:13:53,061 --> 00:13:57,921
that's actually a convenient way of representing eight bits in total.
253
00:13:57,921 --> 00:14:02,091
One hex digit for the first four bits, one hex digit for the second.
254
00:14:02,091 --> 00:14:04,791
And again, there's nothing new intellectually here per se,
255
00:14:04,791 --> 00:14:08,571
it's just a different way of representing the same story as before--
256
00:14:08,571 --> 00:14:09,651
zeros and ones.
257
00:14:09,651 --> 00:14:11,491
So in what context do we see this?
258
00:14:11,491 --> 00:14:12,831
Well, we talked about memory last week, and we're
259
00:14:12,831 --> 00:14:14,414
going to talk more about it this week.
260
00:14:14,414 --> 00:14:16,941
If this is my computer's RAM-- random access memory--
261
00:14:16,941 --> 00:14:21,111
you can again think of each byte as having a number associated with it--
262
00:14:21,111 --> 00:14:22,671
its address or location.
263
00:14:22,671 --> 00:14:26,991
This might be zero, this might be 2 billion, and so in the past
264
00:14:26,991 --> 00:14:29,781
I've described these as just this, using decimal numbers.
265
00:14:29,781 --> 00:14:34,131
Here's byte zero, one, two, three, four, five, six, seven, 15, 16
266
00:14:34,131 --> 00:14:35,581
would be here, and so forth.
267
00:14:35,581 --> 00:14:40,071
But it turns out in the world of memory, and thus today, programming, people
268
00:14:40,071 --> 00:14:44,691
tend to count memory bytes using hexadecimal.
269
00:14:44,691 --> 00:14:46,881
Partly just by convention, but also partly
270
00:14:46,881 --> 00:14:49,581
because it's a little more succinct and again, each digit
271
00:14:49,581 --> 00:14:52,641
represents four bits, typically.
272
00:14:52,641 --> 00:14:54,396
So what comes after F here?
273
00:14:54,396 --> 00:14:56,271
Well, if I think about the computer's memory,
274
00:14:56,271 --> 00:15:01,311
I normally might do after F, which is 15, 16.
275
00:15:01,311 --> 00:15:05,931
But instead, one zero, one one, one two, one three-- this
276
00:15:05,931 --> 00:15:10,551
is not 10, 11, 12, 13, because I claim I'm in the context of hexadecimal now.
277
00:15:10,551 --> 00:15:12,621
As per the previous slide, we already started
278
00:15:12,621 --> 00:15:15,441
going into A's through F's, so you immediately
279
00:15:15,441 --> 00:15:18,111
see here a possible problem.
280
00:15:18,111 --> 00:15:21,081
Why is this now worrisome, if all of a sudden you're
281
00:15:21,081 --> 00:15:26,791
seeing seemingly familiar numbers like 10, 11, 12, 13?
282
00:15:26,791 --> 00:15:28,928
We didn't really stumble across this problem
283
00:15:28,928 --> 00:15:30,511
when it was all zeros and ones before.
284
00:15:30,511 --> 00:15:31,614
Yeah.
285
00:15:31,614 --> 00:15:33,156
AUDIENCE: Try to do math [INAUDIBLE].
286
00:15:35,284 --> 00:15:37,951
DAVID J. MALAN: Yeah, so if you're writing some code in C that's
287
00:15:37,951 --> 00:15:39,809
doing some math, you might accidentally--
288
00:15:39,809 --> 00:15:42,601
or the computer might accidentally confuse hexadecimal with decimal
289
00:15:42,601 --> 00:15:45,161
if they look in some context the same.
290
00:15:45,161 --> 00:15:47,251
Any number on the board that doesn't have a letter
291
00:15:47,251 --> 00:15:51,041
is ambiguously hexadecimal or decimal at this point,
292
00:15:51,041 --> 00:15:52,751
and so how might we resolve this?
293
00:15:52,751 --> 00:15:55,711
Well, it turns out that what computers typically do is this.
294
00:15:55,711 --> 00:16:00,481
By convention, any time you see 0x and then a number,
295
00:16:00,481 --> 00:16:02,911
that's a human convention of saying--
296
00:16:02,911 --> 00:16:06,371
signaling to the reader that this is in fact a hexadecimal number.
297
00:16:06,371 --> 00:16:10,441
So if it's 0x10, that is not the number 10,
298
00:16:10,441 --> 00:16:15,611
that is the hexadecimal number one zero, which recall we said earlier,
299
00:16:15,611 --> 00:16:18,631
is how you count up to 16.
300
00:16:18,631 --> 00:16:21,151
And again, these are not the kinds of things to memorize,
301
00:16:21,151 --> 00:16:24,561
it's really just the system for how you think about these things.
302
00:16:24,561 --> 00:16:27,061
So henceforth today, we're going to start seeing hexadecimal
303
00:16:27,061 --> 00:16:28,471
in a bunch of contexts.
304
00:16:28,471 --> 00:16:31,501
When you write code, you might even write code using some hexadecimal
305
00:16:31,501 --> 00:16:34,001
but again, it's just a different way of representing numbers
306
00:16:34,001 --> 00:16:37,261
and humans have different conventions for different contexts.
307
00:16:37,261 --> 00:16:40,771
All right, so with that said, any questions now on this building block?
308
00:16:40,771 --> 00:16:46,321
From here on out, we'll start using it in some actual code.
309
00:16:46,321 --> 00:16:48,011
Any questions?
310
00:16:48,011 --> 00:16:49,581
Nothing so far?
311
00:16:49,581 --> 00:16:50,081
All right.
312
00:16:50,081 --> 00:16:53,821
So, let's go ahead and consider maybe a familiar example.
313
00:16:53,821 --> 00:16:57,571
Something involving code, where I initialize a variable like n
314
00:16:57,571 --> 00:16:59,389
to a value like 50, in this case.
315
00:16:59,389 --> 00:17:01,681
And then let's start to tinker around with what's going
316
00:17:01,681 --> 00:17:03,391
on inside of the computer's memory.
317
00:17:03,391 --> 00:17:06,191
In a moment I'm going to load up VS Code on my computer
318
00:17:06,191 --> 00:17:09,511
and I'm going to go ahead and whip up a program that very simply assigns
319
00:17:09,511 --> 00:17:13,231
a value like the number 50 to a variable called n,
320
00:17:13,231 --> 00:17:19,036
but today, keep in mind that that variable n and that value 50
321
00:17:19,036 --> 00:17:21,404
is going to be stored somewhere in my computer's memory,
322
00:17:21,404 --> 00:17:24,571
and it turns out today we'll introduce a bit more syntax so you can actually
323
00:17:24,571 --> 00:17:27,011
see where things are being stored.
324
00:17:27,011 --> 00:17:28,711
So let me click over to VS Code here.
325
00:17:28,711 --> 00:17:31,681
I'm going to create a program called address.c just
326
00:17:31,681 --> 00:17:34,171
to explore a computer's addresses today, and I'm
327
00:17:34,171 --> 00:17:38,701
going to do an include stdio.h, int main(void), as usual.
328
00:17:38,701 --> 00:17:40,441
No command line arguments for now.
329
00:17:40,441 --> 00:17:43,043
I'm going to declare that variable n equals 50,
330
00:17:43,043 --> 00:17:45,251
and then I'm just going to go ahead and print it out.
331
00:17:45,251 --> 00:17:50,731
So nothing very interesting but I'll use %i backslash n and then comma n
332
00:17:50,731 --> 00:17:52,321
to print out that value.
333
00:17:52,321 --> 00:17:55,311
Nothing here should be very interesting to compile or run,
334
00:17:55,311 --> 00:17:57,811
but I'll do it just to make sure I didn't make any mistakes.
335
00:17:57,811 --> 00:18:03,301
Looks like as expected, it simply prints out the number 50, like this.
336
00:18:03,301 --> 00:18:06,781
But let's consider then, what this code is doing underneath the hood
337
00:18:06,781 --> 00:18:09,521
when it's actually run on your machine.
338
00:18:09,521 --> 00:18:11,401
So here we have that grid of memory.
339
00:18:11,401 --> 00:18:15,451
That variable n is an int, and if you think back,
340
00:18:15,451 --> 00:18:19,051
how many bytes typically do we use for an int?
341
00:18:19,051 --> 00:18:20,131
Yeah.
342
00:18:20,131 --> 00:18:22,690
Four, so four bytes, or 32 bits.
343
00:18:22,690 --> 00:18:26,491
So if each of these squares represents one byte, then my computer, somewhere
344
00:18:26,491 --> 00:18:29,813
in my memory, or RAM, is using four of these squares.
345
00:18:29,813 --> 00:18:32,521
Maybe it ends up over here just because there's other stuff being
346
00:18:32,521 --> 00:18:33,731
used elsewhere, for instance.
347
00:18:33,731 --> 00:18:35,481
Though I don't really know, and frankly, I
348
00:18:35,481 --> 00:18:38,273
don't really care where it ends up, just that it ends up somewhere.
349
00:18:38,273 --> 00:18:41,940
So the variable-- the value 50 is stored here in a variable called n.
350
00:18:41,940 --> 00:18:45,581
Even though I've written it as decimal, just like in my code--
351
00:18:45,581 --> 00:18:50,184
let me again remind you that this is 32 zeros and ones representing that 50--
352
00:18:50,184 --> 00:18:53,351
it's just going to be very tedious if we start writing everything in binary,
353
00:18:53,351 --> 00:18:56,351
so I'll use the more comfortable human decimal system.
354
00:18:56,351 --> 00:18:59,141
So that's what's going on inside of the computer's memory.
355
00:18:59,141 --> 00:19:03,571
So what if I actually wanted to start tinkering with its location,
356
00:19:03,571 --> 00:19:06,091
or maybe just knowing its location?
357
00:19:06,091 --> 00:19:09,901
Well, this variable n indeed has a name, n--
358
00:19:09,901 --> 00:19:13,763
that's a label of sorts for it-- but at the end of the day that 50 is
359
00:19:13,763 --> 00:19:16,471
technically at a specific address, and I'm going to make one up--
360
00:19:16,471 --> 00:19:19,501
0x123, and it's 123 because I really don't
361
00:19:19,501 --> 00:19:22,421
care what it is, I just want an address for the sake of discussion.
362
00:19:22,421 --> 00:19:28,951
So way over here off screen might be byte zero, way down here is byte 0x123.
363
00:19:28,951 --> 00:19:32,861
It's in hexadecimal notation just by convention.
364
00:19:32,861 --> 00:19:36,691
So how can I actually see where my variables are ending up
365
00:19:36,691 --> 00:19:38,341
in memory if I'm curious to do so?
366
00:19:38,341 --> 00:19:41,821
Well, let me go back to my code here and let me actually
367
00:19:41,821 --> 00:19:44,081
change this just a little bit.
368
00:19:44,081 --> 00:19:49,381
Let me go ahead and introduce, for instance, another symbol
369
00:19:49,381 --> 00:19:53,581
here and another topic altogether, namely pointers.
370
00:19:53,581 --> 00:19:59,111
So a pointer is a variable that stores the address of some value--
371
00:19:59,111 --> 00:20:02,371
the location of some value or more specifically,
372
00:20:02,371 --> 00:20:05,681
the specific byte in which that value is stored.
373
00:20:05,681 --> 00:20:08,941
So again, if you think of your memory as being a whole bunch of bytes--
374
00:20:08,941 --> 00:20:11,701
zero at top left, 2 billion or whatever at bottom right,
375
00:20:11,701 --> 00:20:13,201
depending on how much RAM you have--
376
00:20:13,201 --> 00:20:15,481
each of those things has a location, or an address.
377
00:20:15,481 --> 00:20:19,571
A pointer is just a variable storing one such address.
378
00:20:19,571 --> 00:20:24,751
So it turns out that in the world of C, there's a couple of new symbols
379
00:20:24,751 --> 00:20:29,111
we can use if we want to see what it is we're talking about here,
380
00:20:29,111 --> 00:20:32,041
and those two operators, as of today, are these.
381
00:20:32,041 --> 00:20:35,831
You can use the ampersand operator in C in a couple of ways.
382
00:20:35,831 --> 00:20:38,761
We already saw it very briefly to do ampersand ampersand--
383
00:20:38,761 --> 00:20:42,271
to kind of "and" two Boolean expressions together
384
00:20:42,271 --> 00:20:43,811
in the context of a conditional.
385
00:20:43,811 --> 00:20:44,821
This is different.
386
00:20:44,821 --> 00:20:48,631
A single ampersand is the address-of operator.
387
00:20:48,631 --> 00:20:52,651
So literally, in your code, if you've got a variable like n or anything else
388
00:20:52,651 --> 00:20:57,901
and you write &n, C is going to figure out for you what is the address of that
389
00:20:57,901 --> 00:21:00,371
variable n in the computer's memory.
390
00:21:00,371 --> 00:21:06,001
And it's going to give you a number, otherwise known as the address of that.
391
00:21:06,001 --> 00:21:09,781
If you want to store that address in a variable
392
00:21:09,781 --> 00:21:15,841
even though yes, it's a number like 0x123, you have to tell C in advance
393
00:21:15,841 --> 00:21:21,721
that you want to store not an int per se, but the address of an int.
394
00:21:21,721 --> 00:21:25,351
And the syntax for doing that-- somewhat nonobviously-- is
395
00:21:25,351 --> 00:21:29,071
to use an asterisk here, a star operator, and you
396
00:21:29,071 --> 00:21:30,871
say this when creating the variable.
397
00:21:30,871 --> 00:21:35,371
If you want p to be a pointer, that is the address of some other variable,
398
00:21:35,371 --> 00:21:37,051
you do int star p.
399
00:21:37,051 --> 00:21:41,191
And the star just tells the computer, this is not an integer per se,
400
00:21:41,191 --> 00:21:44,641
this is the address of something that yes, is an int,
401
00:21:44,641 --> 00:21:46,401
but we're just being more precise.
402
00:21:46,401 --> 00:21:49,301
So on the right-hand side you have the address-of operator.
403
00:21:49,301 --> 00:21:52,281
As always with the equal sign, you copy from right to left.
404
00:21:52,281 --> 00:21:56,231
Because &n is by definition the address of something you have to store it
405
00:21:56,231 --> 00:22:01,781
in a pointer, and the way to declare a pointer is to specify the type of value
406
00:22:01,781 --> 00:22:05,831
whose address you're storing, and then use the star to indicate that this is
407
00:22:05,831 --> 00:22:09,341
indeed a pointer and not just a regular old int.
408
00:22:09,341 --> 00:22:10,811
So let's see this in practice.
409
00:22:10,811 --> 00:22:13,871
Let me go back to my own source code here and let
410
00:22:13,871 --> 00:22:15,881
me make just a couple of tweaks.
411
00:22:15,881 --> 00:22:18,221
I'm going to leave n alone here but I'm going
412
00:22:18,221 --> 00:22:22,761
to go ahead and initially just do this.
413
00:22:22,761 --> 00:22:27,341
Let me say int star p equals ampersand n,
414
00:22:27,341 --> 00:22:31,961
and then down here, I'm going to print out not n this time, but p--
415
00:22:31,961 --> 00:22:33,401
the variable p.
416
00:22:33,401 --> 00:22:38,171
And then even though yes, it's just a number and therefore I could use %i
417
00:22:38,171 --> 00:22:42,311
for integers, there's actually a special format code in printf for printing
418
00:22:42,311 --> 00:22:45,521
pointers or addresses, and that's %p.
419
00:22:45,521 --> 00:22:48,821
So now let's go ahead and recompile this, make address--
420
00:22:48,821 --> 00:22:53,871
so far so good-- ./address, Enter, and a little weirdly,
421
00:22:53,871 --> 00:22:58,511
but perhaps understandably now, the address in my computer's memory
422
00:22:58,511 --> 00:23:02,381
at which the variable n happened to be stored was not quite as simple
423
00:23:02,381 --> 00:23:03,881
as 0x123.
424
00:23:03,881 --> 00:23:06,431
This computer has a lot more memory so technically,
425
00:23:06,431 --> 00:23:12,491
it was stored at 0x7FFCB4578E5C.
426
00:23:12,491 --> 00:23:14,651
Now that has no special significance to me.
427
00:23:14,651 --> 00:23:16,881
It could have ended up somewhere else altogether,
428
00:23:16,881 --> 00:23:20,381
but this is just where, in my computer-- or technically the cloud
429
00:23:20,381 --> 00:23:22,901
server to which I'm connected using VS Code here--
430
00:23:22,901 --> 00:23:25,498
that just happens to be where n ended up.
431
00:23:25,498 --> 00:23:28,331
And strictly speaking, I don't even need to introduce this variable.
432
00:23:28,331 --> 00:23:31,181
I could get rid of p and I could just say
433
00:23:31,181 --> 00:23:34,901
print not just n, but the address of n and achieve the same thing.
434
00:23:34,901 --> 00:23:37,361
You don't need to temporarily store it in a variable.
435
00:23:37,361 --> 00:23:40,341
Let me just do make address again, ./address,
436
00:23:40,341 --> 00:23:42,921
and now I see this address here.
437
00:23:42,921 --> 00:23:46,466
And notice if I keep running the program, it's actually moving around.
438
00:23:46,466 --> 00:23:49,091
There's other stuff presumably going on inside of the computer.
439
00:23:49,091 --> 00:23:52,501
Maybe it's actually randomizing it so it's not always at the same location.
440
00:23:52,501 --> 00:23:55,001
That can actually be a security feature underneath the hood,
441
00:23:55,001 --> 00:24:00,521
but this happens to be at that moment in time where that value is in memory,
442
00:24:00,521 --> 00:24:03,491
quite like our picture a moment ago.
443
00:24:03,491 --> 00:24:06,641
All right, so let me pause here to see if there's now
444
00:24:06,641 --> 00:24:08,171
any questions on what we just did.
445
00:24:08,171 --> 00:24:10,171
Yeah?
446
00:24:10,171 --> 00:24:12,391
AUDIENCE: Is there any way to control where
447
00:24:12,391 --> 00:24:15,551
you are storing something in memory?
448
00:24:15,551 --> 00:24:18,746
Does it even matter if it works, or does it just
449
00:24:18,746 --> 00:24:21,271
matter that you could go in and locate where something is?
450
00:24:21,271 --> 00:24:22,813
DAVID J. MALAN: Really good question.
451
00:24:22,813 --> 00:24:25,381
Is there any way to control where something is in memory?
452
00:24:25,381 --> 00:24:28,338
Short answer is yes, and this is both the power and the danger of C,
453
00:24:28,338 --> 00:24:31,171
and we're going to do this today and make a few deliberate mistakes,
454
00:24:31,171 --> 00:24:36,241
because with this power of going to or getting the address of any variable,
455
00:24:36,241 --> 00:24:38,341
I could just arbitrarily right now write code
456
00:24:38,341 --> 00:24:42,611
that stores a value at byte 2 billion, or zero, or anything in between.
457
00:24:42,611 --> 00:24:46,771
But that also means potentially, I could start creepily looking
458
00:24:46,771 --> 00:24:50,831
around at all of the computer's memory, even at things that I didn't put there.
459
00:24:50,831 --> 00:24:53,371
Maybe other programs, maybe other parts of programs
460
00:24:53,371 --> 00:24:55,621
and indeed, this is a potential security threat,
461
00:24:55,621 --> 00:24:57,984
if suddenly you're able to just look anywhere
462
00:24:57,984 --> 00:24:59,401
you want in the computer's memory.
463
00:24:59,401 --> 00:25:04,021
Now, I'm overselling it a little bit because nowadays, in this decade,
464
00:25:04,021 --> 00:25:06,571
there are some defenses in place in compilers
465
00:25:06,571 --> 00:25:09,941
and in our operating systems that do hedge against this a little bit.
466
00:25:09,941 --> 00:25:12,391
But this is still a very frequent source of problems,
467
00:25:12,391 --> 00:25:14,791
and later today we'll talk briefly about things
468
00:25:14,791 --> 00:25:17,651
called stack overflow, which is not just a website,
469
00:25:17,651 --> 00:25:19,831
it is a problem that you can encounter.
470
00:25:19,831 --> 00:25:22,351
Heap overflow, and more generally buffer overflows--
471
00:25:22,351 --> 00:25:25,801
there's just so many things that can go wrong using this language called C,
472
00:25:25,801 --> 00:25:29,401
and if any of you have encountered a segmentation fault yet?
473
00:25:29,401 --> 00:25:31,321
I think we saw a few hands for that already.
474
00:25:31,321 --> 00:25:33,901
You touched memory that you shouldn't have
475
00:25:33,901 --> 00:25:38,611
and odds are you did it most recently by going too far in an array.
476
00:25:38,611 --> 00:25:42,001
Going to the left, or negative in an array, or somehow looking at memory
477
00:25:42,001 --> 00:25:42,841
you shouldn't have.
478
00:25:42,841 --> 00:25:47,051
And we'll explain today why it is you were able to do that.
479
00:25:47,051 --> 00:25:49,531
Other questions on these primitives so far?
480
00:25:49,531 --> 00:25:51,623
Yeah, from Carter?
481
00:25:51,623 --> 00:25:54,748
AUDIENCE: [INAUDIBLE] pointer star p, but then we used p later in the code.
482
00:25:54,748 --> 00:25:56,031
Is it called star p or p?
483
00:25:56,031 --> 00:25:57,281
DAVID J. MALAN: Good question.
484
00:25:57,281 --> 00:25:58,571
Earlier, we used star p.
485
00:25:58,571 --> 00:26:01,061
Let me rewind in time to the previous version of this code,
486
00:26:01,061 --> 00:26:03,341
where I actually had a variable called p.
487
00:26:03,341 --> 00:26:07,151
Just like with variable declarations in the past,
488
00:26:07,151 --> 00:26:12,621
once you've declared a variable to be an int, a char, a bool, or an int
489
00:26:12,621 --> 00:26:15,761
star, a.k.a. a pointer, you don't thereafter
490
00:26:15,761 --> 00:26:18,671
keep using the word int or now, the star.
491
00:26:18,671 --> 00:26:20,471
Once you've declared it, that's it.
492
00:26:20,471 --> 00:26:21,921
You only refer to it by name.
493
00:26:21,921 --> 00:26:26,111
And so it's very deliberate what I did here,
494
00:26:26,111 --> 00:26:28,661
saying that the type here is int star--
495
00:26:28,661 --> 00:26:30,671
that is a pointer to an int--
496
00:26:30,671 --> 00:26:33,611
but here I just said the name of the variable, as always.
497
00:26:33,611 --> 00:26:36,311
I didn't repeat int, and I also didn't repeat star.
498
00:26:36,311 --> 00:26:39,191
But at the risk of bending one's mind a little bit, there
499
00:26:39,191 --> 00:26:45,441
is unfortunately one other use for the star operator, and that's as follows.
500
00:26:45,441 --> 00:26:49,181
If you want to print out not the address of something,
501
00:26:49,181 --> 00:26:54,261
but what is at a specific address, you can actually do this.
502
00:26:54,261 --> 00:26:59,621
If I want to print out the integer via %i, that is at that address,
503
00:26:59,621 --> 00:27:04,061
I can actually use the star here, which technically contradicts what I just
504
00:27:04,061 --> 00:27:07,161
said but it has a different function here-- a different purpose.
505
00:27:07,161 --> 00:27:09,561
So let me go ahead and do this in two different ways.
506
00:27:09,561 --> 00:27:11,366
I'm going to leave this line of code as is,
507
00:27:11,366 --> 00:27:13,241
but I'm going to add another line of code now
508
00:27:13,241 --> 00:27:17,201
that prints out what apparently will be an integer, in a moment.
509
00:27:17,201 --> 00:27:21,124
So %i backslash n, and I could see-- and let me just do n for now.
510
00:27:21,124 --> 00:27:23,291
So there's really nothing special happening now, I'm
511
00:27:23,291 --> 00:27:25,301
just adding a sort of mindless printing of n.
512
00:27:25,301 --> 00:27:28,041
So make address, ./address--
513
00:27:28,041 --> 00:27:31,601
there's the current address of n and there's the value of n.
514
00:27:31,601 --> 00:27:34,571
But what's kind of cool about C here, too,
515
00:27:34,571 --> 00:27:38,861
is if you know that a value is at a specific address like p,
516
00:27:38,861 --> 00:27:42,591
there's one other use for this star operator, the asterisk.
517
00:27:42,591 --> 00:27:46,221
You can use it as the so-called dereference operator,
518
00:27:46,221 --> 00:27:49,071
which means go to that address.
519
00:27:49,071 --> 00:27:54,701
And so here what we actually have is an example of a pointer p,
520
00:27:54,701 --> 00:27:59,631
which is an address like 0x123 or 0x7FF and so forth.
521
00:27:59,631 --> 00:28:03,191
But if you say star p now, you're not redeclaring the variable
522
00:28:03,191 --> 00:28:04,631
because I didn't mention int--
523
00:28:04,631 --> 00:28:07,391
you're going to that address in p.
524
00:28:07,391 --> 00:28:09,071
So let me recompile this now.
525
00:28:09,071 --> 00:28:15,191
Make address, ./address, and just to be clear--
526
00:28:15,191 --> 00:28:16,721
what should I see?
527
00:28:16,721 --> 00:28:20,231
I'm first going to see the pointer itself, 0x something.
528
00:28:20,231 --> 00:28:23,096
What's the second line of output I should presumably see now?
529
00:28:25,801 --> 00:28:27,591
Shout a little louder.
530
00:28:27,591 --> 00:28:31,911
So I'm hearing 50, and that's true because if you figure out the address
531
00:28:31,911 --> 00:28:38,151
of n and print it in line seven, but then go to the address of n, a.k.a. p,
532
00:28:38,151 --> 00:28:41,331
that's indeed going to just show you the number n--
533
00:28:41,331 --> 00:28:44,121
the value of n again.
534
00:28:44,121 --> 00:28:47,028
All right, any questions now on this syntax-- and I will concede,
535
00:28:47,028 --> 00:28:48,861
I think this is confusing-- the fact that we
536
00:28:48,861 --> 00:28:51,051
use the star for multiplication, the fact
537
00:28:51,051 --> 00:28:53,361
that we use the star to declare a pointer,
538
00:28:53,361 --> 00:28:56,601
but then we use a star in a third way to dereference the pointer
539
00:28:56,601 --> 00:28:57,651
and go to the pointer.
540
00:28:57,651 --> 00:29:01,251
It's just too confusing, honestly, but with practice comes comfort.
541
00:29:01,251 --> 00:29:02,681
Yeah.
542
00:29:02,681 --> 00:29:12,501
AUDIENCE: [INAUDIBLE]
543
00:29:12,501 --> 00:29:13,751
DAVID J. MALAN: Good question.
544
00:29:13,751 --> 00:29:17,321
Do you-- when you are using the ampersand operator
545
00:29:17,321 --> 00:29:19,271
to get the address of something, the onus
546
00:29:19,271 --> 00:29:23,411
is on you at the moment to know what you are getting the address of.
547
00:29:23,411 --> 00:29:24,341
Is it a string?
548
00:29:24,341 --> 00:29:25,181
Is it a char?
549
00:29:25,181 --> 00:29:25,901
Is it a bool?
550
00:29:25,901 --> 00:29:26,681
Is it an int?
551
00:29:26,681 --> 00:29:30,041
I wrote this code so I know in line six that I'm
552
00:29:30,041 --> 00:29:33,131
trying to get the address of what is an integer.
553
00:29:33,131 --> 00:29:35,271
AUDIENCE: What about line eight?
554
00:29:35,271 --> 00:29:38,991
DAVID J. MALAN: In line eight you don't have
555
00:29:38,991 --> 00:29:40,821
to worry about that-- good question.
556
00:29:40,821 --> 00:29:44,851
Notice in line eight, I didn't tell the computer, other than the %i,
557
00:29:44,851 --> 00:29:49,551
what kind of address I'm going to, but I did already in line six.
558
00:29:49,551 --> 00:29:52,581
I told the compiler that p, now and forever,
559
00:29:52,581 --> 00:29:55,041
is going to be the address of an int.
560
00:29:55,041 --> 00:29:59,961
That's enough information in advance so that printf, or really the language C,
561
00:29:59,961 --> 00:30:03,951
still knows on line eight that p is a pointer to an int,
562
00:30:03,951 --> 00:30:07,371
and that way it will print out all four bytes at that address,
563
00:30:07,371 --> 00:30:11,288
not just part of it, and not more than those four bytes.
564
00:30:11,288 --> 00:30:11,871
Good question.
565
00:30:11,871 --> 00:30:13,801
Yeah, next to you.
566
00:30:13,801 --> 00:30:15,301
AUDIENCE: Do pointers have pointers?
567
00:30:15,301 --> 00:30:16,601
DAVID J. MALAN: Do pointers have pointers?
568
00:30:16,601 --> 00:30:17,101
Yes.
569
00:30:17,101 --> 00:30:20,731
We won't do this today by having pointers to pointers,
570
00:30:20,731 --> 00:30:24,421
but yes, you can use star star, and then things get--
571
00:30:24,421 --> 00:30:26,311
I'm sorry.
572
00:30:26,311 --> 00:30:28,501
We won't do that today and we won't do that often.
573
00:30:28,501 --> 00:30:31,051
In fact Python, another language, is just a couple of weeks
574
00:30:31,051 --> 00:30:32,221
away, so hang in there.
575
00:30:32,221 --> 00:30:32,921
Almost there.
576
00:30:32,921 --> 00:30:34,561
A question back here?
577
00:30:34,561 --> 00:30:36,331
Was there?
578
00:30:36,331 --> 00:30:38,191
That was-- more verbal feedback like that
579
00:30:38,191 --> 00:30:40,871
is helpful as we forge into the more complicated stuff.
580
00:30:40,871 --> 00:30:41,551
Other questions?
581
00:30:41,551 --> 00:30:42,909
Yeah.
582
00:30:42,909 --> 00:30:44,785
AUDIENCE: What's the point of [INAUDIBLE]?
583
00:30:48,071 --> 00:30:51,161
DAVID J. MALAN: What's the point of printing the address?
584
00:30:51,161 --> 00:30:54,451
AUDIENCE: Like, using the address to [INAUDIBLE].
585
00:30:54,451 --> 00:30:55,381
DAVID J. MALAN: Sure.
586
00:30:55,381 --> 00:30:56,521
What's the point of doing this?
587
00:30:56,521 --> 00:30:58,771
If you don't mind, let me-- let's get there in a moment.
588
00:30:58,771 --> 00:31:01,471
This is not the common use case, just printing out the address--
589
00:31:01,471 --> 00:31:02,821
who really cares?
590
00:31:02,821 --> 00:31:05,401
At the moment we care only for the sake of discussion.
591
00:31:05,401 --> 00:31:07,453
We're soon going to start using these addresses.
592
00:31:07,453 --> 00:31:09,661
So hang in there just a little bit for that one, too,
593
00:31:09,661 --> 00:31:13,621
but it will solve some problems for us before long.
594
00:31:13,621 --> 00:31:17,311
So let's actually just now depict what was going on inside of the computer's
595
00:31:17,311 --> 00:31:19,691
memory just a moment ago.
596
00:31:19,691 --> 00:31:23,971
So if I toggle back here, let me redraw my computer's memory,
597
00:31:23,971 --> 00:31:27,421
now let me plop into the memory n, which is storing in this program
598
00:31:27,421 --> 00:31:28,471
the number 50.
599
00:31:28,471 --> 00:31:30,631
Where is p in my computer's memory?
600
00:31:30,631 --> 00:31:33,691
Specifically, I don't know and apparently it moves around each time I
601
00:31:33,691 --> 00:31:35,741
run the program so for the sake of discussion,
602
00:31:35,741 --> 00:31:40,711
let's just propose that if 50 ended up at address 0x123, I don't know--
603
00:31:40,711 --> 00:31:43,471
p ends up over here, at address--
604
00:31:43,471 --> 00:31:46,661
whoops-- at whatever address this is here.
605
00:31:46,661 --> 00:31:49,111
But notice a couple of curiosities now.
606
00:31:49,111 --> 00:31:52,621
If p is a pointer, it's the address of something.
607
00:31:52,621 --> 00:31:57,961
So the value in p should be an address, and I've indeed written it as such--
608
00:31:57,961 --> 00:32:02,071
0x123, and technically there's not an x there, there's not a zero there,
609
00:32:02,071 --> 00:32:04,471
there's not even a 123 there per se-- there's
610
00:32:04,471 --> 00:32:08,011
a pattern of bits that represents the address 0x123.
611
00:32:08,011 --> 00:32:11,681
But again, that's week zero-- don't care about binary day-to-day.
612
00:32:11,681 --> 00:32:17,761
So if this is p, and this I claimed was n, why is p so much bigger?
613
00:32:17,761 --> 00:32:20,231
Can someone conjecture here?
614
00:32:20,231 --> 00:32:25,061
Because it turns out whether n is an int or a char or a bool,
615
00:32:25,061 --> 00:32:27,701
which are different types-- heck, even a long--
616
00:32:27,701 --> 00:32:31,871
it turns out that p is always going to take up eight squares on the board,
617
00:32:31,871 --> 00:32:33,951
but why might that be?
618
00:32:33,951 --> 00:32:35,261
What might explain that?
619
00:32:39,591 --> 00:32:41,507
Yeah, thoughts?
620
00:32:41,507 --> 00:32:45,451
AUDIENCE: Perhaps it allocates eight bytes,
621
00:32:45,451 --> 00:32:48,959
but it doesn't know the type of the data [INAUDIBLE].
622
00:32:48,959 --> 00:32:50,001
DAVID J. MALAN: OK, fair.
623
00:32:50,001 --> 00:32:52,191
Maybe it's allocating eight bytes because it doesn't know the type.
624
00:32:52,191 --> 00:32:54,711
Turns out that's OK because an address is an address.
625
00:32:54,711 --> 00:32:58,281
It's really up to the programmer to use it as a string or a char or a bool.
626
00:32:58,281 --> 00:33:00,381
Other thoughts?
627
00:33:00,381 --> 00:33:05,443
AUDIENCE: Maybe the first four for the actual number and the last four
628
00:33:05,443 --> 00:33:11,033
is some null that [INAUDIBLE] where the pointer ends.
629
00:33:11,033 --> 00:33:12,241
DAVID J. MALAN: OK, possibly.
630
00:33:12,241 --> 00:33:15,211
It could be that pointers have some complexity like a backslash zero
631
00:33:15,211 --> 00:33:18,091
or something curious like that, like we talked about for strings.
632
00:33:18,091 --> 00:33:19,751
Turns out that's not the case.
633
00:33:19,751 --> 00:33:23,281
It turns out that pointers nowadays typically are, but not
634
00:33:23,281 --> 00:33:25,921
always, eight bytes, a.k.a.
635
00:33:25,921 --> 00:33:29,101
64 bits, because you and I-- our Macs, our PCs,
636
00:33:29,101 --> 00:33:32,911
heck-- even our phones have a lot more memory than they did years ago.
637
00:33:32,911 --> 00:33:34,801
Back in the day, a pointer might have only
638
00:33:34,801 --> 00:33:38,701
been 32 bits, or even only eight bits way back in the day.
639
00:33:38,701 --> 00:33:41,551
Let's consider 32 bits, because that was the norm for some time.
640
00:33:41,551 --> 00:33:45,091
How high can you count, roughly, if you've got 32 bits?
641
00:33:45,091 --> 00:33:47,901
What's the number we keep rattling off?
642
00:33:47,901 --> 00:33:53,061
32 bits is roughly 2 to the 32, so it's 4 billion,
643
00:33:53,061 --> 00:33:57,271
and I keep saying it's 2 billion if you do negative, but in the world of memory
644
00:33:57,271 --> 00:34:00,531
there's a reason I keep saying 2 billion bytes, two gigabytes,
645
00:34:00,531 --> 00:34:03,591
because for a very long time that was the maximum amount of memory
646
00:34:03,591 --> 00:34:04,621
a computer could have.
647
00:34:04,621 --> 00:34:05,121
Why?
648
00:34:05,121 --> 00:34:07,491
Because the pointers that the computers were using
649
00:34:07,491 --> 00:34:09,531
were only, for instance, 32 bits.
650
00:34:09,531 --> 00:34:12,591
And with 32 bits, depending on whether you allow for negatives or not,
651
00:34:12,591 --> 00:34:15,621
you can count as high as 2 billion, roughly, or maybe 4 billion
652
00:34:15,621 --> 00:34:17,961
but you know what-- your Mac, your PC, your phone
653
00:34:17,961 --> 00:34:22,441
could not have had five gigabytes of memory, or 5 billion bytes of memory.
654
00:34:22,441 --> 00:34:25,191
You certainly couldn't have had what computers nowadays come with,
655
00:34:25,191 --> 00:34:27,171
which might be 8 gigabytes of memory--
656
00:34:27,171 --> 00:34:28,561
16 gigabytes of memory.
657
00:34:28,561 --> 00:34:29,211
Why?
658
00:34:29,211 --> 00:34:33,501
Because with 4 bytes, or 32 bits, you literally, physically,
659
00:34:33,501 --> 00:34:37,611
can't count that high, which means if I drew a picture of all of the memory we
660
00:34:37,611 --> 00:34:41,301
would run out of numbers to describe them, which means most of my memory
661
00:34:41,301 --> 00:34:42,631
would just be unusable.
662
00:34:42,631 --> 00:34:45,771
So pointers nowadays are 64 bits, or eight bytes.
663
00:34:45,771 --> 00:34:46,521
That's really big.
664
00:34:46,521 --> 00:34:48,438
I can't even pronounce how big that number is,
665
00:34:48,438 --> 00:34:51,051
but it's plenty for the next many years, and so
666
00:34:51,051 --> 00:34:52,881
we've drawn it that way on the board here.
667
00:34:52,881 --> 00:34:54,501
Now let's just abstract this away.
668
00:34:54,501 --> 00:34:56,209
Let's get rid of all the other bytes that
669
00:34:56,209 --> 00:34:58,911
are storing something or nothing else, and let's now
670
00:34:58,911 --> 00:35:02,241
start to abstract away this complexity because the reality is,
671
00:35:02,241 --> 00:35:04,131
to your question earlier--
672
00:35:04,131 --> 00:35:06,441
what is this useful for, or what do we-- do we actually
673
00:35:06,441 --> 00:35:07,971
care about these addresses?
674
00:35:07,971 --> 00:35:08,961
Generally, no.
675
00:35:08,961 --> 00:35:11,061
We're doing this so that you see there's no magic.
676
00:35:11,061 --> 00:35:13,951
We're just moving things around and poking around in memory.
677
00:35:13,951 --> 00:35:16,791
But what a person would typically do when talking about pointers
678
00:35:16,791 --> 00:35:19,401
would literally be to just point at something.
679
00:35:19,401 --> 00:35:21,951
I really don't care what address n is at,
680
00:35:21,951 --> 00:35:25,131
so it suffices in general, when drawing pictures on a whiteboard,
681
00:35:25,131 --> 00:35:27,021
having a discussion with another programmer,
682
00:35:27,021 --> 00:35:31,341
you just draw an arrow from the pointer to the value in question,
683
00:35:31,341 --> 00:35:36,470
because neither you nor I probably care about the specifics of 0x whatever.
684
00:35:36,470 --> 00:35:39,813
There's your pointer-- it's literally an arrow, and we can see this.
685
00:35:39,813 --> 00:35:42,021
So it turns out that these pointers, these addresses,
686
00:35:42,021 --> 00:35:45,831
are not that dissimilar to what we've done for hundreds of years
687
00:35:45,831 --> 00:35:48,381
in the form of a postal system.
688
00:35:48,381 --> 00:35:50,121
For instance, here is a post office--
689
00:35:50,121 --> 00:35:52,731
here, no-- here is a mailbox, and suppose
690
00:35:52,731 --> 00:35:55,431
that this is a mailbox labeled p.
691
00:35:55,431 --> 00:35:58,191
It's a pointer, and suppose there's another mailbox
692
00:35:58,191 --> 00:36:02,041
way over there, which is just another byte of my computer's memory.
693
00:36:02,041 --> 00:36:03,831
What are we really talking about?
694
00:36:03,831 --> 00:36:07,881
Well, you store in a computer's memory values like the number 50,
695
00:36:07,881 --> 00:36:11,841
or the word "hi" inside of your computer's memory at some location.
696
00:36:11,841 --> 00:36:15,921
But today we can also use those same memory locations
697
00:36:15,921 --> 00:36:17,551
to store the address of things.
698
00:36:17,551 --> 00:36:21,351
For instance, if I open this up here and I
699
00:36:21,351 --> 00:36:25,071
see OK, the value inside of this mailbox is not a number like 50,
700
00:36:25,071 --> 00:36:26,361
it's actually an address--
701
00:36:26,361 --> 00:36:30,861
0x123-- that's like a pointer, a breadcrumb leading
702
00:36:30,861 --> 00:36:32,661
from one location in memory to another.
703
00:36:32,661 --> 00:36:35,161
And in fact, would someone who's seated roughly over there--
704
00:36:35,161 --> 00:36:37,761
do you mind getting the mail over there?
705
00:36:37,761 --> 00:36:40,581
Any volunteers over in this section?
706
00:36:40,581 --> 00:36:42,931
Just need you to get to the mailbox before I do.
707
00:36:42,931 --> 00:36:44,781
Who's being volunteered?
708
00:36:44,781 --> 00:36:45,471
Oh yes, please.
709
00:36:45,471 --> 00:36:50,926
Whoever is gesturing most wildly, come on down.
710
00:36:50,926 --> 00:36:51,426
Sure.
711
00:36:57,861 --> 00:36:59,315
What's your name?
712
00:36:59,315 --> 00:37:00,078
AUDIENCE: Anfoo.
713
00:37:00,078 --> 00:37:01,161
DAVID J. MALAN: Say again?
714
00:37:01,161 --> 00:37:01,851
AUDIENCE: Anfoo.
715
00:37:01,851 --> 00:37:03,201
DAVID J. MALAN: Anfoo?
716
00:37:03,201 --> 00:37:06,081
OK, come on up to the edge of the stage there and just to be clear--
717
00:37:06,081 --> 00:37:09,801
if this is p, that is apparently n, but to make clear
718
00:37:09,801 --> 00:37:12,621
what we're talking about when we're storing 0x whatever values--
719
00:37:12,621 --> 00:37:15,771
like 0x123, that's essentially equivalent to my
720
00:37:15,771 --> 00:37:18,501
maybe pulling out something like this and just
721
00:37:18,501 --> 00:37:21,051
abstractly pointing to your mailbox there,
722
00:37:21,051 --> 00:37:25,311
or if you prefer, pointing to the mailbox--
723
00:37:25,311 --> 00:37:26,271
OK, all right.
724
00:37:28,951 --> 00:37:29,451
Thank you.
725
00:37:29,451 --> 00:37:29,951
All right.
726
00:37:32,661 --> 00:37:34,821
This is akin to me pointing at your mailbox,
727
00:37:34,821 --> 00:37:36,863
and if you want to go ahead and open your mailbox
728
00:37:36,863 --> 00:37:43,201
and reveal to the crowd what's inside your mailbox labeled n.
729
00:37:43,201 --> 00:37:43,981
All right.
730
00:37:46,501 --> 00:37:48,601
Thank you.
731
00:37:48,601 --> 00:37:51,221
We have a little CS50 stress ball for your trouble.
732
00:37:51,221 --> 00:37:52,553
Thank you for coming up.
733
00:37:52,553 --> 00:37:55,261
So that's just to put a visual on what it is we're talking about,
734
00:37:55,261 --> 00:37:58,171
because it can get very abstract, very cryptic quickly when we're
735
00:37:58,171 --> 00:38:01,391
talking about addresses and memory and drawing it like these little squares.
736
00:38:01,391 --> 00:38:04,308
But if you think about just walking into a post office or an apartment
737
00:38:04,308 --> 00:38:07,261
complex that's got a lot of mailboxes, those mailboxes
738
00:38:07,261 --> 00:38:10,231
essentially are a big chunk of memory and each
739
00:38:10,231 --> 00:38:12,091
of those mailboxes has an address--
740
00:38:12,091 --> 00:38:14,821
this is apartment one, two, three-- apartment 2 billion.
741
00:38:14,821 --> 00:38:18,091
And inside of those mailboxes can go anything
742
00:38:18,091 --> 00:38:20,261
that can be represented as information.
743
00:38:20,261 --> 00:38:23,341
It could be a number like n, or 50, or if you
744
00:38:23,341 --> 00:38:25,741
prefer it could be a number that represents
745
00:38:25,741 --> 00:38:27,631
the address of another mailbox.
746
00:38:27,631 --> 00:38:30,811
And this is akin, really, if you've ever had an apartment or you
747
00:38:30,811 --> 00:38:33,631
and your parents have moved, to having a forwarding address.
748
00:38:33,631 --> 00:38:36,001
It's like having the Post Office in the US
749
00:38:36,001 --> 00:38:39,481
put some kind of piece of paper in your old mailbox saying,
750
00:38:39,481 --> 00:38:41,911
actually forward it to that other mailbox.
751
00:38:41,911 --> 00:38:44,281
That really is all a pointer is doing.
752
00:38:44,281 --> 00:38:45,991
At the end of the day, it's just a number
753
00:38:45,991 --> 00:38:48,331
but it's a number being used in a different way
754
00:38:48,331 --> 00:38:50,461
and it's the syntax that we've introduced,
755
00:38:50,461 --> 00:38:54,271
not just int but int star, that tells the computer how
756
00:38:54,271 --> 00:38:58,741
to treat that number in this slightly different way.
757
00:38:58,741 --> 00:39:01,841
Are there any questions then, on this?
758
00:39:01,841 --> 00:39:03,962
Yeah, in back.
759
00:39:03,962 --> 00:39:06,379
AUDIENCE: If you had a variable, like int c, [INAUDIBLE].
760
00:39:10,711 --> 00:39:12,691
DAVID J. MALAN: If I did int c and--
761
00:39:12,691 --> 00:39:14,841
say the code again?
762
00:39:14,841 --> 00:39:17,011
Once more?
763
00:39:17,011 --> 00:39:19,141
Equal to n, so let me actually type it out.
764
00:39:19,141 --> 00:39:21,271
If I give myself another line of code, tell me
765
00:39:21,271 --> 00:39:27,251
one last time what to type. int c is equal to n, like this?
766
00:39:27,251 --> 00:39:31,951
So this is OK, and I can't draw it quite quickly enough on the board here,
767
00:39:31,951 --> 00:39:36,181
but this would be like creating another four bytes somewhere in memory, maybe
768
00:39:36,181 --> 00:39:40,231
down here, that stores an identical copy of 50
769
00:39:40,231 --> 00:39:43,381
because the assignment operator from right to left copies one value
770
00:39:43,381 --> 00:39:44,201
to another.
771
00:39:44,201 --> 00:39:47,671
So that would just add one more rectangle of size four
772
00:39:47,671 --> 00:39:50,391
to this particular picture.
773
00:39:50,391 --> 00:39:52,371
If I'm answering your question as intended.
774
00:39:52,371 --> 00:39:57,231
OK, so that is week one style use of assignment operators before pointers.
775
00:39:57,231 --> 00:40:00,051
I could, though, start copying pointers but again, we'll
776
00:40:00,051 --> 00:40:01,881
come back to some of that complexity.
777
00:40:01,881 --> 00:40:03,421
Any other questions here?
778
00:40:03,421 --> 00:40:04,921
AUDIENCE: That was a great question.
779
00:40:04,921 --> 00:40:06,841
Does the pointer point--
780
00:40:06,841 --> 00:40:10,084
does the same pointer point to the new replica as well?
781
00:40:10,084 --> 00:40:11,501
DAVID J. MALAN: Ah, good question.
782
00:40:11,501 --> 00:40:12,406
Short answer, no.
783
00:40:12,406 --> 00:40:17,101
And to repeat for the camera, if I create a second variable like this,
784
00:40:17,101 --> 00:40:21,271
int c equals n, and I claim without actually drawing it on the board
785
00:40:21,271 --> 00:40:25,191
that this gives me another rectangle, the value of which is also 50,
786
00:40:25,191 --> 00:40:26,681
p does not get touched.
787
00:40:26,681 --> 00:40:29,041
And this is what's important and really characteristic
788
00:40:29,041 --> 00:40:33,001
of C. Nothing happens automatically for you.
789
00:40:33,001 --> 00:40:36,581
p is not going to be updated unless you update p in some way,
790
00:40:36,581 --> 00:40:39,121
so creating a third variable called c-- even
791
00:40:39,121 --> 00:40:41,521
if you're copying its value from right to left,
792
00:40:41,521 --> 00:40:44,701
that has no effect on anything else in the program.
793
00:40:44,701 --> 00:40:46,031
A good question.
794
00:40:46,031 --> 00:40:52,201
So what have we seen that's perhaps now a little more explainable?
795
00:40:52,201 --> 00:40:56,221
Well, recall that we talked quite a bit last week about strings, and just
796
00:40:56,221 --> 00:41:02,101
to recap in layperson's terms, what is this string as you now understand it?
797
00:41:02,101 --> 00:41:04,191
So say-- well, let me take a specific hand here.
798
00:41:04,191 --> 00:41:05,091
What's a string?
799
00:41:05,091 --> 00:41:06,926
How about over here.
800
00:41:06,926 --> 00:41:08,301
AUDIENCE: An array of characters.
801
00:41:08,301 --> 00:41:08,811
DAVID J. MALAN: OK, sure.
802
00:41:08,811 --> 00:41:09,728
Both of you are right.
803
00:41:09,728 --> 00:41:10,971
An array of characters.
804
00:41:10,971 --> 00:41:13,761
An array of characters, and we--
805
00:41:13,761 --> 00:41:16,881
I claimed-- or revealed last week that string is not technically
806
00:41:16,881 --> 00:41:20,151
a feature built into C. It's not an official data type
807
00:41:20,151 --> 00:41:22,401
but every programmer in most any language
808
00:41:22,401 --> 00:41:25,641
refers to sequences of characters-- words, letters,
809
00:41:25,641 --> 00:41:27,451
paragraphs-- as strings.
810
00:41:27,451 --> 00:41:30,771
So the vernacular exists but the data type doesn't typically
811
00:41:30,771 --> 00:41:34,111
exist per se in C. So what we're about to do, if you will,
812
00:41:34,111 --> 00:41:36,951
for dramatic effect, is take off some training wheels today.
813
00:41:36,951 --> 00:41:41,451
The CS50 library implemented in the form of the header file cs50.h--
814
00:41:41,451 --> 00:41:43,581
we claim has had a bunch of things in it.
815
00:41:43,581 --> 00:41:46,761
Prototypes for get_string, prototypes for get_int,
816
00:41:46,761 --> 00:41:49,281
and all of those other functions, but it turns out
817
00:41:49,281 --> 00:41:53,481
it also is what defines the word "string" in such a way
818
00:41:53,481 --> 00:41:55,981
that you all can use it these past several weeks.
819
00:41:55,981 --> 00:41:58,641
So let's take a look at an example of a string in use.
820
00:41:58,641 --> 00:42:00,681
Here, for instance, is a tiny bit of code
821
00:42:00,681 --> 00:42:05,421
that uses the word "string," creating a variable called s
822
00:42:05,421 --> 00:42:08,083
and then storing quote unquote, hi, exclamation point.
823
00:42:08,083 --> 00:42:10,791
Let's consider what this looks like now in the computer's memory.
824
00:42:10,791 --> 00:42:13,541
I don't care about all the other bytes, let's just focus on these,
825
00:42:13,541 --> 00:42:16,551
and this per last week is how "hi" might be stored.
826
00:42:16,551 --> 00:42:19,311
h-i exclamation point and then one more, as someone already
827
00:42:19,311 --> 00:42:23,151
observed, that sentinel value-- that null character which
828
00:42:23,151 --> 00:42:26,558
just means eight zero bits to demarcate the end of that string
829
00:42:26,558 --> 00:42:28,641
just in case there's something to the right of it,
830
00:42:28,641 --> 00:42:31,801
the computer can now distinguish one string from another.
831
00:42:31,801 --> 00:42:35,004
So last week we introduced this new syntax.
832
00:42:35,004 --> 00:42:36,921
Well, if strings are just arrays of characters
833
00:42:36,921 --> 00:42:39,831
you can then very cleverly use that square bracket notation
834
00:42:39,831 --> 00:42:44,631
and go to location zero or one or two, which are like addresses,
835
00:42:44,631 --> 00:42:46,431
but they're relative to the string.
836
00:42:46,431 --> 00:42:51,381
This could be at 0x123 or 0x456, but with this bracket notation
837
00:42:51,381 --> 00:42:54,381
zero is always the beginning of the string, one is the next,
838
00:42:54,381 --> 00:42:55,801
two is the next, and so forth.
839
00:42:55,801 --> 00:43:00,561
So that was our array syntax for indexing into an array.
840
00:43:00,561 --> 00:43:03,471
But technically speaking, we can go a little deeper today--
841
00:43:03,471 --> 00:43:09,741
technically speaking, if hi is starting at the address 0x123 then
842
00:43:09,741 --> 00:43:15,711
it stands to reason that i is at 0x124, exclamation point's at 0x125,
843
00:43:15,711 --> 00:43:18,711
and the null is at 0x126.
844
00:43:18,711 --> 00:43:23,331
Now, I don't care about 123 per se, but even though this is hexadecimal,
845
00:43:23,331 --> 00:43:24,591
this is correct math.
846
00:43:24,591 --> 00:43:28,101
Even in hex, if you just add one when you start at 0x123,
847
00:43:28,101 --> 00:43:30,456
the next number is four, five, six at the end.
848
00:43:30,456 --> 00:43:32,331
I don't have to worry about A's, B's, and C's
849
00:43:32,331 --> 00:43:35,341
because I'm not counting that high in this example.
850
00:43:35,341 --> 00:43:39,531
So if that's the case, and my computer is actually
851
00:43:39,531 --> 00:43:47,271
laying out the word hi in memory like that, well, what exactly is s?
852
00:43:47,271 --> 00:43:50,001
What exactly is s if, at the end of the day,
853
00:43:50,001 --> 00:43:56,031
H-I exclamation point null is storing-- or is stored at these addresses?
854
00:43:56,031 --> 00:43:57,006
Where is s?
855
00:43:57,006 --> 00:43:58,881
Now that I've taken off those training wheels
856
00:43:58,881 --> 00:44:02,481
and showed you where H-I exclamation point null actually are,
857
00:44:02,481 --> 00:44:04,221
what happened to s?
858
00:44:04,221 --> 00:44:08,211
Well s, as always, is actually a variable.
859
00:44:08,211 --> 00:44:10,251
Even in the code I proposed a moment ago,
860
00:44:10,251 --> 00:44:13,551
string is apparently a data type that yes, doesn't come with C,
861
00:44:13,551 --> 00:44:16,101
but CS50's library makes it exist.
862
00:44:16,101 --> 00:44:21,471
s is a variable of type string, so where is s in this picture?
863
00:44:21,471 --> 00:44:25,431
Well, it turns out that s might be up here.
864
00:44:25,431 --> 00:44:28,971
Again, I'm just drawing it anywhere for the sake of discussion,
865
00:44:28,971 --> 00:44:33,141
but s is a variable per that line of code.
866
00:44:33,141 --> 00:44:36,978
What s is storing, apparently, I claim, is 0x123.
867
00:44:36,978 --> 00:44:40,311
I actually don't really care about these addresses, so let's abstract that away.
868
00:44:40,311 --> 00:44:45,591
s is apparently, as of now, today, one week later, just a pointer
869
00:44:45,591 --> 00:44:46,761
to a character.
870
00:44:46,761 --> 00:44:49,311
Specifically, the first character in s.
871
00:44:49,311 --> 00:44:51,411
And this is the last piece of the puzzle.
872
00:44:51,411 --> 00:44:54,981
Last week we had this clever way of demarcating the end of a string.
873
00:44:54,981 --> 00:44:59,901
Well, it turns out that strings are represented in the computer's memory
874
00:44:59,901 --> 00:45:03,861
as a variable that is a pointer, inside of which
875
00:45:03,861 --> 00:45:06,901
is the address of the first character in the string.
876
00:45:06,901 --> 00:45:09,951
So if s points at the first character and you
877
00:45:09,951 --> 00:45:12,501
can trust that backslash zero is at the end of the string,
878
00:45:12,501 --> 00:45:18,091
that's literally all you need to figure out where a string begins and ends.
879
00:45:18,091 --> 00:45:19,531
So what do I mean by this?
880
00:45:19,531 --> 00:45:21,141
Well, let's be a little more concrete.
881
00:45:21,141 --> 00:45:24,801
In terms of this picture, if I've started with this line of code here,
882
00:45:24,801 --> 00:45:29,961
it turns out all this time since week 1, that the word string has just
883
00:45:29,961 --> 00:45:36,871
semi-secretly been an alias for char star.
884
00:45:36,871 --> 00:45:39,391
I know, so char star.
885
00:45:39,391 --> 00:45:40,841
So why does this make sense?
886
00:45:40,841 --> 00:45:44,081
It's a little weird still, but if in our previous example
887
00:45:44,081 --> 00:45:47,671
we were able to store the address of an integer by declaring a variable
888
00:45:47,671 --> 00:45:49,831
called p, as int star p--
889
00:45:49,831 --> 00:45:52,681
well, if as of now strings are just the address
890
00:45:52,681 --> 00:45:58,111
of the first character in a string, then probably a string is just a char star
891
00:45:58,111 --> 00:46:01,861
because that means s is the address of a character, the very
892
00:46:01,861 --> 00:46:03,461
first character in the string.
893
00:46:03,461 --> 00:46:07,441
Now, the string might have three letters like it did, or four, or even a hundred
894
00:46:07,441 --> 00:46:09,571
if it's a long paragraph, but that's fine
895
00:46:09,571 --> 00:46:11,488
because you can trust that there's going to be
896
00:46:11,488 --> 00:46:13,181
that null character at the very end.
897
00:46:13,181 --> 00:46:16,921
So this is a general purpose way of representing strings
898
00:46:16,921 --> 00:46:20,041
using this new mechanism in C.
899
00:46:20,041 --> 00:46:23,221
So in fact, let me go ahead here and introduce maybe
900
00:46:23,221 --> 00:46:25,061
a couple of manipulations of this.
901
00:46:25,061 --> 00:46:28,831
Let me go back to my code here, and let's get rid of this integer stuff,
902
00:46:28,831 --> 00:46:32,381
and let's instead now do, for instance, this.
903
00:46:32,381 --> 00:46:37,383
Let me add in the CS50 library, so we'll include cs50.h for now.
904
00:46:37,383 --> 00:46:39,091
I'm going to go ahead and inside of main,
905
00:46:39,091 --> 00:46:41,971
give myself a string s equals hi exclamation point.
906
00:46:41,971 --> 00:46:43,621
I don't type the backslash zero.
907
00:46:43,621 --> 00:46:48,228
C does that for me automatically by using my double quotes like this.
908
00:46:48,228 --> 00:46:49,811
Now let me just go ahead and print it.
909
00:46:49,811 --> 00:46:52,981
So this again is week 1 style stuff where I'm just printing a string.
910
00:46:52,981 --> 00:46:54,611
No pointers yet.
911
00:46:54,611 --> 00:46:59,761
So let me do make address, Enter, ./address, and hopefully I see hi,
912
00:46:59,761 --> 00:47:01,391
so nothing new there.
913
00:47:01,391 --> 00:47:05,341
But let's start to peel back some of these layers here.
914
00:47:05,341 --> 00:47:09,361
Let me first of all, get rid of the CS50 library for a moment
915
00:47:09,361 --> 00:47:13,651
and let me change string to char star.
916
00:47:13,651 --> 00:47:15,901
And it's a little bit weird but yes, the convention
917
00:47:15,901 --> 00:47:19,899
is to say char, a space, then the star, and then immediately thereafter
918
00:47:19,899 --> 00:47:20,941
the name of the variable.
919
00:47:20,941 --> 00:47:23,691
Strictly speaking though, you might see textbooks or websites that
920
00:47:23,691 --> 00:47:26,671
do it like this or like this, but the canonical way
921
00:47:26,671 --> 00:47:28,451
is typically to do it like that.
922
00:47:28,451 --> 00:47:31,311
So now no more CS50 library, no more training wheels, if you will.
923
00:47:31,311 --> 00:47:33,821
I'm just treating strings for what they really are.
924
00:47:33,821 --> 00:47:37,021
Let me go ahead and do make address, Enter--
925
00:47:37,021 --> 00:47:39,181
so far so good-- ./address--
926
00:47:39,181 --> 00:47:40,651
and that, too, still works.
927
00:47:40,651 --> 00:47:44,851
So %s is a thing that comes with printf because the word string is programmer
928
00:47:44,851 --> 00:47:48,901
terminology but strictly speaking C doesn't have a string data type.
929
00:47:48,901 --> 00:47:53,221
It's always been char star, so what this means now is I
930
00:47:53,221 --> 00:47:56,761
can start to have some fun with these basic ideas,
931
00:47:56,761 --> 00:47:59,891
even though this is not purposeful other than for the sake of discussion.
932
00:47:59,891 --> 00:48:03,901
But if s is this-- let me go back and give myself the CS50 library.
933
00:48:03,901 --> 00:48:06,391
Let's put those training wheels back on for just a moment
934
00:48:06,391 --> 00:48:09,221
so that I can do one manipulation at a time.
935
00:48:09,221 --> 00:48:12,131
Here's my string s, as before.
936
00:48:12,131 --> 00:48:15,181
Well, let me go ahead and declare a char called c,
937
00:48:15,181 --> 00:48:20,221
and let me store the first character in the string there, which is
938
00:48:20,221 --> 00:48:22,891
s bracket zero, and that should give me h.
939
00:48:22,891 --> 00:48:25,951
And then just for kicks, let me go ahead and do char star--
940
00:48:25,951 --> 00:48:33,061
whoops-- let me go ahead and do char star p equals ampersand c,
941
00:48:33,061 --> 00:48:35,491
and see what this actually prints for me.
942
00:48:35,491 --> 00:48:38,861
Let me go ahead and print out what p is here.
943
00:48:38,861 --> 00:48:40,091
So we're just playing around.
944
00:48:40,091 --> 00:48:43,681
So make address-- so far so good-- ./address.
945
00:48:43,681 --> 00:48:46,021
All right, so what have I just done?
946
00:48:46,021 --> 00:48:51,151
I've just created a char c and stored in it the letter H, which
947
00:48:51,151 --> 00:48:55,531
is the same thing as s bracket zero, then I'm saying, what's the address of c,
948
00:48:55,531 --> 00:48:58,391
and that's apparently 0x7FF whatever.
949
00:48:58,391 --> 00:48:59,641
So that's the address.
950
00:48:59,641 --> 00:49:01,841
But I technically didn't have to do that.
951
00:49:01,841 --> 00:49:03,641
Let me go ahead and do two things now.
952
00:49:03,641 --> 00:49:12,001
Instead of just printing p, let me go ahead and print out maybe s itself.
953
00:49:12,001 --> 00:49:14,461
Let me go ahead and do make address, Enter--
954
00:49:14,461 --> 00:49:17,611
so far so good-- ./address and--
955
00:49:17,611 --> 00:49:20,371
damn it, what did I do wrong.
956
00:49:20,371 --> 00:49:22,201
Oh shoot, I didn't want to do that.
957
00:49:22,201 --> 00:49:25,781
Oh, I really made a mess of this.
958
00:49:25,781 --> 00:49:28,561
What did I want to do here?
959
00:49:28,561 --> 00:49:31,831
That was supposed to be impressive but it was the opposite.
960
00:49:31,831 --> 00:49:35,321
So let me turn it around.
961
00:49:35,321 --> 00:49:39,181
So if I intended to do this, why are lines nine and 10
962
00:49:39,181 --> 00:49:41,461
printing different values?
963
00:49:41,461 --> 00:49:44,641
Didn't really intend to go here, but let me try to save this.
964
00:49:44,641 --> 00:49:51,991
Why are we seeing different addresses, namely this address 402004 for s,
965
00:49:51,991 --> 00:49:57,031
and then 0x7FF for p?
966
00:49:57,031 --> 00:49:57,991
Any thoughts?
967
00:49:57,991 --> 00:50:00,121
Yeah, over here.
968
00:50:00,121 --> 00:50:02,571
AUDIENCE: [INAUDIBLE] is the character c is
969
00:50:02,571 --> 00:50:07,471
its own sort of location of the [INAUDIBLE],
970
00:50:07,471 --> 00:50:09,513
and it's taking off just the values [INAUDIBLE].
971
00:50:09,513 --> 00:50:10,513
DAVID J. MALAN: Correct.
972
00:50:10,513 --> 00:50:12,684
So if I really wanted to weasel my way out of this,
973
00:50:12,684 --> 00:50:15,351
this is a great answer to the previous question which was about,
974
00:50:15,351 --> 00:50:20,091
what if I introduce another variable, c, that's a copy of the value,
975
00:50:20,091 --> 00:50:22,791
and not in this case an int, but an actual char.
976
00:50:22,791 --> 00:50:28,281
Here, I've made c be a copy of the character that's at the beginning of s,
977
00:50:28,281 --> 00:50:29,381
but that's indeed a copy.
978
00:50:29,381 --> 00:50:31,131
So if I were to draw it on the screen that
979
00:50:31,131 --> 00:50:35,271
would give me a different rectangle in which this copy of h
980
00:50:35,271 --> 00:50:36,681
would actually be stored.
981
00:50:36,681 --> 00:50:38,631
So I didn't intend to do this, but what you're
982
00:50:38,631 --> 00:50:40,618
seeing is yes, the address of s--
983
00:50:40,618 --> 00:50:42,951
and apparently that's at a pretty low address by default
984
00:50:42,951 --> 00:50:44,961
here-- then you're seeing the address of c.
985
00:50:44,961 --> 00:50:47,841
But even though each of them is h, I claim
986
00:50:47,841 --> 00:50:49,803
one is at a different address in memory.
987
00:50:49,803 --> 00:50:51,261
And this has always been happening.
988
00:50:51,261 --> 00:50:53,991
Any time you created one variable or another it was ending up here,
989
00:50:53,991 --> 00:50:55,908
or here, or here, or somewhere else in memory.
990
00:50:55,908 --> 00:50:58,911
Now for the first time all we're doing is actually just poking around
991
00:50:58,911 --> 00:51:02,371
the computer's memory to see what is actually there.
992
00:51:02,371 --> 00:51:06,021
So let me actually back this up a little bit
993
00:51:06,021 --> 00:51:09,391
and do what I intended to do here, which was something like this.
994
00:51:09,391 --> 00:51:13,551
So if string s equals quote unquote, hi, let's go ahead
995
00:51:13,551 --> 00:51:23,051
and give myself a pointer, called p, to the first character in s.
996
00:51:23,051 --> 00:51:26,891
All right, so now let me go ahead and print out the value of this pointer,
997
00:51:26,891 --> 00:51:29,034
%p, printing out p.
998
00:51:29,034 --> 00:51:30,951
So we're just going to do one thing at a time.
999
00:51:30,951 --> 00:51:33,761
So make address, Enter, ./address.
1000
00:51:33,761 --> 00:51:38,861
There, at the moment, is the address of the first character in s.
1001
00:51:38,861 --> 00:51:40,781
What I meant to do now, was this.
1002
00:51:40,781 --> 00:51:43,721
If I want to print out two things this time,
1003
00:51:43,721 --> 00:51:49,391
let me print out not only what p is, but also what s itself originally is.
1004
00:51:49,391 --> 00:51:53,411
Because if I claim that everyone from last week should be comfortable with
1005
00:51:53,411 --> 00:51:56,381
s bracket zero just representing the first character in s
1006
00:51:56,381 --> 00:51:59,621
by definition of strings being arrays of characters.
1007
00:51:59,621 --> 00:52:05,871
Then s, as of today, is itself the address of a character,
1008
00:52:05,871 --> 00:52:06,761
the first one in s.
1009
00:52:06,761 --> 00:52:10,721
So if I now do make address, and do ./address,
1010
00:52:10,721 --> 00:52:13,481
this time I see the same exact things.
1011
00:52:13,481 --> 00:52:14,081
Thank you.
1012
00:52:18,228 --> 00:52:20,811
This is really the lamest sort of thing to be applauding over,
1013
00:52:20,811 --> 00:52:26,571
but what we're demonstrating here is that s is by definition the address
1014
00:52:26,571 --> 00:52:28,261
of the first character in c.
1015
00:52:28,261 --> 00:52:30,931
So if we borrow some of our mental model from last week--
1016
00:52:30,931 --> 00:52:35,811
well, if s bracket zero is the first character in s, doing the ampersand on
1017
00:52:35,811 --> 00:52:38,351
that expression should be the same as s.
1018
00:52:38,351 --> 00:52:40,851
Now this isn't to say that we would jump through these hoops
1019
00:52:40,851 --> 00:52:45,051
all the time with this much syntax, but this is just to do proof by example
1020
00:52:45,051 --> 00:52:51,171
that s is in fact, as I claimed a moment ago, just the address of a character.
1021
00:52:51,171 --> 00:52:54,651
Not even multiple characters, it's the address of a single character,
1022
00:52:54,651 --> 00:52:58,581
but the key thing is it's the address of the first character in the string,
1023
00:52:58,581 --> 00:53:01,821
and per last week we trust that C is going
1024
00:53:01,821 --> 00:53:04,881
to look for that null character at the very end just
1025
00:53:04,881 --> 00:53:08,721
to make sure it knows where the string actually ends.
1026
00:53:08,721 --> 00:53:12,317
All right, a question came up over here.
1027
00:53:12,317 --> 00:53:25,581
AUDIENCE: [INAUDIBLE]
1028
00:53:25,581 --> 00:53:26,581
DAVID J. MALAN: Correct.
1029
00:53:26,581 --> 00:53:30,181
To summarize, on line eight, when I am using %p--
1030
00:53:30,181 --> 00:53:33,181
that just means print a pointer value, so 0x something--
1031
00:53:33,181 --> 00:53:35,581
I'm passing it s.
1032
00:53:35,581 --> 00:53:41,281
Previously, when we used %s, printf knew to print not just the first character
1033
00:53:41,281 --> 00:53:45,481
of s, but h, i, exclamation point, and then stop when it hits the backslash
1034
00:53:45,481 --> 00:53:46,621
zero.
1035
00:53:46,621 --> 00:53:51,841
%p is different. %p tells the computer to go to that address--
1036
00:53:51,841 --> 00:53:56,711
sorry, tells the computer to print that address on the screen.
1037
00:53:56,711 --> 00:53:59,761
So this is where %s all this time has been powerful.
1038
00:53:59,761 --> 00:54:03,961
The reason printf worked in week 1 and 2 and 3
1039
00:54:03,961 --> 00:54:07,261
was because printf was designed by some human years ago
1040
00:54:07,261 --> 00:54:10,291
to go to the address that's being passed in-- for instance,
1041
00:54:10,291 --> 00:54:12,631
s-- and print out character after character
1042
00:54:12,631 --> 00:54:16,291
after character until it sees the null character backslash zero,
1043
00:54:16,291 --> 00:54:17,891
and then stop printing it.
1044
00:54:17,891 --> 00:54:21,481
So that's-- you're getting a lot of functionality for free from %s.
1045
00:54:21,481 --> 00:54:23,911
Today we're using something much simpler, %p,
1046
00:54:23,911 --> 00:54:27,211
which just literally prints what s is.
1047
00:54:27,211 --> 00:54:28,951
And the reason we don't do this in week 1
1048
00:54:28,951 --> 00:54:31,021
is just because this is like way too much
1049
00:54:31,021 --> 00:54:33,021
to be interesting when all you want to print out
1050
00:54:33,021 --> 00:54:34,541
is hi or hello, world, or the like.
1051
00:54:34,541 --> 00:54:36,511
But now what we're really doing is revealing
1052
00:54:36,511 --> 00:54:38,941
what's been going on this whole time.
1053
00:54:38,941 --> 00:54:40,678
And let me make one other example here.
1054
00:54:40,678 --> 00:54:42,511
Let me go ahead and get rid of this variable
1055
00:54:42,511 --> 00:54:45,901
here and let me just print out a few things to make the same point.
1056
00:54:45,901 --> 00:54:50,131
I'm going to print out not just s like I did here, but let's go ahead
1057
00:54:50,131 --> 00:54:51,181
and print out every--
1058
00:54:51,181 --> 00:54:53,071
the address of every character in s.
1059
00:54:53,071 --> 00:54:57,353
So let's get the first letter in s and get its address,
1060
00:54:57,353 --> 00:54:59,311
and I'm going to do copy paste for time's sake,
1061
00:54:59,311 --> 00:55:02,521
but not something I would do frequently.
1062
00:55:02,521 --> 00:55:06,034
So let me print out the address of the first character, the second character,
1063
00:55:06,034 --> 00:55:07,951
the third, and actually even the fourth, which
1064
00:55:07,951 --> 00:55:11,321
is the backslash zero, by doing this.
1065
00:55:11,321 --> 00:55:15,931
So when I compiled this program-- make address, ./address--
1066
00:55:15,931 --> 00:55:19,441
I should see two identical values and then
1067
00:55:19,441 --> 00:55:21,931
additional values that are one byte away.
1068
00:55:21,931 --> 00:55:27,571
In my diagram a moment ago, my addresses were arbitrarily 0x123, 124, 125, 126.
1069
00:55:27,571 --> 00:55:33,841
Now it starts at, by chance, 0x402004, which is s.
1070
00:55:33,841 --> 00:55:37,381
0x402004 is the same thing as s because I'm just
1071
00:55:37,381 --> 00:55:39,991
saying go to the first character and then get its address.
1072
00:55:39,991 --> 00:55:41,491
Those are one and the same now.
1073
00:55:41,491 --> 00:55:47,401
And then after that is 0x402005, 006, 007,
1074
00:55:47,401 --> 00:55:49,181
because that is just like the diagram.
1075
00:55:49,181 --> 00:55:52,981
Go to the i, to the exclamation point, and to the null character.
1076
00:55:52,981 --> 00:55:55,891
So all I'm doing now is using my newfound understanding of what
1077
00:55:55,891 --> 00:55:59,251
ampersand does and what the star does, is I'm just playing around.
1078
00:55:59,251 --> 00:56:02,149
I'm poking around in the computer's memory.
1079
00:56:02,149 --> 00:56:03,691
Just to demonstrate there's no magic.
1080
00:56:03,691 --> 00:56:06,661
It's all there very deliberately because I or printf or someone
1081
00:56:06,661 --> 00:56:07,441
else put it there.
1082
00:56:07,441 --> 00:56:09,166
Yeah.
1083
00:56:09,166 --> 00:56:15,894
AUDIENCE: [INAUDIBLE]
1084
00:56:15,894 --> 00:56:17,561
DAVID J. MALAN: Really good observation.
1085
00:56:17,561 --> 00:56:21,071
So it's indeed the case that hi, unlike 50,
1086
00:56:21,071 --> 00:56:26,291
is ending up at a very low address, not the 0x7FF wherever it was.
1087
00:56:26,291 --> 00:56:29,261
That's actually because, long story short, strings
1088
00:56:29,261 --> 00:56:32,231
are often stored in a different part of the computer's memory--
1089
00:56:32,231 --> 00:56:34,331
more on that later today-- for efficiency.
1090
00:56:34,331 --> 00:56:37,541
There's actually only going to be one copy of the word "hi" and exclamation
1091
00:56:37,541 --> 00:56:40,821
point, and the computer is going to tuck it at the beginning of my memory,
1092
00:56:40,821 --> 00:56:43,751
but other values like ints and floats and the
1093
00:56:43,751 --> 00:56:46,391
like-- they end up lower in memory by convention.
1094
00:56:46,391 --> 00:56:49,641
But a good observation, because that is consistent here.
1095
00:56:49,641 --> 00:56:53,111
All right, so a couple final details then, on what's been going on here.
1096
00:56:53,111 --> 00:56:58,691
Let me go ahead and claim that we implemented char star--
1097
00:56:58,691 --> 00:57:01,391
or rather, string as a char star as follows.
1098
00:57:01,391 --> 00:57:03,731
As of last week we were writing this code.
1099
00:57:03,731 --> 00:57:07,961
As of this week, we can now start writing this code because string
1100
00:57:07,961 --> 00:57:11,541
specifically, we invented in the CS50 library.
1101
00:57:11,541 --> 00:57:14,891
But it turns out you've seen a way of inventing your own data types.
1102
00:57:14,891 --> 00:57:16,631
Recall this thing here.
1103
00:57:16,631 --> 00:57:20,861
We played around last time with data structures, or the struct keyword in C,
1104
00:57:20,861 --> 00:57:24,641
and briefly the typedef keyword, which defines a type for you.
1105
00:57:24,641 --> 00:57:26,651
And if I highlight what's interesting here,
1106
00:57:26,651 --> 00:57:30,341
the way we invented a person data type last time
1107
00:57:30,341 --> 00:57:33,401
was to define a person as having two variables inside of it--
1108
00:57:33,401 --> 00:57:38,598
a structure that encapsulates a name and encapsulates a number.
1109
00:57:38,598 --> 00:57:41,681
Now even though the syntax is a little different today because of the star
1110
00:57:41,681 --> 00:57:47,771
thing, notice that this could be a similar application of that idea.
1111
00:57:47,771 --> 00:57:52,061
If I want to create a type called string, highlighted in yellow here,
1112
00:57:52,061 --> 00:57:56,231
then I use typedef to make it defined to be char star.
1113
00:57:56,231 --> 00:57:59,951
So this is literally all that has ever been in CS50.h,
1114
00:57:59,951 --> 00:58:02,771
in addition to those prototypes of functions we've talked about.
1115
00:58:02,771 --> 00:58:05,831
typedef char star string is one line of code
1116
00:58:05,831 --> 00:58:10,558
that brings the word string as a data type into existence,
1117
00:58:10,558 --> 00:58:12,141
and that's all that's ever been there.
1118
00:58:12,141 --> 00:58:15,281
But the star, the char star, is just too much in week 1.
1119
00:58:15,281 --> 00:58:18,671
We wait until this point to peel back that layer.
1120
00:58:18,671 --> 00:58:21,161
Are there any questions, then, on what a string is?
1121
00:58:21,161 --> 00:58:23,741
What star or the ampersand are doing?
1122
00:58:23,741 --> 00:58:25,511
Yeah.
1123
00:58:25,511 --> 00:58:28,608
AUDIENCE: [INAUDIBLE]
1124
00:58:28,608 --> 00:58:29,691
DAVID J. MALAN: Oh my God.
1125
00:58:29,691 --> 00:58:31,071
Massive spoiler, but yes.
1126
00:58:31,071 --> 00:58:34,671
If that is-- is that why when you compare two strings as I briefly
1127
00:58:34,671 --> 00:58:38,671
did, or almost did, problems arise.
1128
00:58:38,671 --> 00:58:40,971
And in fact yes, last week we used str compare--
1129
00:58:40,971 --> 00:58:45,351
strcmp-- for a very deliberate reason because yes, the spoiler is I
1130
00:58:45,351 --> 00:58:49,941
accidentally would have compared two addresses in memory, not the strings
1131
00:58:49,941 --> 00:58:52,111
at those addresses.
1132
00:58:52,111 --> 00:58:53,251
Other questions here.
1133
00:58:55,213 --> 00:58:58,171
All right, well, before we give ourselves maybe a 10 minute break here,
1134
00:58:58,171 --> 00:58:59,401
we have lots of pieces of paper.
1135
00:58:59,401 --> 00:59:02,191
If anyone wants to come on up and play with this big stack of Post-Its,
1136
00:59:02,191 --> 00:59:04,201
if you want to make your own eight by eight grid of something
1137
00:59:04,201 --> 00:59:07,261
to share with the class if you're artistically inclined, come on up.
1138
00:59:07,261 --> 00:59:09,991
Otherwise, let's take 10 minutes and we'll return after 10.
1139
00:59:09,991 --> 00:59:14,911
All right, so let's come back to this question of how
1140
00:59:14,911 --> 00:59:17,881
we can start to use these pointers and these addresses, ultimately
1141
00:59:17,881 --> 00:59:18,971
in an interesting way.
1142
00:59:18,971 --> 00:59:21,211
The goal ultimately next week is going to be
1143
00:59:21,211 --> 00:59:24,931
to use these addresses to really stitch together more complicated data
1144
00:59:24,931 --> 00:59:28,261
structures than just persons, like last week, or candidates
1145
00:59:28,261 --> 00:59:30,061
in the context of an electoral algorithm,
1146
00:59:30,061 --> 00:59:33,631
if you will, and actually really use our memory in the most versatile way
1147
00:59:33,631 --> 00:59:36,691
to represent not just images but maybe videos
1148
00:59:36,691 --> 00:59:39,191
and other two-dimensional structures as well.
1149
00:59:39,191 --> 00:59:41,581
But for now, let's come back to this address example,
1150
00:59:41,581 --> 00:59:46,561
whittle it down to just a hi initially, and see what's going on again, here
1151
00:59:46,561 --> 00:59:47,461
underneath the hood.
1152
00:59:47,461 --> 00:59:50,401
So let me re-add the CS50 library just so we
1153
00:59:50,401 --> 00:59:54,031
use our synonym for a moment, that is the word string,
1154
00:59:54,031 --> 00:59:56,161
and I'll redefine s as a string.
1155
00:59:56,161 --> 00:59:58,831
And what I didn't mention before is that these double quotes
1156
00:59:58,831 --> 01:00:01,681
that you've been using for some time are actually a little special.
1157
01:00:01,681 --> 01:00:04,921
The double quotes are a clue to the compiler
1158
01:00:04,921 --> 01:00:09,311
that what is between them is in fact a string as we now know it,
1159
01:00:09,311 --> 01:00:12,571
which means the compiler will do all the work of figuring out
1160
01:00:12,571 --> 01:00:15,331
where to put the h, the i, the exclamation point,
1161
01:00:15,331 --> 01:00:18,361
and even adding for you automatically a backslash zero.
1162
01:00:18,361 --> 01:00:20,581
And what the compiler will do for you, too,
1163
01:00:20,581 --> 01:00:23,461
is figure out what address all four of those chars
1164
01:00:23,461 --> 01:00:27,331
ended up at and store it for you in the variable s.
1165
01:00:27,331 --> 01:00:31,531
So that's why it just happens with strings without using ampersands
1166
01:00:31,531 --> 01:00:35,911
or even stars explicitly, but the star at least has been there because again,
1167
01:00:35,911 --> 01:00:38,401
string is just synonymous now with char star.
1168
01:00:38,401 --> 01:00:42,371
It's not really as readable, but it is now the same idea.
1169
01:00:42,371 --> 01:00:44,911
So I'll leave string in place just to do something week
1170
01:00:44,911 --> 01:00:48,581
1 style here for a moment, and let's go ahead and print out a few characters.
1171
01:00:48,581 --> 01:00:54,031
So I'm going to use %c this time, and I'm going to print out s bracket zero
1172
01:00:54,031 --> 01:00:59,161
and then I'm going to print out s bracket one and s bracket two,
1173
01:00:59,161 --> 01:01:03,091
literally doing week three style from last week--
1174
01:01:03,091 --> 01:01:07,921
a printing of every character in s as though it were an array.
1175
01:01:07,921 --> 01:01:11,221
So ./address should give me h-i exclamation point.
1176
01:01:11,221 --> 01:01:14,461
And if I really want to get curious, technically speaking,
1177
01:01:14,461 --> 01:01:18,691
I could print out one more location, and let me go ahead and recompile,
1178
01:01:18,691 --> 01:01:24,211
make address ./address and there is, it would seem, the backslash zero.
1179
01:01:24,211 --> 01:01:29,641
I'm not seeing zero because I didn't type literally the zero char in ASCII,
1180
01:01:29,641 --> 01:01:33,331
it's literally eight zero bits which are technically unprintable,
1181
01:01:33,331 --> 01:01:34,961
if you will, in printf speak.
1182
01:01:34,961 --> 01:01:37,351
And so what I'm seeing here is like a blank symbol.
1183
01:01:37,351 --> 01:01:39,541
That just means there is something else there--
1184
01:01:39,541 --> 01:01:43,801
it's apparently all eight zero bits, but they are there
1185
01:01:43,801 --> 01:01:46,571
even though we're not seeing them literally right now.
1186
01:01:46,571 --> 01:01:49,211
Well, let's go ahead and peel back one of these layers
1187
01:01:49,211 --> 01:01:53,131
and let me go ahead and get rid of the CS50 library and get rid of,
1188
01:01:53,131 --> 01:01:56,551
therefore, the word string because again, henceforth it's just char star.
1189
01:01:56,551 --> 01:01:57,901
Nothing else is different.
1190
01:01:57,901 --> 01:02:00,781
I'm going to now do make address, ./address,
1191
01:02:00,781 --> 01:02:02,251
and it's the same exact thing.
1192
01:02:02,251 --> 01:02:05,621
And now, let's just focus on the hi rather than even worry about that.
1193
01:02:05,621 --> 01:02:10,411
So I'm going to recompile one last time and now I have h-i exclamation point.
1194
01:02:10,411 --> 01:02:15,001
Well, it turns out that the array notation we used last week
1195
01:02:15,001 --> 01:02:17,611
was technically some of this syntactic sugar.
1196
01:02:17,611 --> 01:02:20,821
Sort of a neat way to use syntax in a useful way,
1197
01:02:20,821 --> 01:02:26,431
but we can see more explicitly today what the square brackets for a string
1198
01:02:26,431 --> 01:02:28,061
are actually doing.
1199
01:02:28,061 --> 01:02:29,801
Let me go ahead and do this.
1200
01:02:29,801 --> 01:02:35,041
Let me adventurously say I want to print out not s bracket
1201
01:02:35,041 --> 01:02:40,831
zero, but I want to print out whatever the first character of s is.
1202
01:02:40,831 --> 01:02:43,081
So to be clear, what is s now?
1203
01:02:43,081 --> 01:02:44,431
It's the address of a string.
1204
01:02:44,431 --> 01:02:45,931
OK, but what is s, really?
1205
01:02:45,931 --> 01:02:49,441
s is the address of the first char in a string
1206
01:02:49,441 --> 01:02:52,441
and again, that's sufficient for defining a string because eventually
1207
01:02:52,441 --> 01:02:55,361
the computer will see that there's a backslash zero at the end of it.
1208
01:02:55,361 --> 01:03:01,241
So s is specifically the address of the first character in a string.
1209
01:03:01,241 --> 01:03:04,291
So that means, using my new syntax, if I want
1210
01:03:04,291 --> 01:03:07,583
to print out that first character I can print out star
1211
01:03:07,583 --> 01:03:11,473
s, because recall that star is the dereference operator when you don't
1212
01:03:11,473 --> 01:03:13,681
repeat the word char, you don't repeat the word int--
1213
01:03:13,681 --> 01:03:15,301
you just use the star here.
1214
01:03:15,301 --> 01:03:17,821
That means go to that address.
1215
01:03:17,821 --> 01:03:22,651
Similarly, if I, in my newfound knowledge of how strings work,
1216
01:03:22,651 --> 01:03:26,281
know that the h comes first, then the i right after it,
1217
01:03:26,281 --> 01:03:30,151
then the exclamation point, then the backslash zero, contiguously
1218
01:03:30,151 --> 01:03:33,931
one byte apart, I could start to do some arithmetic.
1219
01:03:33,931 --> 01:03:39,571
I could go to s plus 1 byte and print out the second character,
1220
01:03:39,571 --> 01:03:43,321
and I could print out whatever is at s plus 2--
1221
01:03:43,321 --> 01:03:46,591
in fact, doing what's generally known as pointer arithmetic.
1222
01:03:46,591 --> 01:03:49,591
Literally treating pointers as the numbers they are--
1223
01:03:49,591 --> 01:03:52,831
hexadecimal or decimal, doesn't really matter-- it's still just numbers.
1224
01:03:52,831 --> 01:03:55,661
And go ahead and add one byte or two bytes
1225
01:03:55,661 --> 01:03:58,151
to them to start at the beginning of a string
1226
01:03:58,151 --> 01:04:00,831
and just poke around from left to right.
1227
01:04:00,831 --> 01:04:04,901
So this now is equivalent to what we did last week using square bracket
1228
01:04:04,901 --> 01:04:09,671
notation, but now I'm reimplementing that same idea with this lower-level
1229
01:04:09,671 --> 01:04:13,821
plumbing, understanding ampersands and stars now a little bit more,
1230
01:04:13,821 --> 01:04:16,601
so if I remake this program and do ./address,
1231
01:04:16,601 --> 01:04:19,128
I should still see h-i exclamation point.
1232
01:04:19,128 --> 01:04:21,461
But what I'm really doing is just kind of demonstrating,
1233
01:04:21,461 --> 01:04:24,851
hopefully, my understanding of what really
1234
01:04:24,851 --> 01:04:26,711
is going on in the computer's memory.
1235
01:04:26,711 --> 01:04:29,231
Now, programmers who are maybe trying to show off
1236
01:04:29,231 --> 01:04:30,611
might actually write this syntax.
1237
01:04:30,611 --> 01:04:33,236
I think the more common syntax would be what we did last week--
1238
01:04:33,236 --> 01:04:34,971
s bracket zero, s bracket one.
1239
01:04:34,971 --> 01:04:35,471
Why?
1240
01:04:35,471 --> 01:04:37,346
It's just a little more readable and we don't
1241
01:04:37,346 --> 01:04:41,531
need to brag about or care about this underlying representation.
1242
01:04:41,531 --> 01:04:44,411
The square brackets last week were an abstraction, if you will,
1243
01:04:44,411 --> 01:04:46,721
on top of what is lower level math.
1244
01:04:46,721 --> 01:04:49,361
But that's all that's going on underneath the hood.
1245
01:04:49,361 --> 01:04:52,811
We're poking around from byte to byte to byte.
1246
01:04:52,811 --> 01:04:58,221
All right, let me pause here, see if there's any questions on that one.
1247
01:04:58,221 --> 01:05:00,931
Any questions on this?
1248
01:05:00,931 --> 01:05:03,651
Let's do one more then, just to demonstrate that this is not
1249
01:05:03,651 --> 01:05:05,171
even specific to strings.
1250
01:05:05,171 --> 01:05:07,161
Let me go ahead and get rid of all of this
1251
01:05:07,161 --> 01:05:11,541
and let me give myself an array of numbers like I did last week.
1252
01:05:11,541 --> 01:05:13,821
So if I'm going to declare all the numbers
1253
01:05:13,821 --> 01:05:16,521
at once using this funky curly brace notation,
1254
01:05:16,521 --> 01:05:19,971
I can do like 4, 6, 8, 2, 7, 5, 0.
1255
01:05:19,971 --> 01:05:24,051
So seven different numbers inside of an array that's automatically
1256
01:05:24,051 --> 01:05:25,071
initialized like this.
1257
01:05:25,071 --> 01:05:27,131
I don't, strictly speaking, need to say seven.
1258
01:05:27,131 --> 01:05:28,881
The compiler is smart enough to figure out
1259
01:05:28,881 --> 01:05:31,251
how many numbers I put with commas between them,
1260
01:05:31,251 --> 01:05:35,751
and that just gives me an array containing 4, 6, 8, 2, 7, 5, 0.
1261
01:05:35,751 --> 01:05:39,201
So it turns out I can print each of these numbers in the familiar way.
1262
01:05:39,201 --> 01:05:45,021
I can do a printf of %i backslash n, and I can print numbers bracket zero,
1263
01:05:45,021 --> 01:05:49,041
and let me just do some quick copy/paste just to print the first three of these.
1264
01:05:49,041 --> 01:05:53,881
Theoretically, that should print out 4, 6, 8, and so forth.
1265
01:05:53,881 --> 01:05:57,021
But I can do the same sort of manipulation understanding
1266
01:05:57,021 --> 01:05:59,931
what pointers now are, using pointer arithmetic.
1267
01:05:59,931 --> 01:06:03,741
So let me actually unwind this and just go back to one printf,
1268
01:06:03,741 --> 01:06:07,191
and instead of printing numbers bracket zero like I might have last week,
1269
01:06:07,191 --> 01:06:11,361
let me just go and print out whatever is at that address--
1270
01:06:11,361 --> 01:06:13,431
so asterisk numbers.
1271
01:06:13,431 --> 01:06:15,861
Let me then print out the second digit, which
1272
01:06:15,861 --> 01:06:21,051
is going to be whatever is at numbers plus 1, and then let me do this further
1273
01:06:21,051 --> 01:06:25,021
and do whatever is at numbers plus 2, and if I really want to repeat this,
1274
01:06:25,021 --> 01:06:27,261
let me do it four more times and do what's
1275
01:06:27,261 --> 01:06:31,881
at location three, four, five, and six.
1276
01:06:31,881 --> 01:06:35,631
And that's seven total numbers because I started counting at zero.
1277
01:06:35,631 --> 01:06:37,201
So let me just quickly run this.
1278
01:06:37,201 --> 01:06:39,651
Make address, ./address.
1279
01:06:39,651 --> 01:06:42,381
There are those seven digits being printed.
1280
01:06:42,381 --> 01:06:46,401
But there's something subtle but also useful here.
1281
01:06:46,401 --> 01:06:47,541
Each of these digits--
1282
01:06:47,541 --> 01:06:49,341
4, 6, 8, 2, 7, 5, 0--
1283
01:06:49,341 --> 01:06:49,891
is an int.
1284
01:06:49,891 --> 01:06:50,391
Why?
1285
01:06:50,391 --> 01:06:52,531
Because I made an array of integers.
1286
01:06:52,531 --> 01:06:57,181
But think back-- how big is a typical integer, have we claimed?
1287
01:06:57,181 --> 01:07:02,821
Four bytes, or 32 bits, so it's worth noting that I don't really
1288
01:07:02,821 --> 01:07:04,841
need to worry about that detail.
1289
01:07:04,841 --> 01:07:10,119
Notice that I did not do plus 4, plus 8, plus 12, plus 16, plus 20.
1290
01:07:10,119 --> 01:07:11,911
I, the programmer, strictly speaking, don't
1291
01:07:11,911 --> 01:07:14,191
need to worry about how big the data type is.
1292
01:07:14,191 --> 01:07:16,291
This is the power of pointer arithmetic.
1293
01:07:16,291 --> 01:07:21,931
The compiler is smart enough to know that if you add 1 to this pointer,
1294
01:07:21,931 --> 01:07:26,441
that is the same as saying go one more piece of data--
1295
01:07:26,441 --> 01:07:27,481
not just one byte--
1296
01:07:27,481 --> 01:07:29,251
so if it's an int, move four.
1297
01:07:29,251 --> 01:07:30,871
If it's a second int, move eight.
1298
01:07:30,871 --> 01:07:32,601
If it's a third int, move 12.
1299
01:07:32,601 --> 01:07:35,821
Pointer arithmetic handles that annoying arithmetic for you
1300
01:07:35,821 --> 01:07:38,461
so you can just think of this as a number after a number
1301
01:07:38,461 --> 01:07:41,821
after a number that are back to back to back but not one byte apart,
1302
01:07:41,821 --> 01:07:43,171
but four bytes apart.
1303
01:07:43,171 --> 01:07:47,201
Which is only to say plus 1, plus 2, plus 3 works no matter the data type.
1304
01:07:47,201 --> 01:07:47,701
Why?
1305
01:07:47,701 --> 01:07:53,121
Because the compiler knows what type of data you're talking about.
1306
01:07:53,121 --> 01:07:56,511
Now, there's one other detail I should reveal here
1307
01:07:56,511 --> 01:07:58,671
that I've taken for granted.
1308
01:07:58,671 --> 01:08:01,641
In the past I was using double quotes to represent strings,
1309
01:08:01,641 --> 01:08:04,371
and I claim that the compiler's smart enough to realize that oh,
1310
01:08:04,371 --> 01:08:08,911
if I have double quote hi, that means it's an array of h-i exclamation point,
1311
01:08:08,911 --> 01:08:10,431
and then the backslash zero.
1312
01:08:10,431 --> 01:08:12,801
Notice this usefulness.
1313
01:08:12,801 --> 01:08:18,561
It turns out that you can actually treat arrays as though the name of the array
1314
01:08:18,561 --> 01:08:20,781
is itself a pointer, and this is actually
1315
01:08:20,781 --> 01:08:23,151
going to be something useful in upcoming problems
1316
01:08:23,151 --> 01:08:26,721
when we want to pass arrays around in the computer's memory.
1317
01:08:26,721 --> 01:08:30,463
Notice that strictly speaking on line five, there's no pointers going on.
1318
01:08:30,463 --> 01:08:32,421
There's no star, there's no ampersand-- there's
1319
01:08:32,421 --> 01:08:35,661
nothing new there, and yet instantly on line seven
1320
01:08:35,661 --> 01:08:40,491
I'm pretending that it is the address, and this is actually OK.
1321
01:08:40,491 --> 01:08:44,391
It turns out that an array really can be treated
1322
01:08:44,391 --> 01:08:47,881
as the address of the first element in that array.
1323
01:08:47,881 --> 01:08:52,079
The difference is that there's no secret backslash zero anywhere.
1324
01:08:52,079 --> 01:08:53,871
This is just part of the phone number here,
1325
01:08:53,871 --> 01:08:56,691
the ending in zero-- that's not like a special backslash zero.
1326
01:08:56,691 --> 01:08:59,721
So this is something we're going to take advantage of too, before long.
1327
01:08:59,721 --> 01:09:03,441
There's this interrelationship between addresses and arrays
1328
01:09:03,441 --> 01:09:08,121
that just generally allows you to treat one as though it is the other,
1329
01:09:08,121 --> 01:09:10,521
but the math is taken care of for you.
1330
01:09:10,521 --> 01:09:14,961
Are there any questions, then, on this before we start to solve some bigger problems?
1331
01:09:14,961 --> 01:09:16,761
Yeah.
1332
01:09:16,761 --> 01:09:23,784
AUDIENCE: [INAUDIBLE]
1333
01:09:23,784 --> 01:09:24,951
DAVID J. MALAN: Potentially.
1334
01:09:24,951 --> 01:09:28,911
If you go beyond the end of an array, you might get a segmentation fault.
1335
01:09:28,911 --> 01:09:32,181
The problem is that that symptom is sometimes nondeterministic,
1336
01:09:32,181 --> 01:09:35,181
which means that sometimes it will happen, sometimes it won't.
1337
01:09:35,181 --> 01:09:39,141
It often depends on how far off the end of the array you actually go.
1338
01:09:39,141 --> 01:09:41,631
You'll often not induce the segmentation fault
1339
01:09:41,631 --> 01:09:44,421
if you just poke a little too far, but if you go way too far
1340
01:09:44,421 --> 01:09:45,831
it quite likely will.
1341
01:09:45,831 --> 01:09:49,161
But we'll give you a tool today actually for detecting and solving
1342
01:09:49,161 --> 01:09:51,181
exactly that kind of situation.
1343
01:09:51,181 --> 01:09:54,091
So let's go ahead now and do something a little different in code,
1344
01:09:54,091 --> 01:09:56,601
but that actually comes back to that spoiler from earlier.
1345
01:09:56,601 --> 01:10:01,471
Let me go ahead and create a program called compare.c, and in this program
1346
01:10:01,471 --> 01:10:04,641
I'm going to go ahead and allow myself the CS50 library,
1347
01:10:04,641 --> 01:10:08,121
not so much for string but so that I can actually use GetInt still,
1348
01:10:08,121 --> 01:10:12,440
which is way easier than the way we'll see that C normally lets you get input.
1349
01:10:12,440 --> 01:10:15,471
Let me give myself stdio.h, do an int main(void),
1350
01:10:15,471 --> 01:10:18,381
not worrying about command line arguments today, and let me go ahead
1351
01:10:18,381 --> 01:10:22,701
and get an int i using get int, and ask the human for the value of i,
1352
01:10:22,701 --> 01:10:28,461
then let me give myself an int j, ask the user for another int, calling it j,
1353
01:10:28,461 --> 01:10:32,631
and then let me go ahead and kind of naively, but to your point earlier,
1354
01:10:32,631 --> 01:10:36,051
if i equals equals j, then let's go ahead
1355
01:10:36,051 --> 01:10:41,121
and print out something like "same," backslash n, else let's go ahead
1356
01:10:41,121 --> 01:10:44,791
and print out "different" if they are not, in fact, the same.
1357
01:10:44,791 --> 01:10:48,951
So that would seem to be a program that compares the value of two integers.
1358
01:10:48,951 --> 01:10:51,261
All right, so let's go ahead and run make compare--
1359
01:10:51,261 --> 01:10:53,451
so far so good-- ./compare.
1360
01:10:53,451 --> 01:10:56,991
OK, i will be 50, j will be 50--
1361
01:10:56,991 --> 01:10:58,041
they're the same.
1362
01:10:58,041 --> 01:10:59,221
Let's do it once more.
1363
01:10:59,221 --> 01:11:02,239
i will be 50, j will be 42.
1364
01:11:02,239 --> 01:11:03,031
They are different.
1365
01:11:03,031 --> 01:11:07,341
So so far, so good in this first version of comparison.
1366
01:11:07,341 --> 01:11:10,411
But as you might see where I'm going with this,
1367
01:11:10,411 --> 01:11:14,151
let's move away from integers and let's actually change these things to char--
1368
01:11:14,151 --> 01:11:15,301
to strings.
1369
01:11:15,301 --> 01:11:17,901
So I could do string s over here--
1370
01:11:17,901 --> 01:11:20,481
GetString s over here.
1371
01:11:20,481 --> 01:11:27,351
Then I could do string t over here, and GetString over here,
1372
01:11:27,351 --> 01:11:30,081
asking the user for t this time, here.
1373
01:11:30,081 --> 01:11:31,611
And then I can compare the two.
1374
01:11:31,611 --> 01:11:33,458
If s equals equals t--
1375
01:11:33,458 --> 01:11:34,791
and this is a common convention.
1376
01:11:34,791 --> 01:11:37,821
If you've used s for string already you can use t for the next one, at least
1377
01:11:37,821 --> 01:11:39,441
for simple demonstrations like this.
1378
01:11:39,441 --> 01:11:42,566
I'm going to compare the two, just like I did for ints, which worked great.
1379
01:11:42,566 --> 01:11:46,521
Make compare-- so far so good-- ./address--
1380
01:11:46,521 --> 01:11:47,361
oh, sorry.
1381
01:11:47,361 --> 01:11:49,221
Wrong program-- ./compare.
1382
01:11:49,221 --> 01:11:52,431
Let me go ahead and type in something like
1383
01:11:52,431 --> 01:11:57,401
hi, exclamation point and bye, exclamation point, which of course
1384
01:11:57,401 --> 01:11:59,301
should definitely be different.
1385
01:11:59,301 --> 01:12:05,121
Let me run it again with hi, exclamation point and hi, exclamation point.
1386
01:12:05,121 --> 01:12:07,071
Different-- maybe I messed up.
1387
01:12:07,071 --> 01:12:10,181
Let's maybe do it lowercase, maybe that'll fix it.
1388
01:12:10,181 --> 01:12:12,501
But no, those two are different.
1389
01:12:12,501 --> 01:12:16,481
So to come back to what I described as a spoiler earlier, what's
1390
01:12:16,481 --> 01:12:20,659
the fundamental issue here, to be clear?
1391
01:12:20,659 --> 01:12:22,701
Why is it saying different even though I'm pretty
1392
01:12:22,701 --> 01:12:24,118
sure I typed the same thing twice?
1393
01:12:24,118 --> 01:12:26,181
Yeah.
1394
01:12:26,181 --> 01:12:29,601
Yeah, this is where it's now useful to know that string has been
1395
01:12:29,601 --> 01:12:33,063
an abstraction-- a training wheel, if you will-- and if we take that away--
1396
01:12:33,063 --> 01:12:35,271
still use GetString because that's convenient still--
1397
01:12:35,271 --> 01:12:38,061
but if I change string to be char star, it's
1398
01:12:38,061 --> 01:12:44,301
a little more explicit as to what s and what t are. s is a pointer to a char,
1399
01:12:44,301 --> 01:12:46,761
that is the address of a char. t is a pointer
1400
01:12:46,761 --> 01:12:48,921
to a char, that is the address of a char.
1401
01:12:48,921 --> 01:12:52,071
Specifically, the first character in s and the first character
1402
01:12:52,071 --> 01:12:53,851
in t, respectively.
1403
01:12:53,851 --> 01:12:56,076
So if I'm comparing these two it should stand
1404
01:12:56,076 --> 01:12:57,951
to reason that they're going to be different.
1405
01:12:57,951 --> 01:12:58,451
Why?
1406
01:12:58,451 --> 01:13:02,061
Because s might end up here in memory and t might end up here in memory.
1407
01:13:02,061 --> 01:13:05,181
Each time I call GetString, it is not smart enough or advanced enough
1408
01:13:05,181 --> 01:13:07,171
to know that, wait a minute-- you typed the same thing.
1409
01:13:07,171 --> 01:13:08,691
I'm just going to hand you back the same address.
1410
01:13:08,691 --> 01:13:11,511
That doesn't happen because we did not design GetString that way.
1411
01:13:11,511 --> 01:13:15,141
Each time I call GetString, it returns, apparently,
1412
01:13:15,141 --> 01:13:17,901
a different copy of the string that was typed in.
1413
01:13:17,901 --> 01:13:20,211
A hi over here and a hi over here.
1414
01:13:20,211 --> 01:13:22,791
They might look the same to the human but to the computer
1415
01:13:22,791 --> 01:13:26,691
they are different chunks of memory, and therefore at different addresses.
1416
01:13:26,691 --> 01:13:30,181
And here, too, we can reveal what is GetString returning?
1417
01:13:30,181 --> 01:13:34,161
Well, up until today it was returning a string, so to speak.
1418
01:13:34,161 --> 01:13:35,661
That's not really a thing.
1419
01:13:35,661 --> 01:13:38,001
Technically, what GetString has always been
1420
01:13:38,001 --> 01:13:43,371
doing is returning the address of the first char in a string
1421
01:13:43,371 --> 01:13:47,181
and trusting that we put a backslash zero at the end of whatever the human
1422
01:13:47,181 --> 01:13:51,411
typed in, and that's enough now for printf, for strlen, for you
1423
01:13:51,411 --> 01:13:53,961
to know where a string begins and ends.
1424
01:13:53,961 --> 01:13:57,711
So GetString has actually always returned a pointer.
1425
01:13:57,711 --> 01:14:01,101
It has not returned a quote unquote string per se,
1426
01:14:01,101 --> 01:14:04,401
but there are functions that can solve this comparison for us.
1427
01:14:04,401 --> 01:14:07,501
Recall that I could do something like this.
1428
01:14:07,501 --> 01:14:10,431
I could actually go in here and I could--
1429
01:14:10,431 --> 01:14:11,641
let's see, where was it?
1430
01:14:11,641 --> 01:14:18,981
So if I include str compare here and use it to pass in two values, s and t,
1431
01:14:18,981 --> 01:14:22,701
let's see now what happens when I make compare.
1432
01:14:22,701 --> 01:14:26,211
Implicitly declaring library function str compare with type int--
1433
01:14:26,211 --> 01:14:27,321
and well, there's a star.
1434
01:14:27,321 --> 01:14:30,801
So you might have seen this error before and you might have ignored most of it,
1435
01:14:30,801 --> 01:14:35,281
but there's some evidence of stars or pointers going on here.
1436
01:14:35,281 --> 01:14:37,771
It looks like I didn't include the string.h header file,
1437
01:14:37,771 --> 01:14:38,961
so that's an easy fix.
1438
01:14:38,961 --> 01:14:43,551
Include string.h which, despite its name, does not create a data type
1439
01:14:43,551 --> 01:14:46,431
called string, it just has string-related functions in it
1440
01:14:46,431 --> 01:14:47,511
like str compare.
1441
01:14:47,511 --> 01:14:49,161
Let's make compare again.
1442
01:14:49,161 --> 01:14:51,231
Now it compiles, ./compare.
1443
01:14:51,231 --> 01:14:55,011
Now let's type in hi, exclamation point and even the same thing again.
1444
01:14:55,011 --> 01:14:58,641
These are now-- oh, I used it wrong.
1445
01:14:58,641 --> 01:15:00,364
OK, user error.
1446
01:15:00,364 --> 01:15:02,781
That was supposed to be impressive, but it's the opposite.
1447
01:15:02,781 --> 01:15:05,101
What did I do wrong?
1448
01:15:05,101 --> 01:15:06,201
What did I do wrong here?
1449
01:15:06,201 --> 01:15:07,463
Yeah.
1450
01:15:07,463 --> 01:15:08,951
Yeah.
1451
01:15:08,951 --> 01:15:12,258
AUDIENCE: [INAUDIBLE]
1452
01:15:12,258 --> 01:15:14,591
DAVID J. MALAN: Yeah, it returns three different values.
1453
01:15:14,591 --> 01:15:18,371
Zero if they're the same, positive if one comes before the other,
1454
01:15:18,371 --> 01:15:20,061
negative if the opposite is true.
1455
01:15:20,061 --> 01:15:23,261
I just forgot that, so like I did last week correctly,
1456
01:15:23,261 --> 01:15:26,741
if I want to compare them for equality per the manual page,
1457
01:15:26,741 --> 01:15:29,421
I should be checking for zero as the return value.
1458
01:15:29,421 --> 01:15:32,591
Now make compare, ./compare, Enter.
1459
01:15:32,591 --> 01:15:35,261
Let's try it one last time-- hi and hi.
1460
01:15:35,261 --> 01:15:36,821
OK now, they're in fact the same.
1461
01:15:36,821 --> 01:15:38,231
And Justin, thank you.
1462
01:15:41,871 --> 01:15:44,751
And indeed, it's not that it's returning same all the time.
1463
01:15:44,751 --> 01:15:46,971
If I type in hi and then bye, it's indeed
1464
01:15:46,971 --> 01:15:49,261
noticing that difference as well.
1465
01:15:49,261 --> 01:15:53,251
Well, let me go ahead and do one other thing here.
1466
01:15:53,251 --> 01:15:55,501
Let's do one other thing.
1467
01:15:55,501 --> 01:15:59,001
Let me go ahead now and just reveal more pictorially what's going on.
1468
01:15:59,001 --> 01:16:02,331
Let's get rid of the string comparison and let's just print these things out.
1469
01:16:02,331 --> 01:16:06,111
The simple way to print this out would be with %s and again, %s is special--
1470
01:16:06,111 --> 01:16:07,161
printf knows--
1471
01:16:07,161 --> 01:16:10,341
to take an address, start there, and print every character up
1472
01:16:10,341 --> 01:16:13,741
until the backslash zero, so let's just hand it s and do that.
1473
01:16:13,741 --> 01:16:16,911
And then let's do one more, %s, t.
1474
01:16:16,911 --> 01:16:21,751
This is, again, sort of a mix of week 1 and this week
1475
01:16:21,751 --> 01:16:23,571
because I got rid of the word string.
1476
01:16:23,571 --> 01:16:28,711
I'm using char star, but I'm still using printf and %s in the same way.
1477
01:16:28,711 --> 01:16:32,331
Let me go ahead and run compare now, and if I type hi and hi,
1478
01:16:32,331 --> 01:16:34,291
I should see the same thing twice.
1479
01:16:34,291 --> 01:16:37,911
So they look the same, but here now we have the syntax today
1480
01:16:37,911 --> 01:16:40,291
to print out the actual addresses of these things.
1481
01:16:40,291 --> 01:16:44,721
So let me just change the s to a p, because p means don't go to the address
1482
01:16:44,721 --> 01:16:48,651
and print it, it means just print the address as a pointer.
1483
01:16:48,651 --> 01:16:53,421
So make compare, ./compare, and now let's type in hi, and once more,
1484
01:16:53,421 --> 01:16:57,831
and I should see, indeed, two slightly different addresses given
1485
01:16:57,831 --> 01:16:58,641
in hexadecimal.
1486
01:16:58,641 --> 01:17:00,951
One's got a B at the end, one's got an F at the end,
1487
01:17:00,951 --> 01:17:03,481
and they are indeed a few bytes apart.
1488
01:17:03,481 --> 01:17:06,706
So this is just confirming what our suspicions have actually been.
1489
01:17:06,706 --> 01:17:09,081
So what does this mean, perhaps in the computer's memory?
1490
01:17:09,081 --> 01:17:10,581
Well, let's take a look.
1491
01:17:10,581 --> 01:17:14,511
I've zoomed out so I have a few more squares to look at at once.
1492
01:17:14,511 --> 01:17:20,901
Here might be s in memory when I do string s equals, or char star s equals.
1493
01:17:20,901 --> 01:17:24,381
I get a variable that's of size 1, 2, 3, 4, 5, 6, 7, 8, because I
1494
01:17:24,381 --> 01:17:27,951
claimed earlier that on modern systems, pointers are generally eight bytes
1495
01:17:27,951 --> 01:17:30,261
nowadays so they can count even higher.
1496
01:17:30,261 --> 01:17:33,246
And inside of the computer's memory, also, might be hi.
1497
01:17:33,246 --> 01:17:35,871
And I don't know where it ends up so for the sake of discussion
1498
01:17:35,871 --> 01:17:36,801
it ended up down here.
1499
01:17:36,801 --> 01:17:39,761
That's what was free when I ran the program.
1500
01:17:39,761 --> 01:17:41,601
h-i exclamation point, backslash zero.
1501
01:17:41,601 --> 01:17:46,761
Maybe it ended up, for the sake of discussion, at 0x123, 4, 5, and 6.
1502
01:17:46,761 --> 01:17:51,801
So to be clear, what is s storing once the assignment
1503
01:17:51,801 --> 01:17:54,711
operator copies from right to left?
1504
01:17:54,711 --> 01:17:59,331
What is s storing if I advance one more slide?
1505
01:17:59,331 --> 01:18:01,451
Yeah.
1506
01:18:01,451 --> 01:18:05,261
0x123, the presumption being that if a string is
1507
01:18:05,261 --> 01:18:09,236
defined by the address of its first char and that address of its first char
1508
01:18:09,236 --> 01:18:13,691
is 0x123, then that's indeed what should be in the variable s.
1509
01:18:13,691 --> 01:18:16,751
And so technically, that's what's been happening with that assignment
1510
01:18:16,751 --> 01:18:18,251
operator from right to left.
1511
01:18:18,251 --> 01:18:21,401
GetString indeed returns a string, so to speak,
1512
01:18:21,401 --> 01:18:25,241
but more properly it returns the address of a char.
1513
01:18:25,241 --> 01:18:28,721
What's been then copied from right to left using that assignment operator
1514
01:18:28,721 --> 01:18:31,601
all these weeks is indeed that address.
1515
01:18:31,601 --> 01:18:36,101
Now technically, we don't really need to care about where these addresses are.
1516
01:18:36,101 --> 01:18:38,951
It suffices to just think about them referentially, but let's
1517
01:18:38,951 --> 01:18:42,791
first consider where t might be. t is just another variable that I
1518
01:18:42,791 --> 01:18:44,441
created on my second line of code.
1519
01:18:44,441 --> 01:18:46,061
Maybe it ends up there, maybe somewhere else.
1520
01:18:46,061 --> 01:18:48,353
For the sake of discussion I'll draw it left and right.
1521
01:18:48,353 --> 01:18:51,771
Where did the second word end up that I typed in?
1522
01:18:51,771 --> 01:18:57,671
Well, suppose the second copy of hi ended up at 0x456, 457, 458, 459.
1523
01:18:57,671 --> 01:18:58,961
What ended up in t?
1524
01:18:58,961 --> 01:19:00,551
I'll pluck this one off myself.
1525
01:19:00,551 --> 01:19:02,621
0x456, presumably.
1526
01:19:02,621 --> 01:19:06,071
And so this is now a pictorial representation of why,
1527
01:19:06,071 --> 01:19:07,751
and let's abstract away everything else.
1528
01:19:07,751 --> 01:19:13,061
When I compared s against t using equal equals, based on the picture
1529
01:19:13,061 --> 01:19:14,591
they're obviously not the same.
1530
01:19:14,591 --> 01:19:16,751
One is over here, one is over here.
1531
01:19:16,751 --> 01:19:21,281
And per a moment ago, one is 0x123, the other is 0x456.
1532
01:19:21,281 --> 01:19:24,491
Yes, technically they're pointing at something that's the same,
1533
01:19:24,491 --> 01:19:27,971
but that just reveals how str compare works.
1534
01:19:27,971 --> 01:19:30,641
str compare is apparently a function that
1535
01:19:30,641 --> 01:19:33,881
takes in the address of a string as its argument
1536
01:19:33,881 --> 01:19:36,401
and the address of another string as its argument,
1537
01:19:36,401 --> 01:19:41,321
it goes to the first character in each of those strings, respectively,
1538
01:19:41,321 --> 01:19:43,511
and probably has a for loop or a while loop
1539
01:19:43,511 --> 01:19:46,421
and just goes from left to right, comparing, looking
1540
01:19:46,421 --> 01:19:50,141
for the same chars left and right, and if it doesn't notice any differences,
1541
01:19:50,141 --> 01:19:52,121
boom-- it returns zero.
1542
01:19:52,121 --> 01:19:56,481
If it does notice a difference it returns a positive or a negative value.
1543
01:19:56,481 --> 01:20:00,321
And that's very similar, recall, to how we implemented string length ourselves
1544
01:20:00,321 --> 01:20:00,821
last week.
1545
01:20:00,821 --> 01:20:03,731
I used a for loop, I was looking for a backslash zero.
1546
01:20:03,731 --> 01:20:09,521
str compare is probably a little similar in spirit, looping from left to right
1547
01:20:09,521 --> 01:20:13,001
but comparing, this time not just counting.
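The left-to-right comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the real library source: `compare_strings` is a hypothetical name, and the actual str compare (strcmp) may be implemented quite differently.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for strcmp, for illustration only.
// Walks both strings in step until the characters differ
// or the first string's terminating '\0' is reached.
int compare_strings(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i])
    {
        i++;
    }

    // Compare the characters where the walk stopped, as unsigned values.
    unsigned char ca = (unsigned char) a[i];
    unsigned char cb = (unsigned char) b[i];

    // 0 if no difference was found, otherwise negative or positive.
    return (ca > cb) - (ca < cb);
}
```

Calling this on two separate copies of hi! gives 0 even though their addresses differ, which is exactly what equals equals on the two pointers cannot tell you.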
1548
01:20:13,001 --> 01:20:15,731
Any questions, then, on string comparison
1549
01:20:15,731 --> 01:20:18,821
and why it is that we use str compare and not equals equals?
1550
01:20:18,821 --> 01:20:20,013
Yeah.
1551
01:20:20,013 --> 01:20:22,249
AUDIENCE: Do pointers have addresses?
1552
01:20:22,249 --> 01:20:24,041
DAVID J. MALAN: Do pointers have addresses?
1553
01:20:24,041 --> 01:20:24,541
Yes.
1554
01:20:24,541 --> 01:20:29,291
So we won't do that today, but I could actually use the ampersand operator
1555
01:20:29,291 --> 01:20:30,821
on s or on t.
1556
01:20:30,821 --> 01:20:34,421
That would give me the equivalent of a char star star
1557
01:20:34,421 --> 01:20:36,606
that itself could be stored elsewhere in memory.
1558
01:20:36,606 --> 01:20:37,481
That's where it ends.
1559
01:20:37,481 --> 01:20:39,671
We don't do that recursively forever.
1560
01:20:39,671 --> 01:20:42,611
There's star and there's star star, but yes, that is a thing
1561
01:20:42,611 --> 01:20:45,911
and it's very often useful in the context of two dimensional arrays,
1562
01:20:45,911 --> 01:20:49,181
which we haven't really talked about, but that is a feature of the language,
1563
01:20:49,181 --> 01:20:49,681
too.
1564
01:20:49,681 --> 01:20:50,711
But not today.
1565
01:20:50,711 --> 01:20:52,221
Good question.
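As a small aside on that answer, here is a hedged sketch of what taking the address of a pointer looks like. The helper name `repoint` is made up for this example. It receives a char star star and changes where the underlying char star points.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: `pointer` holds the address of a char *
// (so its type is char **). Dereferencing it once reaches the
// char * itself, which we overwrite with a new address.
void repoint(char **pointer, char *target)
{
    assert(pointer != NULL);
    *pointer = target;
}
```

If s is a char star, then repoint(&s, other) changes the address stored in s itself, because the function was handed the location of s rather than a copy of its value.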
1566
01:20:52,221 --> 01:20:55,271
All right, so what might we now do to take things up a notch?
1567
01:20:55,271 --> 01:20:57,791
Well let's go ahead and implement a different program here
1568
01:20:57,791 --> 01:21:01,341
that maybe tries copying some values, just to demonstrate this.
1569
01:21:01,341 --> 01:21:05,081
Let me open up a file called, how about copy.c,
1570
01:21:05,081 --> 01:21:07,511
and I'm going to start off with a few includes.
1571
01:21:07,511 --> 01:21:11,291
So let's include the CS50 library just so we have a way of getting user input.
1572
01:21:11,291 --> 01:21:15,941
Let's include-- how about stdio as always, let's preemptively
1573
01:21:15,941 --> 01:21:18,711
include string.h and maybe one other in a moment.
1574
01:21:18,711 --> 01:21:21,711
Let's do int main(void) as before.
1575
01:21:21,711 --> 01:21:25,241
And then in here, let's get a string from the user and just
1576
01:21:25,241 --> 01:21:27,671
call it s for simplicity.
1577
01:21:27,671 --> 01:21:31,361
And heck, we can actually just call this char star if we want,
1578
01:21:31,361 --> 01:21:33,474
or string, since we're using the CS50 library.
1579
01:21:33,474 --> 01:21:34,641
But we'll come back to that.
1580
01:21:34,641 --> 01:21:38,231
Let's now make a copy of s and do s equals t,
1581
01:21:38,231 --> 01:21:42,891
using a single assignment operator and then let's check something like this.
1582
01:21:42,891 --> 01:21:47,831
Let's go into the first character of t, which is t bracket zero,
1583
01:21:47,831 --> 01:21:50,231
and then let's uppercase it using that function
1584
01:21:50,231 --> 01:21:55,571
that we've used in the past of toupper t bracket zero, semicolon.
1585
01:21:55,571 --> 01:21:57,231
And actually, I should go back up here.
1586
01:21:57,231 --> 01:22:01,468
If I'm using toupper or if you use tolower or isupper or islower--
1587
01:22:01,468 --> 01:22:04,301
I might not remember this offhand, but it was in another header file
1588
01:22:04,301 --> 01:22:06,161
called C type dot h.
1589
01:22:06,161 --> 01:22:09,291
There was a bunch of helpful functions in that library as well.
1590
01:22:09,291 --> 01:22:14,096
Now at the very last line of the program let's just print out what both s and t
1591
01:22:14,096 --> 01:22:21,521
are by simply printing out %s for each of them, and t is %s also, not %t,
1592
01:22:21,521 --> 01:22:24,681
of course, and let's see what happens here.
1593
01:22:24,681 --> 01:22:26,471
So let me make copy--
1594
01:22:26,471 --> 01:22:27,881
oh my God, so many mistakes.
1595
01:22:27,881 --> 01:22:29,271
What did I do wrong?
1596
01:22:29,271 --> 01:22:30,221
Oh.
1597
01:22:30,221 --> 01:22:31,301
OK, that was unintended.
1598
01:22:31,301 --> 01:22:34,851
String t equals s, sorry, so I'm creating two variables,
1599
01:22:34,851 --> 01:22:37,781
s and t respectively, and I'm copying s into t.
1600
01:22:37,781 --> 01:22:39,461
Make copy, Enter.
1601
01:22:39,461 --> 01:22:44,651
There we go. ./copy, and let's now type in, for instance,
1602
01:22:44,651 --> 01:22:48,521
how about hi exclamation point in all lowercase this time,
1603
01:22:48,521 --> 01:22:52,091
and now what gets printed?
1604
01:22:52,091 --> 01:22:56,201
I don't think that's what I intended, so to speak, here.
1605
01:22:56,201 --> 01:23:00,021
Because notice that I got s from the user, so that checks out.
1606
01:23:00,021 --> 01:23:03,703
I then copied s into t, which looks correct.
1607
01:23:03,703 --> 01:23:05,411
That's what we always use assignment for.
1608
01:23:05,411 --> 01:23:09,191
Then I uppercase the first letter in t, but not s--
1609
01:23:09,191 --> 01:23:10,331
at least in my code--
1610
01:23:10,331 --> 01:23:14,051
then I printed s and t and then noticed, apparently, both s
1611
01:23:14,051 --> 01:23:17,921
and t got capitalized.
1612
01:23:17,921 --> 01:23:20,521
So if you're starting to get a little comfortable with what's
1613
01:23:20,521 --> 01:23:24,421
going on underneath the hood, what's the fundamental problem here?
1614
01:23:24,421 --> 01:23:28,223
Why did both get capitalized?
1615
01:23:28,223 --> 01:23:29,431
Why did both get capitalized?
1616
01:23:29,431 --> 01:23:30,121
Yeah, over here.
1617
01:23:30,121 --> 01:23:32,601
AUDIENCE: Could it be they're referencing the same address?
1618
01:23:32,601 --> 01:23:34,011
DAVID J. MALAN: Yeah, they're referencing the same address.
1619
01:23:34,011 --> 01:23:35,871
So C is really literal.
1620
01:23:35,871 --> 01:23:39,261
If you create another variable called t and you assign it the value of s,
1621
01:23:39,261 --> 01:23:41,871
you are literally assigning it the value in s,
1622
01:23:41,871 --> 01:23:44,761
which is 0x123 or something like that.
1623
01:23:44,761 --> 01:23:48,381
And so at that point in the story both s and t presumably
1624
01:23:48,381 --> 01:23:51,951
have a value of 0x123, which means they technically
1625
01:23:51,951 --> 01:23:56,061
point to the same h-i exclamation point in memory.
1626
01:23:56,061 --> 01:24:00,891
Nowhere did I tell the computer to give me a copy of a h-i exclamation point
1627
01:24:00,891 --> 01:24:04,131
per se, I literally said just copy s.
1628
01:24:04,131 --> 01:24:08,391
So here's where an understanding of what s literally is explains the situation.
1629
01:24:08,391 --> 01:24:10,761
I'm only copying the pointers.
1630
01:24:10,761 --> 01:24:12,601
So what actually went on in memory?
1631
01:24:12,601 --> 01:24:14,241
Let's take a look here at this grid.
1632
01:24:14,241 --> 01:24:17,091
If I created s initially, maybe it ends up here.
1633
01:24:17,091 --> 01:24:20,601
And I created hi in lowercase, and it ended up down here.
1634
01:24:20,601 --> 01:24:26,751
Then the addresses were, again, like 0x123, 4, 5, 6, so 0x123 is what's in s.
1635
01:24:26,751 --> 01:24:29,451
If then I create a second variable called t,
1636
01:24:29,451 --> 01:24:33,681
and I call it a string, a.k.a. char star, maybe it again ends up here.
1637
01:24:33,681 --> 01:24:39,261
But when I copy s into t by doing t equals s semicolon,
1638
01:24:39,261 --> 01:24:44,866
that literally just copies s into t, which puts the value 0x123 there.
1639
01:24:44,866 --> 01:24:47,991
So if we now abstract away all these numbers and just think about a picture
1640
01:24:47,991 --> 01:24:52,371
with arrows, what we've drawn in the computer's memory is this.
1641
01:24:52,371 --> 01:24:56,871
Two different pointers but storing the same address, which means
1642
01:24:56,871 --> 01:24:59,761
the breadcrumbs lead to the same place.
1643
01:24:59,761 --> 01:25:02,841
And so if you follow the t breadcrumb and capitalize the first letter,
1644
01:25:02,841 --> 01:25:06,831
it is functionally the same as copying the--
1645
01:25:06,831 --> 01:25:12,471
changing the first letter in the version s as well.
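The two-pointers-one-string picture can be reproduced in a few lines. This is a minimal sketch: `capitalize_first` is a hypothetical helper name, and a local char array stands in for the string that GetString would have returned.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: uppercases the first character of whatever
// string t points at. It changes the characters in memory, not t.
void capitalize_first(char *t)
{
    assert(t != NULL);
    if (t[0] != '\0')
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
}
```

With char buffer[] = "hi!", then char *s = buffer and char *t = s, calling capitalize_first(t) also changes what s shows, because s and t hold the same address.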
1646
01:25:12,471 --> 01:25:17,311
So what's the solution, then, to this kind of problem?
1647
01:25:17,311 --> 01:25:19,381
Even if you have no idea how to do it in code,
1648
01:25:19,381 --> 01:25:21,946
what's the gist of what I really intended, which is,
1649
01:25:21,946 --> 01:25:26,101
I want a genuine copy of s, called t.
1650
01:25:26,101 --> 01:25:30,213
I want a new h-i exclamation point backslash zero.
1651
01:25:30,213 --> 01:25:31,921
What do I need to do to make that happen?
1652
01:25:31,921 --> 01:25:32,888
Thoughts?
1653
01:25:32,888 --> 01:25:35,631
AUDIENCE: I think there's a function called str copy.
1654
01:25:35,631 --> 01:25:38,961
DAVID J. MALAN: So there is a function called str copy, strcpy,
1655
01:25:38,961 --> 01:25:41,511
which is a possible answer to this question.
1656
01:25:41,511 --> 01:25:45,681
The catch with str copy is that you have to tell it in advance not only
1657
01:25:45,681 --> 01:25:48,231
what the source string is-- the one you want to copy--
1658
01:25:48,231 --> 01:25:50,961
you also need to pass in the address of a chunk of memory
1659
01:25:50,961 --> 01:25:55,551
into which you can copy the string, and here's one thing we haven't seen yet,
1660
01:25:55,551 --> 01:25:57,951
and we need one more building block today, if you will.
1661
01:25:57,951 --> 01:26:02,361
We haven't yet seen a way to create new chunks of memory
1662
01:26:02,361 --> 01:26:05,281
and then let some other function copy into them.
1663
01:26:05,281 --> 01:26:08,661
And for this, we're going to introduce something called dynamic memory
1664
01:26:08,661 --> 01:26:09,571
allocation.
1665
01:26:09,571 --> 01:26:12,291
And this is the last and most powerful feature perhaps, today,
1666
01:26:12,291 --> 01:26:16,251
whereby we're going to introduce two functions, malloc and free, where
1667
01:26:16,251 --> 01:26:19,491
malloc means memory allocate, which literally does just that.
1668
01:26:19,491 --> 01:26:22,641
It's a function that takes a number as input-- how many bytes of memory
1669
01:26:22,641 --> 01:26:26,034
do you want the operating system to find for you somewhere in that big grid?
1670
01:26:26,034 --> 01:26:27,951
It's going to find it and it's going to return
1671
01:26:27,951 --> 01:26:31,554
to you the address of the first byte of contiguous memory back to back to back,
1672
01:26:31,554 --> 01:26:34,221
and then you can do anything you want with that chunk of memory.
1673
01:26:34,221 --> 01:26:35,751
free is going to do the opposite.
1674
01:26:35,751 --> 01:26:38,571
When you're done using a chunk of memory that malloc has given you,
1675
01:26:38,571 --> 01:26:42,201
you can say free it, and that means you hand it back to the operating system
1676
01:26:42,201 --> 01:26:45,421
and then the operating system can use it for something else later.
1677
01:26:45,421 --> 01:26:48,861
So this is actually evidence of a common problem in programming.
1678
01:26:48,861 --> 01:26:53,311
If your Mac or your PC has ever been in the habit of starting to get really,
1679
01:26:53,311 --> 01:26:57,921
really slow, or it's slowing to a crawl-- heck, maybe it even freezes--
1680
01:26:57,921 --> 01:27:00,921
one of the possible explanations could be
1681
01:27:00,921 --> 01:27:03,801
that the program you're running by Apple or Microsoft
1682
01:27:03,801 --> 01:27:07,041
or whoever, maybe they're using malloc or some equivalent,
1683
01:27:07,041 --> 01:27:08,346
asking the operating system--
1684
01:27:08,346 --> 01:27:10,221
Mac OS or Windows-- for, give me more memory.
1685
01:27:10,221 --> 01:27:11,001
I need more memory.
1686
01:27:11,001 --> 01:27:12,381
The user is creating more images.
1687
01:27:12,381 --> 01:27:13,821
The user is typing a longer essay.
1688
01:27:13,821 --> 01:27:15,441
Give me more memory, more memory.
1689
01:27:15,441 --> 01:27:20,001
If the program has a bug and never actually frees any of that memory,
1690
01:27:20,001 --> 01:27:22,701
your computer might end up using all of the available memory
1691
01:27:22,701 --> 01:27:26,571
and honestly, humans are not very good at handling corner cases like that.
1692
01:27:26,571 --> 01:27:29,451
Very often programs, computers just freeze at that point
1693
01:27:29,451 --> 01:27:33,591
or get really, really slow because they start trying to be creative
1694
01:27:33,591 --> 01:27:35,751
when there's not enough memory left.
1695
01:27:35,751 --> 01:27:38,361
So one of the reasons for a computer really slowing down
1696
01:27:38,361 --> 01:27:42,634
might be a program calling malloc a lot, or some equivalent, but never freeing that memory.
1697
01:27:42,634 --> 01:27:45,051
Which is to say, you should always use these two functions
1698
01:27:45,051 --> 01:27:48,631
in concert and free memory once you are done with it.
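To make the pairing concrete, here is a small sketch in which every successful malloc is matched by exactly one free. The function name `sum_first` and the choice of an int array are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical example: stores 1..n in a temporary chunk of memory,
// adds the values up, then hands the memory back.
// Returns -1 if the memory could not be allocated.
long sum_first(int n)
{
    if (n <= 0)
    {
        return 0;
    }

    // Ask for enough bytes to hold n ints.
    int *numbers = malloc((size_t) n * sizeof(int));
    if (numbers == NULL)
    {
        return -1;
    }

    long total = 0;
    for (int i = 0; i < n; i++)
    {
        numbers[i] = i + 1;
        total += numbers[i];
    }

    // Done with the chunk, so give it back.
    free(numbers);
    return total;
}
```

Leaving out the call to free would not change the answer, which is part of why leaks like the one described above are easy to miss.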
1699
01:27:48,631 --> 01:27:52,761
So let me go ahead and do this in code and solve this problem properly.
1700
01:27:52,761 --> 01:27:54,801
Let me go ahead and do this.
1701
01:27:54,801 --> 01:27:58,491
Before I copy s into t using something like str copy,
1702
01:27:58,491 --> 01:28:01,126
I first need to get a bunch of memory from the computer.
1703
01:28:01,126 --> 01:28:04,251
So to do that, let's make this super clear that we're dealing with pointers,
1704
01:28:04,251 --> 01:28:07,821
so I'm going to change my strings to char stars for both s and t,
1705
01:28:07,821 --> 01:28:10,281
and what I technically am going to store in t
1706
01:28:10,281 --> 01:28:14,331
is the address of an available chunk of memory.
1707
01:28:14,331 --> 01:28:18,531
To do that, I can ask the computer to allocate memory for me,
1708
01:28:18,531 --> 01:28:19,941
and how many bytes.
1709
01:28:19,941 --> 01:28:23,181
If I want to create a copy of h-i exclamation point,
1710
01:28:23,181 --> 01:28:26,501
I need how many bytes?
1711
01:28:26,501 --> 01:28:27,001
Good!
1712
01:28:27,001 --> 01:28:27,631
Four!
1713
01:28:27,631 --> 01:28:31,891
Because I need the h, the i, the exclamation point, and additional space
1714
01:28:31,891 --> 01:28:33,001
for the backslash zero.
1715
01:28:33,001 --> 01:28:35,161
It's up to me to understand that and ask for it.
1716
01:28:35,161 --> 01:28:36,691
It's not going to happen magically.
1717
01:28:36,691 --> 01:28:40,601
Nothing does in C. So I could just naively type four there,
1718
01:28:40,601 --> 01:28:43,501
and that would be correct if I type in h-i exclamation
1719
01:28:43,501 --> 01:28:47,431
point or any other three letter word or phrase, but to do this dynamically
1720
01:28:47,431 --> 01:28:50,761
I should probably do something like strlen of s
1721
01:28:50,761 --> 01:28:54,331
plus 1 for the additional null character.
1722
01:28:54,331 --> 01:28:56,821
Recall that string length does it in the English sense--
1723
01:28:56,821 --> 01:29:00,991
it returns the length of the string you see-- and the plus 1 takes into account
1724
01:29:00,991 --> 01:29:03,241
the fact that I'm going to need that backslash zero.
1725
01:29:03,241 --> 01:29:05,611
Now let me do this old school style first.
1726
01:29:05,611 --> 01:29:10,351
Let me go ahead and manually copy the string s into t first.
1727
01:29:10,351 --> 01:29:18,211
So for int i equals 0, i is less than the string length of s, i plus plus.
1728
01:29:18,211 --> 01:29:23,161
Then inside my for loop, I'm going to do t bracket i equals s bracket
1729
01:29:23,161 --> 01:29:27,211
i, but actually I want the null character too,
1730
01:29:27,211 --> 01:29:30,001
so I want to do the length of the string plus 1 more,
1731
01:29:30,001 --> 01:29:32,671
and heck, I think I learned an optimization last time.
1732
01:29:32,671 --> 01:29:35,131
If I'm doing this again and again, I could really
1733
01:29:35,131 --> 01:29:40,861
do n equals strlen of s plus 1 and then do i is less than n,
1734
01:29:40,861 --> 01:29:43,361
just as a nice design optimization.
1735
01:29:43,361 --> 01:29:46,531
I think this for loop will actually handle the process, then,
1736
01:29:46,531 --> 01:29:53,341
of copying every character from s into every available byte of memory in t.
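Written out, the manual copy looks roughly like this. It is a sketch rather than the exact code typed in lecture: it is wrapped in a hypothetical function `copy_with_loop`, and it includes a check of malloc's return value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s,
// or NULL if no memory was available. The caller should free it.
char *copy_with_loop(const char *s)
{
    assert(s != NULL);

    // Visible characters plus one byte for the terminating '\0'.
    size_t n = strlen(s) + 1;

    char *t = malloc(n);
    if (t == NULL)
    {
        return NULL;
    }

    // Copy n bytes, so the '\0' comes along too.
    for (size_t i = 0; i < n; i++)
    {
        t[i] = s[i];
    }
    return t;
}
```

Computing n once before the loop is the optimization mentioned above: strlen is not called again on every iteration.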
1737
01:29:53,341 --> 01:29:56,671
Or I could get rid of all of that and take your suggestion, which
1738
01:29:56,671 --> 01:30:00,841
is to use str copy, which takes as its first argument the destination
1739
01:30:00,841 --> 01:30:03,301
and its second argument the source.
1740
01:30:03,301 --> 01:30:08,281
So copy from right to left in this case, too, that's going to do all of that
1741
01:30:08,281 --> 01:30:11,231
automatically for me as well.
1742
01:30:11,231 --> 01:30:13,421
Now I think I'm good.
1743
01:30:13,421 --> 01:30:15,401
I can now safely capitalize
1744
01:30:15,401 --> 01:30:19,441
the first character in t, which is now a different chunk of memory
1745
01:30:19,441 --> 01:30:23,441
than s, and then I can print them both out to see that one has not changed
1746
01:30:23,441 --> 01:30:24,451
but the other has.
1747
01:30:24,451 --> 01:30:27,331
So make copy-- all right, what did I do wrong?
1748
01:30:27,331 --> 01:30:30,421
Implicitly declaring library function malloc dot, dot, dot.
1749
01:30:30,421 --> 01:30:33,061
So we've seen this kind of error before.
1750
01:30:33,061 --> 01:30:36,151
What is-- even if you don't know quite how to solve it,
1751
01:30:36,151 --> 01:30:37,681
what's the essence of the solution?
1752
01:30:37,681 --> 01:30:40,711
What do I need to do to fix this kind of problem involving implicitly
1753
01:30:40,711 --> 01:30:43,271
declaring a library function?
1754
01:30:43,271 --> 01:30:44,081
What did I forget?
1755
01:30:44,081 --> 01:30:46,211
Yeah.
1756
01:30:46,211 --> 01:30:47,561
I need to include the library.
1757
01:30:47,561 --> 01:30:51,551
And I could look this up in the manual, or I know it off the top of my head,
1758
01:30:51,551 --> 01:30:52,361
I just forgot it.
1759
01:30:52,361 --> 01:30:54,461
There's another library we'll occasionally
1760
01:30:54,461 --> 01:30:56,561
need now called standard lib--
1761
01:30:56,561 --> 01:31:00,671
standard library-- that contains malloc and free prototypes
1762
01:31:00,671 --> 01:31:02,021
and some other stuff, too.
1763
01:31:02,021 --> 01:31:05,061
All right, let me just clear this away and do make copy one more time.
1764
01:31:05,061 --> 01:31:10,961
Now I'm good. ./copy, Enter. All right. s, I'm going to type in hi, lowercase.
1765
01:31:10,961 --> 01:31:14,771
t and s now come back as intended.
1766
01:31:14,771 --> 01:31:19,961
s is untouched, it would seem, but t is now capitalized.
1767
01:31:19,961 --> 01:31:23,351
Any questions, then, on what we just did in code?
1768
01:31:23,351 --> 01:31:25,172
Yeah.
1769
01:31:25,172 --> 01:31:28,581
AUDIENCE: You said that malloc and free go together.
1770
01:31:28,581 --> 01:31:32,093
[INAUDIBLE]
1771
01:31:32,093 --> 01:31:33,051
DAVID J. MALAN: Indeed.
1772
01:31:33,051 --> 01:31:35,093
There's a few improvements I want to make, so let
1773
01:31:35,093 --> 01:31:36,651
me actually do those right now.
1774
01:31:36,651 --> 01:31:39,681
Technically, I should practice what I preached and I should indeed,
1775
01:31:39,681 --> 01:31:42,098
when I'm done with t, free t.
1776
01:31:42,098 --> 01:31:44,181
Fortunately, I don't have to worry about how big t
1777
01:31:44,181 --> 01:31:47,691
was-- the computer remembers how many bytes it gave me and it will go free
1778
01:31:47,691 --> 01:31:49,371
all of them, not just the first.
1779
01:31:49,371 --> 01:31:51,081
I should do free t.
1780
01:31:51,081 --> 01:31:53,751
I don't need to do free s, and I shouldn't,
1781
01:31:53,751 --> 01:31:56,691
because that is handled automatically by the CS50 library.
1782
01:31:56,691 --> 01:31:59,091
s, recall, came from GetString, and we actually
1783
01:31:59,091 --> 01:32:01,469
have some fancy code in place that makes sure
1784
01:32:01,469 --> 01:32:03,261
that at the end of your program's execution
1785
01:32:03,261 --> 01:32:06,321
we free any memory that we allocated so we don't actually
1786
01:32:06,321 --> 01:32:08,256
waste memory like I described earlier.
1787
01:32:08,256 --> 01:32:10,131
But there's actually a couple of other things
1788
01:32:10,131 --> 01:32:12,631
if I really want to be pedantic I should put in here.
1789
01:32:12,631 --> 01:32:16,071
It turns out that sometimes malloc can fail,
1790
01:32:16,071 --> 01:32:18,809
and sometimes malloc doesn't have enough memory available
1791
01:32:18,809 --> 01:32:20,601
because maybe your computer's doing so much
1792
01:32:20,601 --> 01:32:22,701
stuff there's just no more RAM available.
1793
01:32:22,701 --> 01:32:24,981
So technically, I should do something like this--
1794
01:32:24,981 --> 01:32:29,541
if t equals equals null, with two L's today,
1795
01:32:29,541 --> 01:32:32,751
then I should just return 1 or something to say that there was a problem.
1796
01:32:32,751 --> 01:32:34,626
I should probably print an error message too,
1797
01:32:34,626 --> 01:32:36,301
but for now I'm going to keep it simple.
1798
01:32:36,301 --> 01:32:38,526
I should also probably check this.
1799
01:32:38,526 --> 01:32:40,851
This is a little risky of me.
1800
01:32:40,851 --> 01:32:45,511
If I'm doing t bracket zero, this is assuming that there is a letter there.
1801
01:32:45,511 --> 01:32:48,231
But what if the human just hit Enter at the prompt
1802
01:32:48,231 --> 01:32:51,391
and didn't even type h, let alone h-i exclamation point?
1803
01:32:51,391 --> 01:32:53,631
What if there is no t bracket zero?
1804
01:32:53,631 --> 01:32:59,181
So technically, what I should probably do here is, if the length of t
1805
01:32:59,181 --> 01:33:05,121
is at least greater than zero, then go ahead and safely capitalize
1806
01:33:05,121 --> 01:33:06,441
the first letter of it.
1807
01:33:06,441 --> 01:33:08,731
And then at the very end if all goes well,
1808
01:33:08,731 --> 01:33:12,841
I can return zero, thereby signifying that indeed, this thing was successful.
1809
01:33:12,841 --> 01:33:16,711
So yes, these two functions, malloc and free, should be used in concert.
1810
01:33:16,711 --> 01:33:21,651
And so if you call malloc you should call free eventually.
1811
01:33:21,651 --> 01:33:27,256
But you did not call malloc for s, so you should not call free for s.
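Putting those improvements together, the safer version might look something like the sketch below. It is packaged as a hypothetical function `capitalized_copy` rather than as the main function from lecture, so the check for null, the length check, and the ownership of the memory are all visible in one place.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s with its
// first character uppercased, or NULL if malloc fails.
// The caller called for the memory indirectly, so the caller frees it.
char *capitalized_copy(const char *s)
{
    assert(s != NULL);

    // Room for every character plus the terminating '\0'.
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // Destination first, source second.
    strcpy(t, s);

    // Only touch t[0] if there is actually a character there.
    if (strlen(t) > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;
}
```

A caller would print both strings and then free the copy, but not the original, which it did not allocate itself.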
1812
01:33:27,256 --> 01:33:28,131
Yeah, other question.
1813
01:33:28,131 --> 01:33:29,298
AUDIENCE: Here's a question.
1814
01:33:29,298 --> 01:33:31,579
Why do we do malloc plus 1?
1815
01:33:31,579 --> 01:33:33,371
DAVID J. MALAN: Why did I do malloc plus 1?
1816
01:33:33,371 --> 01:33:36,281
So malloc-- sorry, malloc of string length of s
1817
01:33:36,281 --> 01:33:39,903
plus 1-- the string length is the literal length of the string as a human
1818
01:33:39,903 --> 01:33:41,111
would perceive it in English.
1819
01:33:41,111 --> 01:33:44,111
So h-i exclamation point-- strlen gives me 3,
1820
01:33:44,111 --> 01:33:47,801
but I know now as of last week and this week what a string technically is
1821
01:33:47,801 --> 01:33:49,751
and a string always has an extra byte.
1822
01:33:49,751 --> 01:33:52,301
The onus is on me to understand and apply
1823
01:33:52,301 --> 01:33:57,011
that lesson learned so that I actually give str copy enough room for that
1824
01:33:57,011 --> 01:33:58,631
trailing null character.
1825
01:33:58,631 --> 01:34:04,301
And here's just an annoying thing: we called the backslash zero N-U-L last
1826
01:34:04,301 --> 01:34:08,351
week, and it turns out that N-U-L-L is the same idea.
1827
01:34:08,351 --> 01:34:11,531
It's also zero, but it's zero in the context of pointers.
1828
01:34:11,531 --> 01:34:15,761
So long story short, you never really write N-U-L, I've just said it
1829
01:34:15,761 --> 01:34:17,051
and we saw it on the screen.
1830
01:34:17,051 --> 01:34:22,631
You will start writing N-U-L-L when you want to check whether or not a pointer
1831
01:34:22,631 --> 01:34:23,681
is valid.
1832
01:34:23,681 --> 01:34:25,091
And what I mean by that is this.
1833
01:34:25,091 --> 01:34:27,971
If malloc fails and there's just not enough memory left inside
1834
01:34:27,971 --> 01:34:31,271
of the computer for you, it's got to return a special value,
1835
01:34:31,271 --> 01:34:35,201
and that special value is N-U-L-L in all capital letters.
1836
01:34:35,201 --> 01:34:36,821
That signifies something went wrong.
1837
01:34:36,821 --> 01:34:41,771
Do not trust that I'm giving you a useful return value.
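The difference between the two spellings can be shown side by side. In this sketch, the enum and the function `classify` are hypothetical names invented for the example. The pointer is checked against NULL, and the first character is checked against backslash zero.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical labels for the three situations a char * can be in.
enum text_state
{
    TEXT_MISSING, // the pointer itself is NULL: there is no string
    TEXT_EMPTY,   // a valid string whose first byte is '\0'
    TEXT_PRESENT  // a valid string with at least one character
};

enum text_state classify(const char *s)
{
    // The NUL character has the numeric value zero.
    assert('\0' == 0);

    // NULL is about the pointer: does it point anywhere at all?
    if (s == NULL)
    {
        return TEXT_MISSING;
    }

    // '\0' is about a char: is the very first byte the terminator?
    if (s[0] == '\0')
    {
        return TEXT_EMPTY;
    }
    return TEXT_PRESENT;
}
```

The order of the two checks matters: s[0] may only be inspected once s is known not to be NULL.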
1838
01:34:41,771 --> 01:34:45,391
Other questions on these copies thus far?
1839
01:34:45,391 --> 01:34:47,530
Yeah, over there.
1840
01:34:47,530 --> 01:34:51,481
AUDIENCE: [INAUDIBLE]
1841
01:34:51,481 --> 01:34:52,731
DAVID J. MALAN: Good question.
1842
01:34:52,731 --> 01:34:54,621
Will str copy not work without malloc?
1843
01:34:54,621 --> 01:34:57,891
You kind of need both in this case because str copy,
1844
01:34:57,891 --> 01:35:01,281
by definition-- if I pull up its manual page-- needs a destination
1845
01:35:01,281 --> 01:35:03,261
to put the copied characters.
1846
01:35:03,261 --> 01:35:06,321
It's not sufficient just to say char star t semicolon.
1847
01:35:06,321 --> 01:35:07,761
That only gives you a pointer.
1848
01:35:07,761 --> 01:35:10,701
But I need another chunk of memory that's
1849
01:35:10,701 --> 01:35:14,811
just as big as h-i exclamation point backslash zero,
1850
01:35:14,811 --> 01:35:17,271
so malloc gives me a whole bunch of memory
1851
01:35:17,271 --> 01:35:21,561
and then str copy fills it with h-i exclamation point backslash zero.
1852
01:35:21,561 --> 01:35:24,021
So again, that's why we're going down to this lower level,
1853
01:35:24,021 --> 01:35:26,063
because once you understand what needs to be done
1854
01:35:26,063 --> 01:35:27,931
you now have the functions to do it.
1855
01:35:27,931 --> 01:35:29,971
So let's actually consider what we just solved.
1856
01:35:29,971 --> 01:35:33,831
So in this next version of the program where I actually introduced malloc,
1857
01:35:33,831 --> 01:35:37,341
t was initialized to the return value of malloc,
1858
01:35:37,341 --> 01:35:39,381
and maybe the memory that I got back was here--
1859
01:35:39,381 --> 01:35:42,981
0x456, 457, 458, 459.
1860
01:35:42,981 --> 01:35:45,291
I've left it blank initially because nothing
1861
01:35:45,291 --> 01:35:47,001
is put there automatically by malloc.
1862
01:35:47,001 --> 01:35:51,111
I just get a chunk of memory that is now mine to use as I see fit.
1863
01:35:51,111 --> 01:35:56,031
I then assign t to that return value, which points t at the first address.
1864
01:35:56,031 --> 01:35:57,861
Notice there's no backslash zero.
1865
01:35:57,861 --> 01:36:00,741
This is not yet a string; it's just a chunk of memory--
1866
01:36:00,741 --> 01:36:02,871
four bytes-- an array of four bytes.
1867
01:36:02,871 --> 01:36:06,441
What str copy eventually did for me was it copied the h over,
1868
01:36:06,441 --> 01:36:10,671
the i over, the exclamation point over, and the backslash zero.
1869
01:36:10,671 --> 01:36:14,541
And if I didn't want to use str copy or I forgot that it existed, my for loop
1870
01:36:14,541 --> 01:36:18,701
would have done exactly the same thing.
1871
01:36:18,701 --> 01:36:23,818
Any questions, then, on these examples here?
1872
01:36:23,818 --> 01:36:24,401
Any questions?
1873
01:36:24,401 --> 01:36:26,144
Yeah.
1874
01:36:26,144 --> 01:36:33,131
AUDIENCE: [INAUDIBLE]
1875
01:36:33,131 --> 01:36:34,381
DAVID J. MALAN: Good question.
1876
01:36:34,381 --> 01:36:38,731
After malloc, if I had then still done just t equals s,
1877
01:36:38,731 --> 01:36:41,851
it actually would have recreated the same original problem
1878
01:36:41,851 --> 01:36:45,571
by just copying 0x123 from s into t.
1879
01:36:45,571 --> 01:36:48,751
So then I would have been left with a picture that looked like this a few
1880
01:36:48,751 --> 01:36:52,711
steps ago, I would have-- and I can't quite do it live--
1881
01:36:52,711 --> 01:36:55,021
this arrow, if I did what you just described,
1882
01:36:55,021 --> 01:36:58,998
would now be pointing over here and so I wouldn't have fundamentally solved
1883
01:36:58,998 --> 01:37:01,081
the problem, I would have just additionally wasted
1884
01:37:01,081 --> 01:37:04,141
four bytes temporarily that I'm not actually using.
1885
01:37:04,141 --> 01:37:05,983
Yeah.
1886
01:37:05,983 --> 01:37:09,781
AUDIENCE: [INAUDIBLE]
1887
01:37:09,781 --> 01:37:10,861
DAVID J. MALAN: You can--
1888
01:37:10,861 --> 01:37:12,819
do you always use malloc and str copy together?
1889
01:37:12,819 --> 01:37:13,594
Not necessarily.
1890
01:37:13,594 --> 01:37:15,511
These are both solving two different problems.
1891
01:37:15,511 --> 01:37:19,771
malloc's giving me enough memory to make a copy, str copy is doing the copy.
1892
01:37:19,771 --> 01:37:23,581
However, you could actually use an array, if you wanted, of characters,
1893
01:37:23,581 --> 01:37:26,911
and you could use str copy on that, and there's other use cases for str copy.
1894
01:37:26,911 --> 01:37:29,071
But thus far, it's a reasonable mental model
1895
01:37:29,071 --> 01:37:31,291
to have that if you want to copy strings,
1896
01:37:31,291 --> 01:37:34,921
you use malloc and then strcpy, or your own homegrown loop.
1897
01:37:34,921 --> 01:37:36,844
Yeah.
1898
01:37:36,844 --> 01:37:47,171
AUDIENCE: [INAUDIBLE]
1899
01:37:47,171 --> 01:37:49,370
DAVID J. MALAN: Say that once more.
1900
01:37:49,370 --> 01:37:54,579
AUDIENCE: [INAUDIBLE]
1901
01:37:54,579 --> 01:37:55,371
DAVID J. MALAN: No.
1902
01:37:55,371 --> 01:37:57,031
It will-- good question.
1903
01:37:57,031 --> 01:38:00,171
If I had a--
1904
01:38:00,171 --> 01:38:03,441
strcpy, per its documentation, will copy the whole string
1905
01:38:03,441 --> 01:38:05,661
plus the null character at the end.
1906
01:38:05,661 --> 01:38:08,121
It just assumes there will be one there.
1907
01:38:08,121 --> 01:38:12,291
It's therefore up to you to pass strcpy a long enough chunk of memory
1908
01:38:12,291 --> 01:38:13,281
to have room for that.
1909
01:38:13,281 --> 01:38:15,471
If I only ask malloc for three bytes, that
1910
01:38:15,471 --> 01:38:17,541
could have potentially created a memory problem
1911
01:38:17,541 --> 01:38:20,901
whereby strcpy would just still blindly copy one, two, three,
1912
01:38:20,901 --> 01:38:24,441
four bytes, but technically it should have only touched three of those.
1913
01:38:24,441 --> 01:38:27,291
You do not yet have access to the fourth one, or the rights to it,
1914
01:38:27,291 --> 01:38:29,541
because you never asked malloc for it.
1915
01:38:29,541 --> 01:38:31,461
Yeah.
1916
01:38:31,461 --> 01:38:34,461
AUDIENCE: So the number inside malloc would be the number of bytes.
1917
01:38:34,461 --> 01:38:34,821
DAVID J. MALAN: Correct.
1918
01:38:34,821 --> 01:38:36,696
The number inside malloc-- it's one argument.
1919
01:38:36,696 --> 01:38:39,723
It's the number of bytes you want back.
1920
01:38:39,723 --> 01:38:43,041
AUDIENCE: Does that mean you have to remember [INAUDIBLE]?
1921
01:38:45,798 --> 01:38:48,131
DAVID J. MALAN: Yes, the onus is on you, the programmer,
1922
01:38:48,131 --> 01:38:50,298
to remember or frankly, use a function to figure out
1923
01:38:50,298 --> 01:38:51,821
how many bytes you actually need.
1924
01:38:51,821 --> 01:38:54,671
That's why I did not ultimately type in four manually,
1925
01:38:54,671 --> 01:38:56,441
I used strlen plus 1.
1926
01:38:56,441 --> 01:38:59,831
So the plus 1 is necessary if you understand how strings are represented,
1927
01:38:59,831 --> 01:39:02,471
but using strlen means that I can actually
1928
01:39:02,471 --> 01:39:05,651
play around with any types of inputs and it will dynamically
1929
01:39:05,651 --> 01:39:07,541
figure out the length.
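A minimal sketch of the malloc-then-strcpy pattern just described, assuming the standard string.h and stdlib.h functions. The function name copy_string is hypothetical and is used here only for illustration; it is not the code typed in lecture.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a freshly allocated copy of s,
// or NULL if malloc fails. The caller must free() the result.
char *copy_string(const char *s)
{
    // strlen does not count the terminating '\0', hence the + 1.
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // strcpy copies every character of s plus the terminating '\0',
    // so t must have room for strlen(s) + 1 bytes.
    strcpy(t, s);
    return t;
}
```

Asking malloc for only strlen(s) bytes here would leave no room for the null character, which is the under-allocation problem raised in the question above.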
1930
01:39:07,541 --> 01:39:09,821
So suffice it to say, there's so many ways
1931
01:39:09,821 --> 01:39:11,931
already where you can start to break programs.
1932
01:39:11,931 --> 01:39:15,386
Let's give you at least one tool for finding mistakes that you might make.
1933
01:39:15,386 --> 01:39:17,261
And indeed, in upcoming problem sets you will
1934
01:39:17,261 --> 01:39:19,361
use this to find bugs in your own code.
1935
01:39:19,361 --> 01:39:22,991
Not just using printf, not just using the built-in debugger, but another tool
1936
01:39:22,991 --> 01:39:24,201
here as well.
1937
01:39:24,201 --> 01:39:27,371
So let me go ahead and deliberately write a program called memory.c
1938
01:39:27,371 --> 01:39:29,511
that has some memory-related errors.
1939
01:39:29,511 --> 01:39:34,901
Let me include stdio.h at the top and let me include stdlib.h at the top
1940
01:39:34,901 --> 01:39:36,551
so I have access to malloc now.
1941
01:39:36,551 --> 01:39:41,171
Let me do int main(void) and then inside of main, let me do this--
1942
01:39:41,171 --> 01:39:44,351
I want to allocate maybe how about three--
1943
01:39:44,351 --> 01:39:45,711
space for three integers.
1944
01:39:45,711 --> 01:39:46,211
Why?
1945
01:39:46,211 --> 01:39:48,191
Just for the sake of discussion.
1946
01:39:48,191 --> 01:39:52,721
So I'm going to go ahead and do malloc of three, but I don't want three bytes.
1947
01:39:52,721 --> 01:39:56,008
I want three integers and an integer is four bytes,
1948
01:39:56,008 --> 01:39:57,341
so technically I could do this--
1949
01:39:57,341 --> 01:40:01,851
3 times 4, or I could do 12 but again, that's making certain assumptions
1950
01:40:01,851 --> 01:40:04,341
and if I run this program on a slightly different computer,
1951
01:40:04,341 --> 01:40:05,861
int might be a different size.
1952
01:40:05,861 --> 01:40:10,321
So the better way to do this would be 3 times whatever the size is of an int.
1953
01:40:10,321 --> 01:40:13,571
And this is just an operator you can use any time if you just want to find out
1954
01:40:13,571 --> 01:40:15,611
on this computer, how big is an int?
1955
01:40:15,611 --> 01:40:18,291
How big is a float, or something else?
1956
01:40:18,291 --> 01:40:20,411
So that's going to give me that many--
1957
01:40:20,411 --> 01:40:22,811
that much memory for three ints.
1958
01:40:22,811 --> 01:40:24,821
What do I want to assign this to?
1959
01:40:24,821 --> 01:40:27,011
Well, malloc returns an address.
1960
01:40:27,011 --> 01:40:32,291
Pointers are addresses, so I'm going to create a pointer to an int called
1961
01:40:32,291 --> 01:40:34,521
x and assign it the value.
1962
01:40:34,521 --> 01:40:35,741
So what am I doing here?
1963
01:40:35,741 --> 01:40:38,321
This is a little less obvious, but again go back to basics.
1964
01:40:38,321 --> 01:40:43,091
The right hand side here gives me a chunk of memory for three integers.
1965
01:40:43,091 --> 01:40:46,661
malloc returns the address of the first byte of that chunk.
1966
01:40:46,661 --> 01:40:48,791
How do I store the address of anything?
1967
01:40:48,791 --> 01:40:49,691
I need a pointer.
1968
01:40:49,691 --> 01:40:53,561
The syntax for today is type of data, star,
1969
01:40:53,561 --> 01:40:58,631
where the type of data in question is three ints, so I do int star x.
1970
01:40:58,631 --> 01:41:02,531
Again, it's kind of purposeless, only for sort of instructional purposes
1971
01:41:02,531 --> 01:41:07,901
here, but this is equivalent now to having a chunk of memory of size 12
1972
01:41:07,901 --> 01:41:11,351
in total, presumably, so I can technically now do this.
1973
01:41:11,351 --> 01:41:15,491
I can go into maybe the first location and assign it the number 72
1974
01:41:15,491 --> 01:41:16,911
like the other day.
1975
01:41:16,911 --> 01:41:24,701
Second location, the number 73, and the third location, maybe the number 33.
1976
01:41:24,701 --> 01:41:27,551
Now I've deliberately made two mistakes here
1977
01:41:27,551 --> 01:41:30,701
because I'm trying to trip over my newfound understanding,
1978
01:41:30,701 --> 01:41:33,281
or my greenness with understanding pointers.
1979
01:41:33,281 --> 01:41:36,641
One, I didn't remember that I should be treating chunks of memory
1980
01:41:36,641 --> 01:41:37,751
as zero indexed.
1981
01:41:37,751 --> 01:41:41,141
malloc essentially returns an array, if you want to think of it as that.
1982
01:41:41,141 --> 01:41:43,541
An array of three ints, or more technically,
1983
01:41:43,541 --> 01:41:47,381
the address of a chunk of memory that could fit three ints.
1984
01:41:47,381 --> 01:41:50,681
So I can use my square bracket notation, or I could be really cool
1985
01:41:50,681 --> 01:41:53,631
and use pointer arithmetic, but this is a little more user friendly.
1986
01:41:53,631 --> 01:41:55,481
But I have made two mistakes.
1987
01:41:55,481 --> 01:41:59,081
I did not start indexing at zero, so line seven
1988
01:41:59,081 --> 01:42:00,941
should have been x bracket zero.
1989
01:42:00,941 --> 01:42:03,813
Line eight should have been x bracket 1, and then line nine
1990
01:42:03,813 --> 01:42:05,021
should have been x bracket 2.
1991
01:42:05,021 --> 01:42:06,231
So first mistake.
1992
01:42:06,231 --> 01:42:09,161
The second mistake that I've made as a side effect,
1993
01:42:09,161 --> 01:42:12,221
is I'm also touching memory that I shouldn't.
1994
01:42:12,221 --> 01:42:17,171
x bracket 3 would mean go to the fourth int in the chunk of memory
1995
01:42:17,171 --> 01:42:17,981
that came back.
1996
01:42:17,981 --> 01:42:20,501
I only asked for enough memory for three ints,
1997
01:42:20,501 --> 01:42:23,741
not four, so this is what's called a buffer overflow.
1998
01:42:23,741 --> 01:42:26,831
I am accidentally, but deliberately at the moment,
1999
01:42:26,831 --> 01:42:30,951
going beyond the boundaries of this array, this chunk of memory.
2000
01:42:30,951 --> 01:42:33,311
So bad things happen, but not necessarily
2001
01:42:33,311 --> 01:42:34,641
by just running your program.
2002
01:42:34,641 --> 01:42:36,191
Let me go ahead and just try this.
2003
01:42:36,191 --> 01:42:42,011
Make memory, and you'll see here that it compiles OK. ./memory,
2004
01:42:42,011 --> 01:42:44,139
and it actually does not segmentation fault,
2005
01:42:44,139 --> 01:42:46,181
which comes back to that point of nondeterminism.
2006
01:42:46,181 --> 01:42:48,551
Sometimes it does, sometimes it doesn't-- it depends on how bad
2007
01:42:48,551 --> 01:42:49,691
of a mistake you made.
2008
01:42:49,691 --> 01:42:52,858
But there's a program that can spot these kinds of mistakes,
2009
01:42:52,858 --> 01:42:55,691
and I'm going to go ahead and expand my terminal window for a moment
2010
01:42:55,691 --> 01:43:01,151
and I'm going to run not just ./memory, but a program called Valgrind: valgrind ./memory.
2011
01:43:01,151 --> 01:43:04,001
This is a command that comes with a lot of computer systems
2012
01:43:04,001 --> 01:43:07,071
that's designed to find memory-related bugs in code.
2013
01:43:07,071 --> 01:43:09,011
So it's a new tool in your toolkit today,
2014
01:43:09,011 --> 01:43:11,111
and you'll use it with the coming problem sets.
2015
01:43:11,111 --> 01:43:12,311
I'm going to run this now.
2016
01:43:12,311 --> 01:43:14,591
Its output, honestly, is hideous.
2017
01:43:14,591 --> 01:43:17,981
But there's a few things that will start to jump out
2018
01:43:17,981 --> 01:43:20,381
and will help you with tools and the problem
2019
01:43:20,381 --> 01:43:21,951
sets to see these kinds of things.
2020
01:43:21,951 --> 01:43:23,531
Here's the first mistake.
2021
01:43:23,531 --> 01:43:26,471
Invalid write of size four.
2022
01:43:26,471 --> 01:43:30,461
That's on memory.c line nine, per my highlights.
2023
01:43:30,461 --> 01:43:32,351
So let me go look at line nine.
2024
01:43:32,351 --> 01:43:36,011
In what sense is this an invalid write of size four?
2025
01:43:36,011 --> 01:43:38,591
Well, I'm touching memory that I shouldn't, and I'm
2026
01:43:38,591 --> 01:43:40,061
touching it as though it's an int.
2027
01:43:40,061 --> 01:43:42,551
And an int is four bytes-- size four.
2028
01:43:42,551 --> 01:43:45,831
So again, this takes some practice to get used to, the nomenclature here,
2029
01:43:45,831 --> 01:43:48,771
but this is now a clue for me, the programmer,
2030
01:43:48,771 --> 01:43:52,231
that not only did I screw up, but I screwed up related to memory
2031
01:43:52,231 --> 01:43:54,749
and so this is just a hint, if you will.
2032
01:43:54,749 --> 01:43:57,291
It's not going to necessarily tell you exactly how to fix it,
2033
01:43:57,291 --> 01:44:01,131
you have to wrestle with the semantics, but invalid
2034
01:44:01,131 --> 01:44:02,961
write of size four-- oh, OK.
2035
01:44:02,961 --> 01:44:07,321
So I should not have indexed past the boundary here.
2036
01:44:07,321 --> 01:44:10,021
All right, so I shouldn't have done that.
2037
01:44:10,021 --> 01:44:15,764
So let me go ahead then and change this to zero, one, and two, perhaps, here.
2038
01:44:15,764 --> 01:44:17,931
All right, so let me go ahead and recompile my code.
2039
01:44:17,931 --> 01:44:24,261
Make memory, ./memory, still doesn't seem to be broken but it is technically
2040
01:44:24,261 --> 01:44:24,891
buggy.
2041
01:44:24,891 --> 01:44:31,101
Let me go ahead and run Valgrind again, so Valgrind of ./memory, Enter.
2042
01:44:31,101 --> 01:44:33,321
And now there's fewer scary--
2043
01:44:33,321 --> 01:44:36,841
less scary output now, but there's still something in there.
2044
01:44:36,841 --> 01:44:40,368
Notice this-- 12 bytes in one blocks--
2045
01:44:40,368 --> 01:44:42,201
no regard for grammar there-- are definitely
2046
01:44:42,201 --> 01:44:43,971
lost in loss record one of one.
2047
01:44:43,971 --> 01:44:47,611
Super cryptic, but this is hinting at a so-called memory leak.
2048
01:44:47,611 --> 01:44:51,441
The blocks of memory are lost in the sense that I malloc'd them--
2049
01:44:51,441 --> 01:44:52,881
I asked for them but I never--
2050
01:44:52,881 --> 01:44:55,071
take a guess-- freed them.
2051
01:44:55,071 --> 01:44:56,008
I have a memory leak.
2052
01:44:56,008 --> 01:44:58,341
And this is the arcane way of saying, you've screwed up.
2053
01:44:58,341 --> 01:44:59,551
You have a memory leak.
2054
01:44:59,551 --> 01:45:01,821
So this is an easy fix, fortunately.
2055
01:45:01,821 --> 01:45:06,211
Once I'm done with this memory I just need to free it at the end.
2056
01:45:06,211 --> 01:45:08,631
So now let me go ahead and rerun make memory,
2057
01:45:08,631 --> 01:45:12,441
it still runs fine, so all the while I might have thought, incorrectly,
2058
01:45:12,441 --> 01:45:13,581
my code is correct.
2059
01:45:13,581 --> 01:45:15,261
But let me run Valgrind one more time.
2060
01:45:15,261 --> 01:45:17,451
Valgrind of ./memory, Enter.
2061
01:45:17,451 --> 01:45:19,341
Now, this is pretty good.
2062
01:45:19,341 --> 01:45:21,531
All heap blocks were freed, whatever that means.
2063
01:45:21,531 --> 01:45:23,371
No leaks are possible.
2064
01:45:23,371 --> 01:45:26,481
And even though it's still a little cryptic, there's no other error here
2065
01:45:26,481 --> 01:45:29,985
and in fact, it's pretty explicit-- error summary, zero errors from zero
2066
01:45:29,985 --> 01:45:31,641
contexts, dot, dot, dot.
2067
01:45:31,641 --> 01:45:34,831
So even though this is one of the most arcane tools we'll use,
2068
01:45:34,831 --> 01:45:37,341
it's also one of the most powerful because it can see things
2069
01:45:37,341 --> 01:45:40,671
that you, the human, might not, and maybe even that the debugger might not.
2070
01:45:40,671 --> 01:45:42,741
It does a much closer reading of your code
2071
01:45:42,741 --> 01:45:48,501
while it's running to figure out exactly what is going on.
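A sketch of what the corrected allocation might look like, with both Valgrind complaints addressed. The function name make_three is hypothetical; in lecture everything lives directly inside main in memory.c.

```c
#include <stdlib.h>

// Hypothetical helper: allocates room for three ints and fills them.
// Returns NULL if malloc fails. The caller must free() the result.
int *make_three(void)
{
    // sizeof(int) avoids assuming that an int is 4 bytes.
    int *x = malloc(3 * sizeof(int));
    if (x == NULL)
    {
        return NULL;
    }

    // Valid indices are 0, 1, and 2. Writing to x[3] would be the
    // "invalid write of size 4" that Valgrind reported.
    x[0] = 72;
    x[1] = 73;
    x[2] = 33;
    return x;
}
```

Calling free on the returned pointer once it is no longer needed is what clears the "definitely lost" report.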
2072
01:45:48,501 --> 01:45:50,781
Any questions, then, on this tool?
2073
01:45:50,781 --> 01:45:54,681
And we'll guide you after today with actually using this, too.
2074
01:45:54,681 --> 01:45:57,201
Just helps you find memory-related mistakes
2075
01:45:57,201 --> 01:46:00,021
that you might now be capable of making.
2076
01:46:00,021 --> 01:46:02,181
All right, let's do one other memory-related thing.
2077
01:46:02,181 --> 01:46:04,171
Let me shrink my terminal window here.
2078
01:46:04,171 --> 01:46:07,911
Let me create one other file here called garbage.c.
2079
01:46:07,911 --> 01:46:11,421
It turns out there's a term of art called garbage values in programming
2080
01:46:11,421 --> 01:46:12,931
that we can reveal as follows.
2081
01:46:12,931 --> 01:46:15,921
Let me include stdio.h, and let me include--
2082
01:46:15,921 --> 01:46:19,461
how about stdlib.h, and then let me give myself int
2083
01:46:19,461 --> 01:46:22,561
main(void), and then in this relatively short program
2084
01:46:22,561 --> 01:46:25,461
let me give myself three ints using last week's
2085
01:46:25,461 --> 01:46:29,421
notation, just int scores bracket 3 for 3 quiz scores, or whatever.
2086
01:46:29,421 --> 01:46:33,441
Then let me go ahead and do for int i equals zero, i less than 3,
2087
01:46:33,441 --> 01:46:38,691
i plus plus, then let me go ahead and print out, %i backslash n,
2088
01:46:38,691 --> 01:46:40,911
scores bracket i semicolon.
2089
01:46:40,911 --> 01:46:43,491
That's it.
2090
01:46:43,491 --> 01:46:48,781
This code, pretty sure is going to compile and it's going to run,
2091
01:46:48,781 --> 01:46:51,171
but what is my logical bug?
2092
01:46:51,171 --> 01:46:55,701
I've forgotten a step even though the code that's written is not so wrong.
2093
01:46:55,701 --> 01:46:58,431
Yeah?
2094
01:46:58,431 --> 01:47:00,921
Yeah, I didn't provide the scores, so I didn't actually
2095
01:47:00,921 --> 01:47:04,851
initialize the array called scores to have any scores whatsoever.
2096
01:47:04,851 --> 01:47:08,391
What's curious about this, though, is that the computer technically
2097
01:47:08,391 --> 01:47:09,081
doesn't mind.
2098
01:47:09,081 --> 01:47:13,041
Let me go ahead and playfully make garbage, Enter,
2099
01:47:13,041 --> 01:47:15,621
and it's an apt description because what I'm about to see
2100
01:47:15,621 --> 01:47:18,231
are so-called garbage values.
2101
01:47:18,231 --> 01:47:23,061
When you, the programmer, do not initialize your code's variables to have
2102
01:47:23,061 --> 01:47:25,878
values, sometimes, who knows what's going to be there.
2103
01:47:25,878 --> 01:47:27,711
The computer's been doing some other things,
2104
01:47:27,711 --> 01:47:31,161
there's a bit of work that happens even before your code runs in the computer,
2105
01:47:31,161 --> 01:47:34,401
so there might be remnants of past ints, chars, strings,
2106
01:47:34,401 --> 01:47:37,041
floats-- anything else in there and what you're seeing
2107
01:47:37,041 --> 01:47:42,661
is those garbage values, which is to say you should never forget,
2108
01:47:42,661 --> 01:47:45,601
as I just did, to initialize the value of some variable.
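One way to avoid printing garbage values is to give every element a known value before reading it. The sketch below assumes two hypothetical helpers, init_scores and print_scores, which are not part of the lecture's garbage.c.

```c
#include <stdio.h>

// Hypothetical helper: sets every element to 0 so that no element
// is left holding whatever happened to be in memory.
void init_scores(int scores[], int n)
{
    for (int i = 0; i < n; i++)
    {
        scores[i] = 0;
    }
}

// Prints each score on its own line, as the loop in garbage.c does.
void print_scores(const int scores[], int n)
{
    for (int i = 0; i < n; i++)
    {
        printf("%i\n", scores[i]);
    }
}
```

For a fixed-size array like this one, declaring it as int scores[3] = {0}; has the same effect without a loop.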
2109
01:47:45,601 --> 01:47:47,601
And this is actually pretty dangerous, and there
2110
01:47:47,601 --> 01:47:51,081
have been many examples of software being compromised
2111
01:47:51,081 --> 01:47:54,261
because of one of these issues where a variable wasn't initialized
2112
01:47:54,261 --> 01:47:58,611
and all of a sudden users, maybe people on the internet in the context of web
2113
01:47:58,611 --> 01:48:02,481
applications, could suddenly see the contents of someone else's memory,
2114
01:48:02,481 --> 01:48:03,591
or remnants.
2115
01:48:03,591 --> 01:48:06,051
Maybe someone's password that had been previously typed in
2116
01:48:06,051 --> 01:48:08,031
or some other value like a credit card number
2117
01:48:08,031 --> 01:48:09,591
that had been previously typed in.
2118
01:48:09,591 --> 01:48:11,571
There are different defense mechanisms in place
2119
01:48:11,571 --> 01:48:15,111
to generally make this not so likely, but it's certainly
2120
01:48:15,111 --> 01:48:18,171
very possible, at least in this kind of context,
2121
01:48:18,171 --> 01:48:22,101
to see values that you probably shouldn't because they
2122
01:48:22,101 --> 01:48:25,621
might be remnants from something else that used them.
2123
01:48:25,621 --> 01:48:29,701
So this is to say again, you have this great power now to manipulate memory,
2124
01:48:29,701 --> 01:48:33,021
but also now you have this great hacking ability to poke around
2125
01:48:33,021 --> 01:48:36,441
the contents of memory, and this is exactly what hackers sometimes do when
2126
01:48:36,441 --> 01:48:40,431
trying to find ways to exploit systems.
2127
01:48:40,431 --> 01:48:41,661
Any questions here?
2128
01:48:44,571 --> 01:48:45,071
No?
2129
01:48:45,071 --> 01:48:47,111
All right, let's go ahead and take a quick five minute break
2130
01:48:47,111 --> 01:48:49,511
and when we come back, we'll build on these final topics.
2131
01:48:49,511 --> 01:48:50,381
See you in five.
2132
01:48:50,381 --> 01:48:51,671
We are back.
2133
01:48:51,671 --> 01:48:55,481
First, just a little programmer humor from XKCD, which hopefully now
2134
01:48:55,481 --> 01:48:57,851
will make a little bit of sense to you.
2135
01:48:57,851 --> 01:49:02,321
And what we'll also do next is take a look at a short two-minute video that
2136
01:49:02,321 --> 01:49:05,501
animates with claymation, if you will, from our friends at Stanford,
2137
01:49:05,501 --> 01:49:08,501
exactly what happens now if you have an understanding of what garbage
2138
01:49:08,501 --> 01:49:12,004
values are and how they get there, and what happens then if you misuse them.
2139
01:49:12,004 --> 01:49:14,171
It's one thing just to print them out as I just did,
2140
01:49:14,171 --> 01:49:18,431
it's another if you actually mistake a garbage value for a valid pointer,
2141
01:49:18,431 --> 01:49:21,881
because garbage values are just zeros and ones somewhere-- numbers, that is.
2142
01:49:21,881 --> 01:49:24,761
But if you use that new dereference operator, the star,
2143
01:49:24,761 --> 01:49:29,111
and try to go to a garbage value thinking incorrectly that it's
2144
01:49:29,111 --> 01:49:31,511
a valid pointer, bad things can happen.
2145
01:49:31,511 --> 01:49:36,431
Computers can crash or more familiarly, segmentation faults can happen.
2146
01:49:36,431 --> 01:49:39,401
So allow me to introduce, if we could dim the lights for two minutes,
2147
01:49:39,401 --> 01:49:41,111
our friend Binky from Stanford.
2148
01:49:44,951 --> 01:49:46,541
SPEAKER 1: Hey Binky, wake up.
2149
01:49:46,541 --> 01:49:49,221
It's time for pointer fun.
2150
01:49:49,221 --> 01:49:50,331
BINKY: What's that?
2151
01:49:50,331 --> 01:49:51,921
Learn about pointers?
2152
01:49:51,921 --> 01:49:53,184
Oh, goody!
2153
01:49:53,184 --> 01:49:55,101
SPEAKER 1: Well, to get started, I guess we're
2154
01:49:55,101 --> 01:49:56,721
going to need a couple of pointers.
2155
01:49:56,721 --> 01:50:00,998
BINKY: OK, this code allocates two pointers which can point to integers.
2156
01:50:00,998 --> 01:50:01,581
SPEAKER 1: OK.
2157
01:50:01,581 --> 01:50:05,188
Well, I see the two pointers, but they don't seem to be pointing to anything.
2158
01:50:05,188 --> 01:50:06,021
BINKY: That's right.
2159
01:50:06,021 --> 01:50:08,151
Initially, pointers don't point to anything.
2160
01:50:08,151 --> 01:50:11,181
The things they point to are called pointees, and setting them up
2161
01:50:11,181 --> 01:50:12,174
is a separate step.
2162
01:50:12,174 --> 01:50:13,341
SPEAKER 1: Oh, right, right.
2163
01:50:13,341 --> 01:50:14,031
I knew that.
2164
01:50:14,031 --> 01:50:16,021
The pointees are separate.
2165
01:50:16,021 --> 01:50:18,351
So how do you allocate a pointee?
2166
01:50:18,351 --> 01:50:21,921
BINKY: OK, well this code allocates a new integer pointee,
2167
01:50:21,921 --> 01:50:24,994
and this part sets x to point to it.
2168
01:50:24,994 --> 01:50:26,411
SPEAKER 1: Hey, that looks better.
2169
01:50:26,411 --> 01:50:28,021
So make it do something.
2170
01:50:28,021 --> 01:50:31,411
BINKY: OK, I'll dereference the pointer x to store the number
2171
01:50:31,411 --> 01:50:33,541
42 into its pointee.
2172
01:50:33,541 --> 01:50:37,201
For this trick, I'll need my magic wand of dereferencing.
2173
01:50:37,201 --> 01:50:40,591
SPEAKER 1: Your magic wand of dereferencing?
2174
01:50:40,591 --> 01:50:42,441
That's great.
2175
01:50:42,441 --> 01:50:44,151
BINKY: This is what the code looks like.
2176
01:50:44,151 --> 01:50:46,946
I'll just set up the number and--
2177
01:50:46,946 --> 01:50:47,821
SPEAKER 1: Hey, look.
2178
01:50:47,821 --> 01:50:49,171
There it goes.
2179
01:50:49,171 --> 01:50:54,091
So doing a dereference on x follows the arrow to access its pointee,
2180
01:50:54,091 --> 01:50:56,131
in this case to store 42 in there.
2181
01:50:56,131 --> 01:51:00,751
Hey, try using it to store the number 13 through the other pointer, y.
2182
01:51:00,751 --> 01:51:01,891
BINKY: OK.
2183
01:51:01,891 --> 01:51:06,271
I'll just go over here to y and get the number 13 set up,
2184
01:51:06,271 --> 01:51:10,801
and then take the wand of dereferencing and just--
2185
01:51:10,801 --> 01:51:11,881
whoa!
2186
01:51:11,881 --> 01:51:14,101
SPEAKER 1: Oh hey, that didn't work.
2187
01:51:14,101 --> 01:51:17,821
Say, Binky, I don't think dereferencing y is a good idea
2188
01:51:17,821 --> 01:51:21,016
because setting up the pointee is a separate step
2189
01:51:21,016 --> 01:51:23,551
and I don't think we ever did it.
2190
01:51:23,551 --> 01:51:24,601
BINKY: Good point.
2191
01:51:24,601 --> 01:51:27,031
SPEAKER 1: Yeah, we allocated the pointer y,
2192
01:51:27,031 --> 01:51:30,271
but we never set it to point to a pointee.
2193
01:51:30,271 --> 01:51:31,439
BINKY: Very observant.
2194
01:51:31,439 --> 01:51:33,481
SPEAKER 1: Hey, you're looking good there, Binky.
2195
01:51:33,481 --> 01:51:36,361
Can you fix it so that y points to the same pointee as x?
2196
01:51:36,361 --> 01:51:39,721
BINKY: Sure, I'll use my magic wand of pointer assignment.
2197
01:51:39,721 --> 01:51:41,971
SPEAKER 1: Is that going to be a problem, like before?
2198
01:51:41,971 --> 01:51:43,861
BINKY: No, this doesn't touch the pointees,
2199
01:51:43,861 --> 01:51:47,491
it just changes one pointer to point to the same thing as another.
2200
01:51:47,491 --> 01:51:48,511
SPEAKER 1: Oh, I see.
2201
01:51:48,511 --> 01:51:51,181
Now y points to the same place as x.
2202
01:51:51,181 --> 01:51:53,071
So wait, now y is fixed.
2203
01:51:53,071 --> 01:51:56,131
It has a pointee so you can try the wand of dereferencing again
2204
01:51:56,131 --> 01:51:58,741
to send the 13 over.
2205
01:51:58,741 --> 01:52:01,073
BINKY: OK, here it goes.
2206
01:52:01,073 --> 01:52:02,281
SPEAKER 1: Hey, look at that.
2207
01:52:02,281 --> 01:52:04,111
Now dereferencing works on y.
2208
01:52:04,111 --> 01:52:08,161
And because the pointers are sharing that one pointee, they both see the 13.
2209
01:52:08,161 --> 01:52:09,301
BINKY: Yeah, sharing.
2210
01:52:09,301 --> 01:52:09,871
Whatever.
2211
01:52:09,871 --> 01:52:11,911
So are we going to switch places now?
2212
01:52:11,911 --> 01:52:13,831
SPEAKER 1: Oh look, we're out of time.
2213
01:52:13,831 --> 01:52:14,951
BINKY: But--
2214
01:52:14,951 --> 01:52:17,171
DAVID J. MALAN: That's from our friend Nick Parlante at Stanford.
2215
01:52:17,171 --> 01:52:19,511
So let's consider what Nick did here as Binky.
2216
01:52:19,511 --> 01:52:21,581
So here is all the code together.
2217
01:52:21,581 --> 01:52:25,258
These first couple of lines were not bad, and notice that in Stanford's code
2218
01:52:25,258 --> 01:52:26,591
they move the stars to the left.
2219
01:52:26,591 --> 01:52:27,341
That's fine.
2220
01:52:27,341 --> 01:52:30,251
Again, more conventional might be this syntax here.
2221
01:52:30,251 --> 01:52:31,461
These two lines are fine.
2222
01:52:31,461 --> 01:52:34,781
It's OK to create variables, even pointers,
2223
01:52:34,781 --> 01:52:38,411
and not assign them a value initially so long as you eventually do.
2224
01:52:38,411 --> 01:52:40,931
So we eventually do here, with this line.
2225
01:52:40,931 --> 01:52:43,991
We assign to x the return value of malloc, which
2226
01:52:43,991 --> 01:52:45,821
is presumably the address of something.
2227
01:52:45,821 --> 01:52:49,071
To be fair, we should really be checking for null as well,
2228
01:52:49,071 --> 01:52:50,991
but that's not the biggest problem here.
2229
01:52:50,991 --> 01:52:53,481
The biggest problem is not even this next line,
2230
01:52:53,481 --> 01:52:59,231
which means go to the memory location in x and store the number 42 there.
2231
01:52:59,231 --> 01:53:01,451
That's fine, because again, malloc returns
2232
01:53:01,451 --> 01:53:03,701
the address of some chunk of memory.
2233
01:53:03,701 --> 01:53:05,801
This chunk of memory is big enough for an int.
2234
01:53:05,801 --> 01:53:08,711
x is therefore going to store the address of that chunk that's
2235
01:53:08,711 --> 01:53:09,671
big enough for an int.
2236
01:53:09,671 --> 01:53:13,541
Star x, recall, is the dereference operator, which means go to that address
2237
01:53:13,541 --> 01:53:15,341
and put 42 in it.
2238
01:53:15,341 --> 01:53:18,461
It's like going to the mailbox and putting the number 42 in it
2239
01:53:18,461 --> 01:53:21,371
instead of taking the number 50 out, like we did before.
2240
01:53:21,371 --> 01:53:23,051
But why is this line bad?
2241
01:53:23,051 --> 01:53:26,291
This is where Binky lost his head, so to speak.
2242
01:53:26,291 --> 01:53:27,641
Why is this bad?
2243
01:53:27,641 --> 01:53:28,681
Yeah.
2244
01:53:28,681 --> 01:53:30,681
AUDIENCE: We haven't yet allocated space for it.
2245
01:53:30,681 --> 01:53:31,231
DAVID J. MALAN: Exactly.
2246
01:53:31,231 --> 01:53:33,141
We haven't yet allocated space for y.
2247
01:53:33,141 --> 01:53:36,051
There's no mention of malloc, there's no assignment of y,
2248
01:53:36,051 --> 01:53:37,591
even to that same memory.
2249
01:53:37,591 --> 01:53:40,441
So this would be, go to the address in y,
2250
01:53:40,441 --> 01:53:43,831
but if there is no known address in y, it is a so-called garbage value,
2251
01:53:43,831 --> 01:53:46,761
which means go to some random address that you have no control over,
2252
01:53:46,761 --> 01:53:47,571
and boom--
2253
01:53:47,571 --> 01:53:52,221
that might cause what we've seen in the past, perhaps as a segmentation fault.
2254
01:53:52,221 --> 01:53:54,111
Now this, fortunately, is the kind of thing
2255
01:53:54,111 --> 01:53:58,041
that if you don't quite have the eye for it yet, Valgrind, that new tool,
2256
01:53:58,041 --> 01:53:59,911
could help you find as well.
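A sketch of the sequence from the video, reordered so that y is given a pointee before it is dereferenced. The wrapper function pointer_fun and its return value are assumptions added for illustration; the video's code is a flat sequence of statements.

```c
#include <stdlib.h>

// Hypothetical wrapper: returns the final value stored in the shared
// pointee, or -1 if malloc fails.
int pointer_fun(void)
{
    int *x;
    int *y;

    x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    *x = 42;    // fine: x points at allocated memory

    // Writing *y = 13 at this point would dereference a garbage value.
    y = x;      // pointer assignment: y now shares x's pointee
    *y = 13;    // fine now, and *x is 13 as well

    int result = *x;
    free(x);
    return result;
}
```

Because x and y hold the same address, one call to free releases the single pointee they share.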
2257
01:53:59,911 --> 01:54:03,681
But it's just another example of again, the sort of upside and downside
2258
01:54:03,681 --> 01:54:07,111
of having control now over memory at this level.
2259
01:54:07,111 --> 01:54:07,611
All right.
2260
01:54:07,611 --> 01:54:09,444
Well, let's go ahead and do one other thing.
2261
01:54:09,444 --> 01:54:12,586
Consider from last week that this notion of swapping
2262
01:54:12,586 --> 01:54:14,211
was actually a really common operation.
2263
01:54:14,211 --> 01:54:17,211
We had all of our volunteers come up, we had to swap a lot of things
2264
01:54:17,211 --> 01:54:19,581
during bubble sort and even selection sort,
2265
01:54:19,581 --> 01:54:21,681
and we just took for granted that the two
2266
01:54:21,681 --> 01:54:23,613
humans would swap themselves just fine.
2267
01:54:23,613 --> 01:54:25,821
But there needs to be code to do that if you actually
2268
01:54:25,821 --> 01:54:29,638
implement bubble sort, selection sort, or anything that involves swapping.
2269
01:54:29,638 --> 01:54:31,221
So let's consider some code like this.
2270
01:54:31,221 --> 01:54:33,291
We'll keep it simple like last week, where
2271
01:54:33,291 --> 01:54:40,339
we wanted to swap some values like int A and int B, for instance, here.
2272
01:54:40,339 --> 01:54:43,131
Void because I'm not going to return a value, but I have a function
2273
01:54:43,131 --> 01:54:44,031
called swap.
2274
01:54:44,031 --> 01:54:49,341
So here, for instance, might be some code for this.
2275
01:54:49,341 --> 01:54:50,549
But why is it so complicated?
2276
01:54:50,549 --> 01:54:52,133
Here, let's actually take a step back.
2277
01:54:52,133 --> 01:54:53,301
Why don't we do this here.
2278
01:54:53,301 --> 01:54:54,921
I think we have time for one more volunteer.
2279
01:54:54,921 --> 01:54:56,379
Could we get someone to come on up?
2280
01:54:56,379 --> 01:54:58,671
You have to be comfy on camera and you're
2281
01:54:58,671 --> 01:55:01,701
being asked to help with your-- oh, I'll go with the friend, pointing.
2282
01:55:01,701 --> 01:55:05,641
So whoever has their friend doing this here--
2283
01:55:05,641 --> 01:55:06,621
no?
2284
01:55:06,621 --> 01:55:08,511
Now they're pointing it over here.
2285
01:55:08,511 --> 01:55:10,251
Now, literally an arm is being twisted.
2286
01:55:10,251 --> 01:55:11,751
OK.
2287
01:55:11,751 --> 01:55:12,471
Come on down.
2288
01:55:12,471 --> 01:55:13,341
That backfired.
2289
01:55:18,311 --> 01:55:18,956
Come on over.
2290
01:55:24,481 --> 01:55:26,241
And what is your name?
2291
01:55:26,241 --> 01:55:27,153
AUDIENCE: Marina.
2292
01:55:27,153 --> 01:55:28,111
DAVID J. MALAN: Marina.
2293
01:55:28,111 --> 01:55:29,641
Nice to meet you.
2294
01:55:29,641 --> 01:55:31,718
Who were you trying to volunteer?
2295
01:55:31,718 --> 01:55:32,801
AUDIENCE: My friend Jesse.
2296
01:55:32,801 --> 01:55:33,971
DAVID J. MALAN: OK.
2297
01:55:33,971 --> 01:55:38,291
So here we have for Marina two glasses of liquid, orange and purple,
2298
01:55:38,291 --> 01:55:39,821
just so that they're super obvious.
2299
01:55:39,821 --> 01:55:42,226
And suppose that the problem at hand, like last week,
2300
01:55:42,226 --> 01:55:45,101
is just to swap two values, as though these two glasses represented
2301
01:55:45,101 --> 01:55:47,111
two people and we want to swap them.
2302
01:55:47,111 --> 01:55:50,501
But let's consider these glasses to be like variables, or locations
2303
01:55:50,501 --> 01:55:52,211
in an array, and you know what?
2304
01:55:52,211 --> 01:55:54,681
I'd really like you to swap the values.
2305
01:55:54,681 --> 01:55:58,241
So orange has to go in there, and purple has to go in there.
2306
01:55:58,241 --> 01:55:59,194
How would you do it?
2307
01:55:59,194 --> 01:56:01,361
And we'll see if we can then translate that to code.
2308
01:56:01,361 --> 01:56:03,508
AUDIENCE: [INAUDIBLE]
2309
01:56:03,508 --> 01:56:04,591
DAVID J. MALAN: OK, what--
2310
01:56:04,591 --> 01:56:06,444
say it a little louder.
2311
01:56:06,444 --> 01:56:07,111
All right, yeah.
2312
01:56:07,111 --> 01:56:09,571
So presumably, you're struggling mentally
2313
01:56:09,571 --> 01:56:12,781
with how you would do this without having an extra cup, so good foresight
2314
01:56:12,781 --> 01:56:13,321
here.
2315
01:56:13,321 --> 01:56:16,191
Let me go ahead and we do have a temporary variable, if you will.
2316
01:56:16,191 --> 01:56:18,691
So if I hand you this, how would you now solve this problem?
2317
01:56:21,181 --> 01:56:22,931
AUDIENCE: I would go like that, but it's--
2318
01:56:22,931 --> 01:56:23,581
DAVID J. MALAN: No, that's--
2319
01:56:23,581 --> 01:56:24,371
Oh.
2320
01:56:24,371 --> 01:56:24,871
Well, OK.
2321
01:56:24,871 --> 01:56:27,981
Go do it-- go with your instincts.
2322
01:56:27,981 --> 01:56:29,541
OK.
2323
01:56:29,541 --> 01:56:30,681
Sure, go ahead.
2324
01:56:30,681 --> 01:56:32,811
Go to whatever your instincts are.
2325
01:56:39,201 --> 01:56:41,828
Yeah, so a little-- so strictly speaking, probably
2326
01:56:41,828 --> 01:56:43,911
shouldn't have moved the glasses just because that
2327
01:56:43,911 --> 01:56:45,931
would be like moving the array locations,
2328
01:56:45,931 --> 01:56:48,611
so let's actually do it one more time but the glasses now
2329
01:56:48,611 --> 01:56:50,361
have to go back where they originally are.
2330
01:56:50,361 --> 01:56:55,051
So how would you swap these now, using this temporary variable?
2331
01:56:55,051 --> 01:56:56,476
OK, good.
2332
01:56:56,476 --> 01:56:59,101
Otherwise we'd be completely uprooting the array, for instance,
2333
01:56:59,101 --> 01:57:01,081
by just physically moving it around.
2334
01:57:01,081 --> 01:57:03,571
So you moved the orange into this temporary variable,
2335
01:57:03,571 --> 01:57:05,911
then you copied the purple into where the orange was,
2336
01:57:05,911 --> 01:57:08,281
and now, presumably, excellent.
2337
01:57:08,281 --> 01:57:11,101
The orange is going to end up where the purple once was
2338
01:57:11,101 --> 01:57:13,621
and this temporary variable, it used up some extra memory.
2339
01:57:13,621 --> 01:57:16,441
It was necessary at the time, but not necessary, ultimately.
2340
01:57:16,441 --> 01:57:22,131
But a round of applause if we could, and thank you for doing that so well.
2341
01:57:22,131 --> 01:57:26,311
So the fact that it instantly occurred to Marina
2342
01:57:26,311 --> 01:57:29,711
that you need some temporary variable is a perfect translation to code,
2343
01:57:29,711 --> 01:57:32,951
and in fact this code here, that we might glimpse now,
2344
01:57:32,951 --> 01:57:35,038
is reminiscent of exactly that algorithm,
2345
01:57:35,038 --> 01:57:37,871
where A and B, at the end of the day, are the same chunks of memory.
2346
01:57:37,871 --> 01:57:39,881
Just like the second time, the two glasses
2347
01:57:39,881 --> 01:57:42,281
have to kind of stay put, even though we're physically lifting them,
2348
01:57:42,281 --> 01:57:44,031
but they're going back to where they were,
2349
01:57:44,031 --> 01:57:46,031
is kind of like having two values, A and B,
2350
01:57:46,031 --> 01:57:49,091
and you just have a temporary variable into which you copy A,
2351
01:57:49,091 --> 01:57:52,331
then you change A with B, then you go and change
2352
01:57:52,331 --> 01:57:55,271
B with whatever the original value of A was,
2353
01:57:55,271 --> 01:57:59,921
because you temporarily stored it in this temporary variable, tmp.
2354
01:57:59,921 --> 01:58:04,161
Unfortunately, this code doesn't necessarily work as intended.
2355
01:58:04,161 --> 01:58:07,391
So let me go over to my VS Code here and open up
2356
01:58:07,391 --> 01:58:10,661
a program called swap.c, and in swap.c, let
2357
01:58:10,661 --> 01:58:15,641
me whip up something really quickly here with, how about include stdio.h,
2358
01:58:15,641 --> 01:58:17,561
int main(void).
2359
01:58:17,561 --> 01:58:22,751
Inside of main let me do something like x gets 1 and y gets 2.
2360
01:58:22,751 --> 01:58:27,881
Let me just print out as a visual confirmation that x is %i,
2361
01:58:27,881 --> 01:58:32,891
y is %i backslash n, plugging in x and y, respectively.
2362
01:58:32,891 --> 01:58:36,071
Then let me call a swap function that we'll invent in just a moment.
2363
01:58:36,071 --> 01:58:42,761
Swap x and y. And then let me print out again x is %i, y is %i backslash n,
2364
01:58:42,761 --> 01:58:46,331
just to print out again what they are, because presumably I should see 1,
2365
01:58:46,331 --> 01:58:49,494
2 first, then 2, 1 the second time.
2366
01:58:49,494 --> 01:58:51,161
Now how is swap going to be implemented?
2367
01:58:51,161 --> 01:58:54,591
Let me implement it exactly as on the screen a moment ago.
2368
01:58:54,591 --> 01:58:57,011
So void swap int x--
2369
01:58:57,011 --> 01:58:59,501
or let's call it int A for consistency, int B.
2370
01:58:59,501 --> 01:59:01,661
But I could always call those anything I want.
2371
01:59:01,661 --> 01:59:05,891
Int tmp gets A, A gets B, B gets tmp.
2372
01:59:05,891 --> 01:59:08,981
So exactly as I proposed a moment ago, and exactly
2373
01:59:08,981 --> 01:59:12,761
as Marina really implemented it using these glasses of water.
2374
01:59:12,761 --> 01:59:16,571
I need to now include my prototype, as always, so nothing new there.
2375
01:59:16,571 --> 01:59:20,261
And I'll just copy/paste that up here, and now let's go ahead and run this.
2376
01:59:20,261 --> 01:59:23,471
So make swap-- so far, so good-- swap--
2377
01:59:23,471 --> 01:59:28,331
x is now 1, y is 2, x is 1, y is 2.
2378
01:59:28,331 --> 01:59:34,091
So there seems to be a bit of a bug here, but why might this be?
2379
01:59:34,091 --> 01:59:37,931
This code does not in fact work, even though it obviously works in reality.
2380
01:59:37,931 --> 01:59:39,725
Yeah?
2381
01:59:39,725 --> 01:59:46,239
AUDIENCE: Because A and B have different addresses than x and y [INAUDIBLE].
2382
01:59:46,239 --> 01:59:48,031
DAVID J. MALAN: Good, and let me summarize.
2383
01:59:48,031 --> 01:59:51,361
A and B do indeed have different addresses than x and y,
2384
01:59:51,361 --> 01:59:54,961
and in fact what happens when you call a function like this on line 11,
2385
01:59:54,961 --> 01:59:59,221
calling swap, passing in x and y, you are calling a function
2386
01:59:59,221 --> 02:00:00,851
by value, so to speak.
2387
02:00:00,851 --> 02:00:02,611
And this is a term of art that just means
2388
02:00:02,611 --> 02:00:07,321
you are passing in copies of x and y, respectively, and calling them
2389
02:00:07,321 --> 02:00:11,551
A and B in the context of this function, but they're indeed copies.
2390
02:00:11,551 --> 02:00:15,451
Now technically, these names are local only.
2391
02:00:15,451 --> 02:00:18,211
I could have called this x, I could have called this y,
2392
02:00:18,211 --> 02:00:22,531
I could have changed this to x, this to y, this to x, and this to y.
2393
02:00:22,531 --> 02:00:24,031
The problem would still remain.
2394
02:00:24,031 --> 02:00:27,961
Just because you use the same names in one function as you do elsewhere,
2395
02:00:27,961 --> 02:00:29,551
that doesn't mean they're the same.
2396
02:00:29,551 --> 02:00:31,121
They just look the same to you.
2397
02:00:31,121 --> 02:00:35,821
But indeed, swap is going to get copies of this x and y, and in this context,
2398
02:00:35,821 --> 02:00:38,461
this scope, so to speak--
2399
02:00:38,461 --> 02:00:40,801
x and y will be copies of the original.
2400
02:00:40,801 --> 02:00:43,141
So for clarity, let me revert this back to A and B
2401
02:00:43,141 --> 02:00:46,951
just to make super clear that they're indeed different, albeit copies,
2402
02:00:46,951 --> 02:00:48,901
but there's indeed a problem there.
2403
02:00:48,901 --> 02:00:51,041
This function actually works fine.
2404
02:00:51,041 --> 02:00:52,361
In fact, notice this.
2405
02:00:52,361 --> 02:00:56,921
Let me go ahead and print out inside of this. printf A is %i,
2406
02:00:56,921 --> 02:01:00,991
B is %i backslash n, and then I'll print A and B.
2407
02:01:00,991 --> 02:01:04,201
And let me do that same thing at the beginning of this function before it
2408
02:01:04,201 --> 02:01:05,381
does any work.
2409
02:01:05,381 --> 02:01:06,751
Let me go ahead and rerun.
2410
02:01:06,751 --> 02:01:10,741
Make swap, ./swap, and this is promising.
2411
02:01:10,741 --> 02:01:17,371
Initially, x is 1, y is 2, A is 1, B is 2, A is 2, B is 1,
2412
02:01:17,371 --> 02:01:19,598
but then nope-- x is 1, y is 2.
2413
02:01:19,598 --> 02:01:21,931
So if anything, I've confirmed that the logic is right--
2414
02:01:21,931 --> 02:01:25,051
Marina's logic is right, but there's something about C.
2415
02:01:25,051 --> 02:01:28,921
There's something about using one function versus another that's actually
2416
02:01:28,921 --> 02:01:30,671
creating a problem here.
2417
02:01:30,671 --> 02:01:35,021
The fact that I'm passing in copies of these values is creating this problem.
2418
02:01:35,021 --> 02:01:36,391
So what in fact is going on?
2419
02:01:36,391 --> 02:01:39,211
Well again, inside of your computer's memory there are these little chips,
2420
02:01:39,211 --> 02:01:41,086
and we've been talking about them abstractly,
2421
02:01:41,086 --> 02:01:43,141
it's just this grid of memory locations.
2422
02:01:43,141 --> 02:01:46,343
It turns out that your computer uses this memory
2423
02:01:46,343 --> 02:01:47,551
in a pretty conventional way.
2424
02:01:47,551 --> 02:01:51,631
It's not just random, where it just puts stuff wherever is available,
2425
02:01:51,631 --> 02:01:55,591
it actually uses different parts of the memory for different purposes.
2426
02:01:55,591 --> 02:01:58,981
And you have control over a lot of it, but the computer uses some of it
2427
02:01:58,981 --> 02:01:59,823
for itself.
2428
02:01:59,823 --> 02:02:01,531
And let's go ahead and zoom out from this
2429
02:02:01,531 --> 02:02:05,581
and consider that within your computer's memory, what a computer will typically
2430
02:02:05,581 --> 02:02:09,001
do is actually store initially, all of the zeros and ones
2431
02:02:09,001 --> 02:02:13,001
that you compiled in the top of your computer's memory, so to speak.
2432
02:02:13,001 --> 02:02:16,231
So when you compile a program and then you run it with ./whatever,
2433
02:02:16,231 --> 02:02:19,651
or on a Mac or PC you double click on it, the computer first--
2434
02:02:19,651 --> 02:02:24,781
the operating system first-- loads all of your program zeros and ones, a.k.a.
2435
02:02:24,781 --> 02:02:29,371
machine code, into just one big chunk of memory at the top, so to speak.
2436
02:02:29,371 --> 02:02:33,301
Below that it stores global variables-- any variables
2437
02:02:33,301 --> 02:02:37,183
you have created in your program that are outside of main and outside
2438
02:02:37,183 --> 02:02:37,891
of any functions.
2439
02:02:37,891 --> 02:02:39,691
Generally, the top of your file.
2440
02:02:39,691 --> 02:02:41,634
Globals tend to go at the top there.
2441
02:02:41,634 --> 02:02:44,551
Then there's this chunk of memory that's generally known as the heap--
2442
02:02:44,551 --> 02:02:46,951
and we saw that word briefly in Valgrind's output,
2443
02:02:46,951 --> 02:02:50,581
and then there's this other chunk of memory called the stack.
2444
02:02:50,581 --> 02:02:55,711
And it turns out that up until this week you were using the stack heavily.
2445
02:02:55,711 --> 02:03:00,961
Any time you use local variables in a function they end up on the stack.
2446
02:03:00,961 --> 02:03:04,681
Any time you use malloc, that memory ends up on the heap.
2447
02:03:04,681 --> 02:03:06,751
Now as the arrow suggests, this actually looks
2448
02:03:06,751 --> 02:03:09,834
like a problem waiting to happen because if you use more and more and more
2449
02:03:09,834 --> 02:03:11,671
heap, and more and more and more stack, it's
2450
02:03:11,671 --> 02:03:14,401
like two things barreling down the tracks at one another-- this does not
2451
02:03:14,401 --> 02:03:14,891
end well.
2452
02:03:14,891 --> 02:03:16,141
And that's actually a problem.
2453
02:03:16,141 --> 02:03:19,481
If you've ever heard the phrase stack overflow, or used the website,
2454
02:03:19,481 --> 02:03:21,271
this is the origin of its name.
2455
02:03:21,271 --> 02:03:23,521
When you start to use more and more and more
2456
02:03:23,521 --> 02:03:25,801
memory by calling lots and lots of functions
2457
02:03:25,801 --> 02:03:28,261
or using lots and lots of local variables,
2458
02:03:28,261 --> 02:03:30,511
you use a lot of this stack memory.
2459
02:03:30,511 --> 02:03:33,961
Or if you use malloc a lot and keep calling malloc, malloc, malloc,
2460
02:03:33,961 --> 02:03:37,681
and never really, or rarely calling free, you just use more and more memory
2461
02:03:37,681 --> 02:03:41,521
and eventually these two things might overflow each other, at which point
2462
02:03:41,521 --> 02:03:42,571
you're just out of luck.
2463
02:03:42,571 --> 02:03:45,191
The program will crash or something bad will happen.
2464
02:03:45,191 --> 02:03:47,971
So the onus is on you to just not do that.
2465
02:03:47,971 --> 02:03:50,221
But this is the design, generally, of what's
2466
02:03:50,221 --> 02:03:52,111
going on inside of your computer's memory.
2467
02:03:52,111 --> 02:03:55,711
Now within that memory, though, there are certain conventions
2468
02:03:55,711 --> 02:03:57,571
focusing on here, the stack.
2469
02:03:57,571 --> 02:04:00,031
And in fact, let me go over here with a marker
2470
02:04:00,031 --> 02:04:03,521
and say that this represents the bottom of my memory, ultimately.
2471
02:04:03,521 --> 02:04:07,801
And so here we have a whole bunch of wooden blocks and each of these squares
2472
02:04:07,801 --> 02:04:10,091
represents a byte of memory and this, for instance,
2473
02:04:10,091 --> 02:04:12,781
might represent four bytes altogether-- good enough for an int,
2474
02:04:12,781 --> 02:04:14,111
or something like that.
2475
02:04:14,111 --> 02:04:18,451
So in my original code that I wrote earlier, that is in fact, buggy,
2476
02:04:18,451 --> 02:04:20,851
what is in fact going on inside the swap function?
2477
02:04:20,851 --> 02:04:24,901
We can visualize it like this-- when you run ./swap or any program for that
2478
02:04:24,901 --> 02:04:28,501
matter, main is the first function to get called with a C program,
2479
02:04:28,501 --> 02:04:32,011
and so I'm just going to label this bottom row of memory as main.
2480
02:04:32,011 --> 02:04:36,381
And what were the two variables I had in main called in this code?
2481
02:04:36,381 --> 02:04:37,631
Yeah.
2482
02:04:37,631 --> 02:04:38,201
x and y.
2483
02:04:38,201 --> 02:04:40,401
And each of those was an int, so that's four bytes,
2484
02:04:40,401 --> 02:04:43,121
so it's deliberate that I reserved four--
2485
02:04:43,121 --> 02:04:45,951
a chunk of wood here that's four bytes.
2486
02:04:45,951 --> 02:04:49,901
So let me just call this x, and I'm just going to write the number 1 in this box
2487
02:04:49,901 --> 02:04:50,411
here.
2488
02:04:50,411 --> 02:04:54,431
And then I had my other variable y, and I'm going to put the number 2 there.
2489
02:04:54,431 --> 02:04:58,641
What happens when main calls swap like it does in this code here?
2490
02:04:58,641 --> 02:05:04,931
Well, it has two variables of its own, A and B, and A initially is 1
2491
02:05:04,931 --> 02:05:09,341
and B is initially 2, but it has a third variable, tmp,
2492
02:05:09,341 --> 02:05:12,371
which is a local variable in addition to the arguments A and B
2493
02:05:12,371 --> 02:05:16,931
that are passed in, so I'm going to call this tmp, tmp over here.
2494
02:05:16,931 --> 02:05:18,156
And what is the value of tmp?
2495
02:05:18,156 --> 02:05:19,781
Well, we have to look back at the code.
2496
02:05:19,781 --> 02:05:24,431
tmp initially gets the value of A. All right, the value of A was 1,
2497
02:05:24,431 --> 02:05:26,141
so tmp initially gets 1.
2498
02:05:26,141 --> 02:05:28,601
That's step one in my three line program.
2499
02:05:28,601 --> 02:05:32,621
OK, A equals B. So that is assigned from the right to the left of the B
2500
02:05:32,621 --> 02:05:36,251
into the A. So B is 2, A is this, so let me go ahead
2501
02:05:36,251 --> 02:05:38,361
and erase this and just overwrite that.
2502
02:05:38,361 --> 02:05:41,891
So at this moment in the story you have two copies of two,
2503
02:05:41,891 --> 02:05:44,711
so that's OK though, because the third line of code
2504
02:05:44,711 --> 02:05:47,741
says tmp gets copied into B. So what's tmp--
2505
02:05:47,741 --> 02:05:53,171
1, gets copied into B, so let me overwrite this 2 with a 1,
2506
02:05:53,171 --> 02:05:54,821
and now what happens?
2507
02:05:54,821 --> 02:05:57,941
Now unfortunately, the code ends.
2508
02:05:57,941 --> 02:06:01,511
swap doesn't actually do anything with the result, and the problem in C
2509
02:06:01,511 --> 02:06:03,521
is that I could have had a return value.
2510
02:06:03,521 --> 02:06:05,741
I could go in there and change void to int,
2511
02:06:05,741 --> 02:06:07,511
but which one am I going to return?
2512
02:06:07,511 --> 02:06:09,221
The A or the B?
2513
02:06:09,221 --> 02:06:11,631
The whole goal is to swap two values, and it
2514
02:06:11,631 --> 02:06:13,631
seems kind of lame if you can't write a function
2515
02:06:13,631 --> 02:06:16,661
to do something as common per last week's sorting algorithms
2516
02:06:16,661 --> 02:06:18,191
as swapping two values.
2517
02:06:18,191 --> 02:06:19,541
But what really happens?
2518
02:06:19,541 --> 02:06:22,751
Well, even though when this program starts running,
2519
02:06:22,751 --> 02:06:25,991
main is using this chunk of memory at the bottom in the so-called stack,
2520
02:06:25,991 --> 02:06:28,661
and the stack is just like a cafeteria stack of trays--
2521
02:06:28,661 --> 02:06:30,201
it grows up, like this.
2522
02:06:30,201 --> 02:06:32,291
Here's main's memory on the stack.
2523
02:06:32,291 --> 02:06:34,571
Here's the swap function's memory on the stack.
2524
02:06:34,571 --> 02:06:37,241
It's using three ints instead of two--
2525
02:06:37,241 --> 02:06:38,951
instead of only two.
2526
02:06:38,951 --> 02:06:42,461
What happens when the function returns, whether it's void or not?
2527
02:06:42,461 --> 02:06:45,701
The sort of recollection that this is swap's memory goes away
2528
02:06:45,701 --> 02:06:47,291
and garbage values are left.
2529
02:06:47,291 --> 02:06:51,531
So, adorably, we get rid of these values here,
2530
02:06:51,531 --> 02:06:55,991
and there's still data there-- technically, the numbers 1, 1, and 2
2531
02:06:55,991 --> 02:06:59,591
are still there in the computer's memory but they no longer belong to us
2532
02:06:59,591 --> 02:07:01,341
because the function has now returned.
2533
02:07:01,341 --> 02:07:04,421
So they're still in there and this is kind of an example visually
2534
02:07:04,421 --> 02:07:07,781
of why there's other stuff in memory even though you didn't put it there,
2535
02:07:07,781 --> 02:07:08,621
necessarily.
2536
02:07:08,621 --> 02:07:11,071
Sometimes you did put it there, but now once
2537
02:07:11,071 --> 02:07:14,711
swap returns you only should be touching memory inside of main.
2538
02:07:14,711 --> 02:07:19,001
But we've never actually copied one value into main.
2539
02:07:19,001 --> 02:07:22,661
We haven't returned anything and we haven't solved this fundamentally.
2540
02:07:22,661 --> 02:07:24,291
So how could we do this?
2541
02:07:24,291 --> 02:07:28,301
Well, what if we instead passed into swap not copies of x and y,
2542
02:07:28,301 --> 02:07:32,681
calling them A and B? What if we passed in breadcrumbs to x and y,
2543
02:07:32,681 --> 02:07:35,861
sort of a treasure map that will lead swap to the actual x
2544
02:07:35,861 --> 02:07:37,241
and to the actual y?
2545
02:07:37,241 --> 02:07:41,051
Today we have that capability using pointers.
2546
02:07:41,051 --> 02:07:44,921
So suppose that we use this code instead.
2547
02:07:44,921 --> 02:07:47,831
There's a lot of stars going on here, which is a bit annoying,
2548
02:07:47,831 --> 02:07:50,501
but let's consider what it is we're trying to achieve.
2549
02:07:50,501 --> 02:07:55,391
What if we pass in not x and y, but the address of x and the address of y,
2550
02:07:55,391 --> 02:07:57,501
respectively-- breadcrumbs, if you will--
2551
02:07:57,501 --> 02:08:00,521
that will lead swap to the original values.
2552
02:08:00,521 --> 02:08:04,331
Then what we do is we still give ourselves a tmp variable,
2553
02:08:04,331 --> 02:08:05,351
like an empty glass.
2554
02:08:05,351 --> 02:08:07,691
It's still a glass, so we still call it an int,
2555
02:08:07,691 --> 02:08:10,071
but what do we want to put into that temporary variable?
2556
02:08:10,071 --> 02:08:12,654
We don't want to put A into it, because that's an address now.
2557
02:08:12,654 --> 02:08:15,371
We want to go to that address per the star
2558
02:08:15,371 --> 02:08:17,141
and put whatever's at that address.
2559
02:08:17,141 --> 02:08:18,381
What do we then want to do?
2560
02:08:18,381 --> 02:08:22,121
Well, we want to then copy into whatever's at location A,
2561
02:08:22,121 --> 02:08:24,911
we want to copy over to location A's contents
2562
02:08:24,911 --> 02:08:29,111
whatever is at location B's contents and then lastly, we
2563
02:08:29,111 --> 02:08:32,261
want to copy tmp into whatever's at location B.
2564
02:08:32,261 --> 02:08:36,149
So again, we're very deliberately introducing all of these stars
2565
02:08:36,149 --> 02:08:38,441
because we don't want to change any of these addresses,
2566
02:08:38,441 --> 02:08:41,861
we want to go to these addresses per the dereference operator
2567
02:08:41,861 --> 02:08:46,221
and put values there, or get values from.
2568
02:08:46,221 --> 02:08:47,691
So what does this actually mean?
2569
02:08:47,691 --> 02:08:52,001
Well, if I kind of rewind in this story and I go back here, I still have tmp,
2570
02:08:52,001 --> 02:08:57,671
although I'm going to delete its value to begin with, I still have B
2571
02:08:57,671 --> 02:09:01,121
and I still have A, but what's going to be different
2572
02:09:01,121 --> 02:09:05,051
this time is how I use A and B. So let me finish erasing those.
2573
02:09:05,051 --> 02:09:07,181
That's A on the left, this is B on the right.
2574
02:09:07,181 --> 02:09:09,701
At this point in the story, we're rerunning swap
2575
02:09:09,701 --> 02:09:13,151
with this new and improved version, and let's see what happens.
2576
02:09:13,151 --> 02:09:16,871
Well, x is presumably at some address.
2577
02:09:16,871 --> 02:09:20,351
Maybe it's like 0x123, as always.
2578
02:09:20,351 --> 02:09:23,471
What then does A get when I'm using this code?
2579
02:09:23,471 --> 02:09:27,131
The value of A is 0x123.
2580
02:09:27,131 --> 02:09:28,391
What is the value of B?
2581
02:09:28,391 --> 02:09:31,661
Maybe y is at 0x456.
2582
02:09:31,661 --> 02:09:32,651
What goes in B?
2583
02:09:32,651 --> 02:09:38,281
Well, I'm going to put 0x456, and then what am I going to do?
2584
02:09:38,281 --> 02:09:40,471
Based on these three lines of code, I'm going
2585
02:09:40,471 --> 02:09:44,671
to store in tmp whatever is at the address in A. What is the address in A?
2586
02:09:44,671 --> 02:09:47,701
That's this thing here, so I'm going to put 1 in tmp.
2587
02:09:47,701 --> 02:09:50,251
Line two-- I'm going to go to B--
2588
02:09:50,251 --> 02:09:53,131
all right, B is 456, so I'm going to B and I'm
2589
02:09:53,131 --> 02:09:57,931
going to store 2 at whatever is at location A, and at location A
2590
02:09:57,931 --> 02:10:01,211
is 123, so that's this, so what am I going to do?
2591
02:10:01,211 --> 02:10:03,901
I'm going to change this 1 to a 2.
2592
02:10:03,901 --> 02:10:06,631
Last line of code-- get the value of tmp, which is 1,
2593
02:10:06,631 --> 02:10:11,731
and then put it at whatever the location B is, so B, 456, go there
2594
02:10:11,731 --> 02:10:16,291
and change it to be the value of tmp, which puts 1 here.
2595
02:10:16,291 --> 02:10:17,521
That's it for the code.
2596
02:10:17,521 --> 02:10:19,081
There's still no return value.
2597
02:10:19,081 --> 02:10:22,381
swap returns, which means these three temporary variables
2598
02:10:22,381 --> 02:10:24,091
are garbage values now.
2599
02:10:24,091 --> 02:10:26,471
They can be reused by subsequent function calls
2600
02:10:26,471 --> 02:10:31,091
but now, I've actually swapped the values of x and y.
2601
02:10:31,091 --> 02:10:35,041
Which is to say what came as naturally as the real world here for Marina
2602
02:10:35,041 --> 02:10:38,521
is not quite as simply done in C because again,
2603
02:10:38,521 --> 02:10:40,861
functions are isolated from each other.
2604
02:10:40,861 --> 02:10:44,141
You can pass in values but you get copies of those values.
2605
02:10:44,141 --> 02:10:48,691
If you want one function to affect the value of a variable somewhere else,
2606
02:10:48,691 --> 02:10:52,021
you have to 1, understand what's going on but 2,
2607
02:10:52,021 --> 02:10:54,971
pass things in by pointer here.
2608
02:10:54,971 --> 02:10:58,561
So if I go back to my code here, I need to make a few changes now.
2609
02:10:58,561 --> 02:11:00,661
Let me get rid of these extra printf's.
2610
02:11:00,661 --> 02:11:03,391
Let me go in and add all these stars.
2611
02:11:03,391 --> 02:11:07,411
So I'm dereferencing these actual addresses here and here,
2612
02:11:07,411 --> 02:11:09,821
and I've got to make one more change.
2613
02:11:09,821 --> 02:11:16,381
How do I now call swap if swap is expecting an int star and an int star?
2614
02:11:16,381 --> 02:11:19,441
That is, the address of an int and the address of another int.
2615
02:11:19,441 --> 02:11:21,931
What do I change on line 11 here?
2616
02:11:21,931 --> 02:11:24,231
Yeah.
2617
02:11:24,231 --> 02:11:25,983
Sorry, a little louder.
2618
02:11:25,983 --> 02:11:30,231
AUDIENCE: [INAUDIBLE]
2619
02:11:30,231 --> 02:11:33,051
DAVID J. MALAN: Sorry, the address-of operator.
2620
02:11:33,051 --> 02:11:37,731
So up here on line 11, we do ampersand x and ampersand y.
2621
02:11:37,731 --> 02:11:41,001
So that yes, we're technically passing in a copy of a value,
2622
02:11:41,001 --> 02:11:43,881
but this time the copy we're passing in is technically an address,
2623
02:11:43,881 --> 02:11:47,271
and as soon as we have an address, just like when I held up the fuzzy finger--
2624
02:11:47,271 --> 02:11:50,571
the foamy finger-- I can point at that address, I can go to that address
2625
02:11:50,571 --> 02:11:54,561
and actually get a value from the mailbox or put a value into the mailbox
2626
02:11:54,561 --> 02:11:56,821
if I even want.
2627
02:11:56,821 --> 02:12:01,551
So let's cross our fingers now and do make swap, Enter.
2628
02:12:01,551 --> 02:12:02,721
Oh my God, so many mistakes.
2629
02:12:02,721 --> 02:12:04,881
Oh, I didn't remember to change my prototype,
2630
02:12:04,881 --> 02:12:08,421
so let me go way up here and add two more stars because I
2631
02:12:08,421 --> 02:12:09,801
made that change already.
2632
02:12:09,801 --> 02:12:14,961
Make swap, ./swap, and voilà-- now I have actually swapped.
2633
02:12:14,961 --> 02:12:15,741
Thank you.
2634
02:12:19,291 --> 02:12:19,831
Thank you.
2635
02:12:19,831 --> 02:12:21,661
The two values.
2636
02:12:21,661 --> 02:12:24,491
All right, so what more can we do here?
2637
02:12:24,491 --> 02:12:29,461
Well, let me consider that all this time we've
2638
02:12:29,461 --> 02:12:33,691
been deliberately using GetString and GetInt and GetFloat
2639
02:12:33,691 --> 02:12:35,111
and so forth, but for a reason.
2640
02:12:35,111 --> 02:12:38,069
These aren't just training wheels for the sake of making things easier,
2641
02:12:38,069 --> 02:12:41,071
they're actually in place to make your code safer.
2642
02:12:41,071 --> 02:12:45,511
And to illustrate this, let me go ahead and open up one other file here.
2643
02:12:45,511 --> 02:12:49,861
How about a file called scanf.c.
2644
02:12:49,861 --> 02:12:52,891
It turns out that the old school way-- the way in C,
2645
02:12:52,891 --> 02:12:57,151
really, of getting user input, is via functions like scanf,
2646
02:12:57,151 --> 02:13:00,751
and let me go ahead and include stdio.h, int main(void),
2647
02:13:00,751 --> 02:13:04,441
and without using the CS50 library at all for strings or for any of those
2648
02:13:04,441 --> 02:13:05,611
get functions.
2649
02:13:05,611 --> 02:13:08,161
Let me give myself an int called x.
2650
02:13:08,161 --> 02:13:12,076
Let me just print out what the value of x is, even though it's going to be a--
2651
02:13:12,076 --> 02:13:15,361
or rather, ask the user for the value by asking them for x.
2652
02:13:15,361 --> 02:13:18,781
And I'm going to use a function called scanf that's going to scan
2653
02:13:18,781 --> 02:13:25,351
in an integer using %i, and I'm going to store whatever the human types
2654
02:13:25,351 --> 02:13:27,306
in at this location.
2655
02:13:27,306 --> 02:13:30,181
And then I'm going to go ahead and, just so we can see what happened,
2656
02:13:30,181 --> 02:13:34,231
I'm going to print out with %i whatever the human typed in as follows.
2657
02:13:34,231 --> 02:13:37,321
All right, so line eight is week 1 style code.
2658
02:13:37,321 --> 02:13:40,991
Lines five and six are week 1 style code.
2659
02:13:40,991 --> 02:13:46,411
So the curiosity today is this new line. scanf is another function in stdio.h,
2660
02:13:46,411 --> 02:13:47,971
and notice what I'm doing.
2661
02:13:47,971 --> 02:13:50,671
I'm using the same syntax that I use for printf,
2662
02:13:50,671 --> 02:13:54,091
which is kind of a little clue-- a format code to tell scanf what it is I
2663
02:13:54,091 --> 02:13:57,031
want to scan in, that is, read from the human's keyboard--
2664
02:13:57,031 --> 02:14:00,571
and I'm telling it where to put whatever the human typed in.
2665
02:14:00,571 --> 02:14:04,321
I can't just say x, because we run into the same darn problem as with swap.
2666
02:14:04,321 --> 02:14:06,811
I have to give a little breadcrumb to the variable
2667
02:14:06,811 --> 02:14:10,111
where I want scanf to put the human's integer.
2668
02:14:10,111 --> 02:14:13,541
And so this just tells the computer to get an int.
2669
02:14:13,541 --> 02:14:15,781
This is what you would have had to type, essentially,
2670
02:14:15,781 --> 02:14:18,691
in week 1 just to get an int from the user,
2671
02:14:18,691 --> 02:14:21,541
and there's a whole bunch of things that can go wrong still,
2672
02:14:21,541 --> 02:14:24,931
but that's the cryptic syntax we would have had to show you in week 1.
2673
02:14:24,931 --> 02:14:26,881
Let me go ahead and make scanf here--
2674
02:14:26,881 --> 02:14:29,941
oops-- user error.
2675
02:14:29,941 --> 02:14:31,891
Put the semicolon in the wrong place.
2676
02:14:31,891 --> 02:14:33,781
Make scanf, Enter.
2677
02:14:33,781 --> 02:14:35,281
Oh my God.
2678
02:14:35,281 --> 02:14:36,676
Non-void function doesn't return a value.
2679
02:14:40,371 --> 02:14:42,591
Oh, thank you.
2680
02:14:42,591 --> 02:14:43,221
Strike two.
2681
02:14:43,221 --> 02:14:43,851
OK.
2682
02:14:43,851 --> 02:14:45,141
Make scanf.
2683
02:14:45,141 --> 02:14:45,831
There we go.
2684
02:14:45,831 --> 02:14:46,971
OK, so scanf--
2685
02:14:46,971 --> 02:14:49,951
I'm going to type in a number like 50 and it just prints it back out.
2686
02:14:49,951 --> 02:14:54,181
So that is the traditional way of implementing something like GetInt.
2687
02:14:54,181 --> 02:14:57,651
The problem, though, is when you start to get into strings, things
2688
02:14:57,651 --> 02:14:59,121
get dangerous quickly.
2689
02:14:59,121 --> 02:15:01,289
Let me delete all of this and give myself
2690
02:15:01,289 --> 02:15:03,831
a string s, although wait a minute-- we don't call it strings
2691
02:15:03,831 --> 02:15:06,891
anymore-- char star to store a string.
2692
02:15:06,891 --> 02:15:10,731
Then let me go ahead and just prompt the user for a string, using just printf.
2693
02:15:10,731 --> 02:15:15,531
Then let me go ahead and use scanf, ask them for a string this time with %s,
2694
02:15:15,531 --> 02:15:18,211
and store it at that address.
2695
02:15:18,211 --> 02:15:20,751
Then let me go ahead and print out whatever the human typed
2696
02:15:20,751 --> 02:15:23,641
in just by using the same notation.
2697
02:15:23,641 --> 02:15:28,791
So here, line five is the same thing as string s, but we've taken back
2698
02:15:28,791 --> 02:15:31,191
that layer today so it's char star s.
2699
02:15:31,191 --> 02:15:35,991
This is just week one, this is just week one; line seven is new.
2700
02:15:35,991 --> 02:15:41,811
scanf will also read from the human's keyboard a string and store it at s.
2701
02:15:41,811 --> 02:15:43,641
But that's OK, because s is an address.
2702
02:15:43,641 --> 02:15:46,551
It's correct not to do the ampersand.
2703
02:15:46,551 --> 02:15:47,451
It's not necessary.
2704
02:15:47,451 --> 02:15:52,071
A string is and has always been a char star, a.k.a string.
2705
02:15:52,071 --> 02:15:54,091
The problem, though, arises as follows--
2706
02:15:54,091 --> 02:15:56,411
if I do make scanf--
2707
02:15:56,411 --> 02:15:57,911
oh my God, what did I do wrong--
2708
02:15:57,911 --> 02:16:00,431
I can't-- OK, we have certain defenses in place with make.
2709
02:16:00,431 --> 02:16:06,881
Let me do clang of scanf.c, and output a program called scanf.
2710
02:16:06,881 --> 02:16:09,838
All right, so I'm overriding some of our pedagogical defenses
2711
02:16:09,838 --> 02:16:11,171
that we have in place with make.
2712
02:16:11,171 --> 02:16:15,761
Let me now run scanf of this version, Enter, and let me type in something
2713
02:16:15,761 --> 02:16:20,341
like, how about hi again.
2714
02:16:20,341 --> 02:16:23,161
So it didn't even store something and it weirdly printed out null.
2715
02:16:23,161 --> 02:16:26,821
This time it's in lowercase, but that is somewhat related.
2716
02:16:26,821 --> 02:16:31,561
What did I fundamentally do wrong though, here?
2717
02:16:31,561 --> 02:16:33,691
Why is this getting more and more dangerous?
2718
02:16:33,691 --> 02:16:35,471
And let me illustrate the point even more.
2719
02:16:35,471 --> 02:16:38,741
What if I type in not just something like hello, which also doesn't work.
2720
02:16:38,741 --> 02:16:44,581
What if I do like, hellooooo and make a really long string, Enter--
2721
02:16:44,581 --> 02:16:45,871
that still works.
2722
02:16:45,871 --> 02:16:48,191
Can I do this again?
2723
02:16:48,191 --> 02:16:50,091
Let's try again.
2724
02:16:50,091 --> 02:16:53,271
Right, a really long, unexpectedly long string.
2725
02:16:53,271 --> 02:16:55,131
This is the nondeterminism kicking in.
2726
02:16:55,131 --> 02:16:55,851
Enter.
2727
02:16:55,851 --> 02:16:56,421
All right, damn it.
2728
02:16:56,421 --> 02:16:58,254
I was trying to trigger a segmentation fault
2729
02:16:58,254 --> 02:17:01,491
but it wouldn't, but the point still remains.
2730
02:17:01,491 --> 02:17:06,181
It's still not working, but what's the essence of why this isn't working,
2731
02:17:06,181 --> 02:17:07,851
and it's not storing my actual input?
2732
02:17:07,851 --> 02:17:08,731
Yeah.
2733
02:17:08,731 --> 02:17:10,666
AUDIENCE: Do you have to make a space?
2734
02:17:10,666 --> 02:17:12,541
DAVID J. MALAN: We have to make space for it.
2735
02:17:12,541 --> 02:17:15,781
So what we're missing here is malloc, or something like that.
2736
02:17:15,781 --> 02:17:18,741
So I could do that, I could do something like this.
2737
02:17:18,741 --> 02:17:21,441
Well, let the human type in at least a three letter word
2738
02:17:21,441 --> 02:17:25,581
so I could do malloc of 3 plus 1 for the null character.
2739
02:17:25,581 --> 02:17:29,961
So let me give them four characters, and let me go ahead and do make scanf--
2740
02:17:29,961 --> 02:17:30,921
whoops.
2741
02:17:30,921 --> 02:17:33,081
Nope, sorry. clang, I have to--
2742
02:17:33,081 --> 02:17:33,721
nope.
2743
02:17:33,721 --> 02:17:34,221
Dammit.
2744
02:17:34,221 --> 02:17:40,811
Oh, include stdlib.h-- there we go.
2745
02:17:40,811 --> 02:17:43,836
That gives me malloc, now I'm going to recompile this with clang,
2746
02:17:43,836 --> 02:17:46,961
now I'm going to rerun it, and now I'm going to type in my first thing, hi.
2747
02:17:46,961 --> 02:17:48,341
That now works.
2748
02:17:48,341 --> 02:17:52,061
And let me get a little aggressive now and type in hello, which is too long.
2749
02:17:52,061 --> 02:17:54,101
Still works, but I'm getting lucky.
2750
02:17:54,101 --> 02:17:57,671
Let me try a hellooooooo.
2751
02:17:57,671 --> 02:17:59,995
Damn it, that still works, too.
2752
02:17:59,995 --> 02:18:01,091
Sort of.
2753
02:18:01,091 --> 02:18:03,290
But it actually-- not quite.
2754
02:18:03,290 --> 02:18:05,411
There's some weirdness going on there already.
2755
02:18:05,411 --> 02:18:07,011
It turns out I can also do this.
2756
02:18:07,011 --> 02:18:10,390
I could actually just say char s[4] and give myself
2757
02:18:10,390 --> 02:18:11,681
an array of four characters.
2758
02:18:11,681 --> 02:18:13,101
Let me try this one more time.
2759
02:18:13,101 --> 02:18:16,661
So let me rerun clang ./scanf.
2760
02:18:16,661 --> 02:18:21,460
Hellooooooo, clearly exceeding the four characters--
2761
02:18:21,460 --> 02:18:22,091
there we go.
2762
02:18:22,091 --> 02:18:23,080
Thank you, all right.
2763
02:18:26,821 --> 02:18:29,342
So the point here, though, is if we hadn't given you GetInt,
2764
02:18:29,342 --> 02:18:31,800
you would have had to use the scanf thing-- not a huge deal
2765
02:18:31,800 --> 02:18:33,071
because it seemed to work.
2766
02:18:33,071 --> 02:18:36,321
But if we hadn't given you GetString you would have had to do stuff like this,
2767
02:18:36,321 --> 02:18:39,481
knowing about malloc already or knowing about strings being arrays,
2768
02:18:39,481 --> 02:18:41,550
and even now there's a danger.
2769
02:18:41,550 --> 02:18:45,751
If the human types in five letters, six letters, 100 letters-- this code,
2770
02:18:45,751 --> 02:18:49,501
like with the Hello input, will probably just crash, which is bad.
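One common way to limit that danger, which the lecture does not show, is to give the %s format code a maximum width so that scanf stops before it overruns the buffer. The sketch below assumes a four-byte buffer like the one above; the helper name read_short_word is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): reads one whitespace-delimited
// word from `in` into a 4-byte buffer.
// Returns 1 if a word was read, 0 otherwise.
int read_short_word(FILE *in, char buffer[4])
{
    // The width in %3s caps the stored characters at 3, which leaves
    // the fourth byte free for the terminating '\0'.
    return fscanf(in, "%3s", buffer) == 1;
}
```

With this version, typing hellooooooo stores only the first three letters instead of writing past the end of the buffer. The rest of the input stays unread, so this trades a possible crash for truncation.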
2771
02:18:49,501 --> 02:18:51,481
So GetString also has this functionality built
2772
02:18:51,481 --> 02:18:53,790
in where we have a fancy loop inside such
2773
02:18:53,790 --> 02:18:58,321
that we allocate using malloc as many bytes as you physically type in,
2774
02:18:58,321 --> 02:19:00,271
and we use malloc essentially every keystroke.
2775
02:19:00,271 --> 02:19:05,101
The moment you type in h-e-l-l-o, we're laying the tracks as we go and we keep
2776
02:19:05,101 --> 02:19:09,571
allocating more and more memory so that we theoretically will never crash with
2777
02:19:09,571 --> 02:19:12,300
GetString even though it's this easy to crack--
2778
02:19:12,300 --> 02:19:15,451
this easy to crash your code using scanf if you again
2779
02:19:15,451 --> 02:19:18,121
did it without the help of a library.
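The idea of "laying the tracks as we go" can be sketched as below. This is only an illustration of the technique and not the CS50 library's actual code: the name read_line is hypothetical, and it grows the buffer with realloc one character at a time, which is simple but not the most efficient strategy.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not the CS50 library's implementation): reads one
// line of any length from `in`, growing the buffer as characters arrive.
// Returns a heap-allocated string the caller must free, or NULL on
// allocation failure or if the input ends before anything is read.
char *read_line(FILE *in)
{
    char *buffer = NULL;
    size_t length = 0;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Make room for this character plus the eventual '\0'.
        char *bigger = realloc(buffer, length + 2);
        if (bigger == NULL)
        {
            free(buffer);
            return NULL;
        }
        buffer = bigger;
        buffer[length++] = (char) c;
    }

    if (buffer == NULL)
    {
        if (c == EOF)
        {
            return NULL; // Input ended with nothing to read.
        }
        buffer = malloc(1); // Empty line: just the terminator.
        if (buffer == NULL)
        {
            return NULL;
        }
    }

    buffer[length] = '\0';
    return buffer;
}
```

A caller might write `char *s = read_line(stdin);`, print it with %s, and then call `free(s)`. Because the buffer grows with the input, a long string like hellooooooo no longer overruns a fixed four-byte allocation.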
2780
02:19:18,121 --> 02:19:20,178
So where are we all going with this?
2781
02:19:20,178 --> 02:19:22,261
Well, let me show you a few final examples that'll
2782
02:19:22,261 --> 02:19:24,601
pave the way for what will be problem set four.
2783
02:19:24,601 --> 02:19:27,761
Let me go ahead and open up from today's code--
2784
02:19:27,761 --> 02:19:29,880
which is available on the course's website--
2785
02:19:29,880 --> 02:19:36,841
for instance, a program like this, called phonebook.c,
2786
02:19:36,841 --> 02:19:39,540
and I'm just going to give you a quick tour of it,
2787
02:19:39,540 --> 02:19:42,502
that you'll see more details on in the context of p-set four itself.
2788
02:19:42,502 --> 02:19:45,210
We're going to introduce a few new functions you're going to see.
2789
02:19:45,210 --> 02:19:48,451
You're going to see a function called fopen, which stands for file open,
2790
02:19:48,451 --> 02:19:51,842
and it takes two arguments-- the name of a file to open like a CSV
2791
02:19:51,842 --> 02:19:55,050
that you might manipulate in Excel or Google Spreadsheets or the like-- comma
2792
02:19:55,050 --> 02:19:59,851
separated values, and then something like A for append, R for read,
2793
02:19:59,851 --> 02:20:02,790
W for write, depending on whether you want to add to the file,
2794
02:20:02,790 --> 02:20:05,321
just open it up, or change it.
2795
02:20:05,321 --> 02:20:07,831
We're going to introduce you to a file pointer.
2796
02:20:07,831 --> 02:20:09,671
You'll see that capital FILE--
2797
02:20:09,671 --> 02:20:12,271
which is a little bit unconventional-- capital FILE is
2798
02:20:12,271 --> 02:20:15,121
a pointer to an actual file on the computer's hard drive
2799
02:20:15,121 --> 02:20:17,640
so that you can actually access something like a CSV file,
2800
02:20:17,640 --> 02:20:18,991
or heck, even images.
2801
02:20:18,991 --> 02:20:21,300
And we're going to see down below that you're also
2802
02:20:21,300 --> 02:20:25,050
going to have the ability to write files as well, or print to files.
2803
02:20:25,050 --> 02:20:28,981
You'll see functions like fprintf, for file printf.
2804
02:20:28,981 --> 02:20:34,111
Or fwrite-- file write-- which now that you will begin to understand pointers,
2805
02:20:34,111 --> 02:20:37,951
you'll have the ability to actually not only read files--
2806
02:20:37,951 --> 02:20:41,470
text files, images, other things-- but also write them out.
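As a rough illustration of fopen and fprintf working together, the sketch below appends one row to a CSV file. It is not the course's phonebook.c: the function name append_entry, the two-column layout, and the file name in the usage note are all assumptions.

```c
#include <stdio.h>

// Hypothetical helper (not the course's phonebook.c): appends one
// "name,number" row to the CSV file at `path`.
// Returns 1 on success, 0 if the file could not be opened or closed.
int append_entry(const char *path, const char *name, const char *number)
{
    // "a" opens the file for appending, creating it if it does not exist.
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 0;
    }

    // fprintf works like printf, but writes to the given file.
    fprintf(file, "%s,%s\n", name, number);

    return fclose(file) == 0;
}
```

Calling `append_entry("phonebook.csv", "Alice", "555-0100")` twice with different values leaves two rows in the file, since append mode adds to the end rather than overwriting.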
2807
02:20:41,470 --> 02:20:46,921
In fact for instance, just as a teaser here, JPEGs will be one of the things
2808
02:20:46,921 --> 02:20:49,321
we focus on this week where we give you a forensic image
2809
02:20:49,321 --> 02:20:51,991
and your goal is to recover as many photographs
2810
02:20:51,991 --> 02:20:55,651
from this forensic image of a digital camera as you possibly can.
2811
02:20:55,651 --> 02:20:59,071
And the way you're going to do that is by knowing in advance
2812
02:20:59,071 --> 02:21:03,571
that every JPEG in the world starts with these three bytes, written
2813
02:21:03,571 --> 02:21:05,800
in hexadecimal, but these three numbers.
2814
02:21:05,800 --> 02:21:08,521
And so in fact, just as a teaser, let me open up
2815
02:21:08,521 --> 02:21:11,701
an example you'll see on the course's website for today.
2816
02:21:11,701 --> 02:21:14,436
If I scroll through here, you'll see a program
2817
02:21:14,436 --> 02:21:16,061
that does a little something like this.
2818
02:21:16,061 --> 02:21:18,211
And again, more on this--
2819
02:21:18,211 --> 02:21:20,401
if we could hit the button--
2820
02:21:20,401 --> 02:21:21,041
there we go.
2821
02:21:21,041 --> 02:21:26,221
So here we have the notion of a byte we're going to create for ourselves.
2822
02:21:26,221 --> 02:21:29,101
We'll see a data type called byte, which is a common convention.
2823
02:21:29,101 --> 02:21:30,341
This gives me three bytes.
2824
02:21:30,341 --> 02:21:32,674
And you're going to learn about a function called fread,
2825
02:21:32,674 --> 02:21:36,571
which reads from a file some number of bytes-- for instance, three bytes.
2826
02:21:36,571 --> 02:21:38,341
We might then use code like this.
2827
02:21:38,341 --> 02:21:42,001
If bytes bracket 0 equals equals 0xFF and bytes
2828
02:21:42,001 --> 02:21:47,761
bracket 1 equals equals 0xD8 and bytes bracket 2 equals equals 0xFF, all three of those
2829
02:21:47,761 --> 02:21:52,481
bytes I just claimed represent a JPEG, you'll see an output like this.
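The three-byte check just described might look roughly like this. It is a sketch rather than the course's jpeg.c: the function name looks_like_jpeg is hypothetical, and BYTE is defined here as an 8-bit unsigned type, which is an assumption about how the course defines it.

```c
#include <stdint.h>
#include <stdio.h>

// An 8-bit unsigned value; assumed here to match the "byte" type mentioned.
typedef uint8_t BYTE;

// Hypothetical helper (not the course's jpeg.c): reads the first three
// bytes from `file` and reports whether they match the JPEG signature.
// Returns 1 if they match, 0 otherwise (including files that are too short).
int looks_like_jpeg(FILE *file)
{
    BYTE bytes[3];

    // fread returns how many items it actually read.
    if (fread(bytes, sizeof(BYTE), 3, file) != 3)
    {
        return 0;
    }

    return bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff;
}
```

The file should be opened with `fopen(name, "rb")` so the bytes are read unmodified. A match only suggests a JPEG, which is why the program's answer is "possibly".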
2830
02:21:52,481 --> 02:21:55,811
Let me go ahead and run this program as follows.
2831
02:21:55,811 --> 02:21:59,921
Let me copy jpeg.c into my directory from today's distribution.
2832
02:21:59,921 --> 02:22:08,071
Let me do make jpeg, and let me run jpeg on a file which is available online
2833
02:22:08,071 --> 02:22:11,841
called lecture.jpeg, and I claim yes, it's possibly a JPEG.
2834
02:22:11,841 --> 02:22:12,841
Well, what is that file?
2835
02:22:12,841 --> 02:22:16,481
Let me open it up for us, called lecture.jpeg, and here, for instance,
2836
02:22:16,481 --> 02:22:20,581
is that same photo with which we began class, namely implemented as a JPEG.
2837
02:22:20,581 --> 02:22:22,711
But what we're also going to do this week
2838
02:22:22,711 --> 02:22:27,631
is start to implement our own sort of filters a la Instagram, whereby
2839
02:22:27,631 --> 02:22:30,901
we might take images and actually run them through a program that
2840
02:22:30,901 --> 02:22:32,919
creates different versions thereof.
2841
02:22:32,919 --> 02:22:34,711
For instance, using a different file format
2842
02:22:34,711 --> 02:22:38,501
called BMP, which essentially lays out all of its pixels from left to right,
2843
02:22:38,501 --> 02:22:39,901
top to bottom, in a grid.
2844
02:22:39,901 --> 02:22:41,461
You're going to see a struct--
2845
02:22:41,461 --> 02:22:43,501
a data struct in C that's way more complicated
2846
02:22:43,501 --> 02:22:45,631
than the candidate structure from the past,
2847
02:22:45,631 --> 02:22:47,866
or the person structure from the past, that
2848
02:22:47,866 --> 02:22:50,491
looks like this, which is just a whole bunch more values in it,
2849
02:22:50,491 --> 02:22:52,408
but we'll walk you through these in the p-set.
2850
02:22:52,408 --> 02:22:54,421
And we might take a photograph like this and ask
2851
02:22:54,421 --> 02:22:56,881
you to run a few different filters on it a la Instagram,
2852
02:22:56,881 --> 02:23:00,511
like a black and white filter, or grayscale, a sepia filter
2853
02:23:00,511 --> 02:23:04,531
to give it some old school feel, or a reflection like this to invert it,
2854
02:23:04,531 --> 02:23:07,121
or blur it, even in this way.
2855
02:23:07,121 --> 02:23:10,111
And just to end on a note here, I have a version
2856
02:23:10,111 --> 02:23:13,621
of this code ready to go that doesn't implement all of those filters,
2857
02:23:13,621 --> 02:23:16,351
it just implements one filter initially.
2858
02:23:16,351 --> 02:23:19,051
Let me go ahead and just ready this on my computer here.
2859
02:23:19,051 --> 02:23:21,106
I'm going to go into my own version of filter
2860
02:23:21,106 --> 02:23:22,981
and you'll see a few files that will give you
2861
02:23:22,981 --> 02:23:26,621
a tour of this coming week. In bitmap.h, for instance,
2862
02:23:26,621 --> 02:23:31,511
is a version of this structure that I claimed existed a moment ago.
2863
02:23:31,511 --> 02:23:39,361
And let me show you this file here, helpers.c, in which there is a function
2864
02:23:39,361 --> 02:23:43,051
called filter that I've already implemented in advance today.
2865
02:23:43,051 --> 02:23:46,111
But the ones we give you for the p-set won't already be implemented.
2866
02:23:46,111 --> 02:23:48,486
This function called filter takes the height of an image,
2867
02:23:48,486 --> 02:23:51,581
the width of an image, and a two dimensional array.
2868
02:23:51,581 --> 02:23:54,571
So rows and columns of pixels, and then I
2869
02:23:54,571 --> 02:23:58,411
have a loop like this that iterates over all of the pixels in an image from top
2870
02:23:58,411 --> 02:24:00,041
to bottom, left to right.
2871
02:24:00,041 --> 02:24:02,011
And then notice what I'm going to do here.
2872
02:24:02,011 --> 02:24:05,191
I'm going to change the blue value to be zero in this case,
2873
02:24:05,191 --> 02:24:07,601
and the green value to be zero in this case.
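A loop of that shape could be sketched as follows. The pixel struct and the function name keep_red are hypothetical stand-ins, since the real structure lives in the course's bitmap.h and its field names are not given here.

```c
// Hypothetical pixel type (the course's bitmap.h defines its own):
// one byte each for the blue, green, and red components.
typedef struct
{
    unsigned char blue;
    unsigned char green;
    unsigned char red;
} pixel;

// Hypothetical filter: zeroes the blue and green of every pixel,
// leaving only the red component untouched.
void keep_red(int height, int width, pixel image[height][width])
{
    // Visit every row from top to bottom...
    for (int i = 0; i < height; i++)
    {
        // ...and every pixel in that row from left to right.
        for (int j = 0; j < width; j++)
        {
            image[i][j].blue = 0;
            image[i][j].green = 0;
        }
    }
}
```

Because the array is passed by address, the changes made inside keep_red are visible to the caller without returning anything.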
2874
02:24:07,601 --> 02:24:08,341
But why?
2875
02:24:08,341 --> 02:24:12,091
Well, the image I have here in mind is this one,
2876
02:24:12,091 --> 02:24:14,881
whereby we have this hidden image that simply
2877
02:24:14,881 --> 02:24:18,151
has old school style-- a secret message embedded in it.
2878
02:24:18,151 --> 02:24:21,361
And if you don't happen to have in your dorm one of these secret decoder
2879
02:24:21,361 --> 02:24:23,581
glasses that essentially make everything red--
2880
02:24:23,581 --> 02:24:26,456
getting rid of the green in the world and the blue in the world--
2881
02:24:26,456 --> 02:24:28,831
you can actually-- I'm actually probably the only one who
2882
02:24:28,831 --> 02:24:31,111
can read this right now-- see what message
2883
02:24:31,111 --> 02:24:33,391
is hidden behind all of this red noise.
2884
02:24:33,391 --> 02:24:39,121
But if using my code written here in helpers.c I get rid of all the blue
2885
02:24:39,121 --> 02:24:41,821
in the picture and I get rid of all the green in the picture,
2886
02:24:41,821 --> 02:24:44,431
essentially implementing the idea of this filter--
2887
02:24:44,431 --> 02:24:47,251
this red filter where you only see red--
2888
02:24:47,251 --> 02:24:50,501
well, let's go ahead and compile this program.
2889
02:24:50,501 --> 02:24:55,471
Make filter, run ./filter on this hidden message.bmp.
2890
02:24:55,471 --> 02:24:58,531
I'm going to save it in a new file called message.bmp,
2891
02:24:58,531 --> 02:25:01,471
and with one final flourish we're going to open up
2892
02:25:01,471 --> 02:25:05,371
message.bmp, which is the result of having put on these glasses,
2893
02:25:05,371 --> 02:25:08,521
and hopefully now you too will see what I see.
2894
02:25:17,531 --> 02:25:18,931
All right, that's it for CS50!
2895
02:25:18,931 --> 02:25:19,931
We'll see you next time.
2896
02:25:21,731 --> 02:25:25,681
[MUSIC PLAYING]