0
00:00:00,000 --> 00:00:06,030
>> [MUSIC PLAYING]
1
00:00:06,030 --> 00:00:08,390
>> DOUG LLOYD: Pointers, here we are.
2
00:00:08,390 --> 00:00:11,080
This is probably going to be the most difficult topic
3
00:00:11,080 --> 00:00:12,840
that we talk about in CS50.
4
00:00:12,840 --> 00:00:15,060
And if you've read anything about pointers
5
00:00:15,060 --> 00:00:19,080
before, you might be a little bit intimidated going into this video.
6
00:00:19,080 --> 00:00:21,260
It's true that pointers do allow you the ability
7
00:00:21,260 --> 00:00:23,740
to perhaps screw up pretty badly when you're
8
00:00:23,740 --> 00:00:27,450
working with variables and data, and to cause your program to crash.
9
00:00:27,450 --> 00:00:30,490
But they're actually really useful and they allow us a really great way
10
00:00:30,490 --> 00:00:33,340
to pass data back and forth between functions,
11
00:00:33,340 --> 00:00:35,490
that we're otherwise unable to do.
12
00:00:35,490 --> 00:00:37,750
>> And so what we really want to do here is train
13
00:00:37,750 --> 00:00:41,060
you to have good pointer discipline, so that you can use pointers effectively
14
00:00:41,060 --> 00:00:43,850
to make your programs that much better.
15
00:00:43,850 --> 00:00:48,220
As I said pointers give us a different way to pass data between functions.
16
00:00:48,220 --> 00:00:50,270
Now if you recall from an earlier video, when
17
00:00:50,270 --> 00:00:53,720
we were talking about variable scope, I mentioned
18
00:00:53,720 --> 00:01:00,610
that all the data that we pass between functions in C is passed by value.
19
00:01:00,610 --> 00:01:03,070
And I may not have used that term, but what I meant there
20
00:01:03,070 --> 00:01:07,170
was that we are passing copies of data.
21
00:01:07,170 --> 00:01:12,252
When we pass a variable to a function, we're not actually passing the variable
22
00:01:12,252 --> 00:01:13,210
to the function, right?
23
00:01:13,210 --> 00:01:17,670
We're passing a copy of that data to the function.
24
00:01:17,670 --> 00:01:20,760
The function does what it will and it calculates some value,
25
00:01:20,760 --> 00:01:23,180
and maybe we use that value when it gives it back.
26
00:01:23,180 --> 00:01:26,700
>> There was one exception to this rule of passing by value,
27
00:01:26,700 --> 00:01:31,210
and we'll come back to what that is a little later on in this video.
28
00:01:31,210 --> 00:01:34,880
If we use pointers instead of using variables,
29
00:01:34,880 --> 00:01:38,180
or instead of using the variables themselves or copies of the variables,
30
00:01:38,180 --> 00:01:43,790
we can now pass the variables around between functions in a different way.
31
00:01:43,790 --> 00:01:46,550
This means that if we make a change in one function,
32
00:01:46,550 --> 00:01:49,827
that change will actually take effect in a different function.
33
00:01:49,827 --> 00:01:52,160
Again, this is something that we couldn't do previously,
34
00:01:52,160 --> 00:01:56,979
and if you've ever tried to swap the value of two variables in a function,
35
00:01:56,979 --> 00:01:59,270
you've noticed this problem sort of creeping up, right?
36
00:01:59,270 --> 00:02:04,340
>> If we want to swap X and Y, and we pass them to a function called swap,
37
00:02:04,340 --> 00:02:08,680
inside of the function swap the variables do exchange values.
38
00:02:08,680 --> 00:02:12,600
One becomes two, two becomes one, but we don't actually
39
00:02:12,600 --> 00:02:16,890
change anything in the original function, in the caller.
40
00:02:16,890 --> 00:02:19,550
Because we can't, we're only working with copies of them.
41
00:02:19,550 --> 00:02:24,760
With pointers though, we can actually pass X and Y to a function.
42
00:02:24,760 --> 00:02:26,960
That function can do something with them.
43
00:02:26,960 --> 00:02:29,250
And those variables' values can actually change.
44
00:02:29,250 --> 00:02:33,710
So that's quite a change in our ability to work with data.
45
00:02:33,710 --> 00:02:36,100
>> Before we dive into pointers, I think it's worth
46
00:02:36,100 --> 00:02:38,580
taking a few minutes to go back to basics here.
47
00:02:38,580 --> 00:02:41,000
And have a look at how computer memory works
48
00:02:41,000 --> 00:02:45,340
because these two subjects are going to actually be pretty interrelated.
49
00:02:45,340 --> 00:02:48,480
As you probably know, on your computer system
50
00:02:48,480 --> 00:02:51,310
you have a hard drive or perhaps a solid state drive,
51
00:02:51,310 --> 00:02:54,430
some sort of file storage location.
52
00:02:54,430 --> 00:02:57,950
It's usually somewhere in the neighborhood of 250 gigabytes
53
00:02:57,950 --> 00:02:59,810
to maybe a couple of terabytes now.
54
00:02:59,810 --> 00:03:02,270
And it's where all of your files ultimately live,
55
00:03:02,270 --> 00:03:04,870
even when your computer is shut off, you can turn it back on
56
00:03:04,870 --> 00:03:09,190
and you'll find your files are there again when you reboot your system.
57
00:03:09,190 --> 00:03:14,820
But disk drives, like a hard disk drive, an HDD, or a solid state drive, an SSD,
58
00:03:14,820 --> 00:03:16,050
are just storage space.
59
00:03:16,050 --> 00:03:20,400
We can't actually do anything with the data that is on a hard disk,
60
00:03:20,400 --> 00:03:22,080
or in a solid state drive.
61
00:03:22,080 --> 00:03:24,950
In order to actually change data or move it around,
62
00:03:24,950 --> 00:03:28,800
we have to move it to RAM, random access memory.
63
00:03:28,800 --> 00:03:31,170
Now RAM, you have a lot less of in your computer.
64
00:03:31,170 --> 00:03:34,185
You may have somewhere in the neighborhood of 512 megabytes
65
00:03:34,185 --> 00:03:38,850
if you have an older computer, to maybe two, four, eight, 16,
66
00:03:38,850 --> 00:03:41,820
possibly even a little more, gigabytes of RAM.
67
00:03:41,820 --> 00:03:46,390
So that's much smaller, but that's where all of the volatile data exists.
68
00:03:46,390 --> 00:03:48,270
That's where we can change things.
69
00:03:48,270 --> 00:03:53,350
But when we turn our computer off, all of the data in RAM is destroyed.
70
00:03:53,350 --> 00:03:57,150
So that's why we need to have a hard disk for the more permanent location of it,
71
00:03:57,150 --> 00:03:59,720
so that it exists- it would be really bad if every time we
72
00:03:59,720 --> 00:04:03,310
turned our computer off, every file in our system was obliterated.
73
00:04:03,310 --> 00:04:05,600
So we work inside of RAM.
74
00:04:05,600 --> 00:04:09,210
And every time we're talking about memory, pretty much, in CS50,
75
00:04:09,210 --> 00:04:15,080
we're talking about RAM, not hard disk.
76
00:04:15,080 --> 00:04:18,657
>> So when we move things into memory, it takes up a certain amount of space.
77
00:04:18,657 --> 00:04:20,740
All of the data types that we've been working with
78
00:04:20,740 --> 00:04:23,480
take up different amounts of space in RAM.
79
00:04:23,480 --> 00:04:27,600
So every time you create an integer variable, four bytes of memory
80
00:04:27,600 --> 00:04:30,750
are set aside in RAM so you can work with that integer.
81
00:04:30,750 --> 00:04:34,260
You can declare the integer, change it, assign it
82
00:04:34,260 --> 00:04:36,700
to a value like 10, increment it by one, and so on.
83
00:04:36,700 --> 00:04:39,440
All that needs to happen in RAM, and you get four bytes
84
00:04:39,440 --> 00:04:42,550
to work with for every integer that you create.
85
00:04:42,550 --> 00:04:45,410
>> Every character you create gets one byte.
86
00:04:45,410 --> 00:04:48,160
That's just how much space is needed to store a character.
87
00:04:48,160 --> 00:04:51,310
Every float, a real number, gets four bytes
88
00:04:51,310 --> 00:04:53,390
unless it's a double precision floating point
89
00:04:53,390 --> 00:04:56,510
number, which allows you to have more precise or more digits
90
00:04:56,510 --> 00:04:59,300
after the decimal point without losing precision,
91
00:04:59,300 --> 00:05:01,820
which takes up eight bytes of memory.
92
00:05:01,820 --> 00:05:06,730
Long longs, really big integers, also take up eight bytes of memory.
93
00:05:06,730 --> 00:05:09,000
How many bytes of memory do strings take up?
94
00:05:09,000 --> 00:05:12,990
Well let's put a pin in that question for now, but we'll come back to it.
95
00:05:12,990 --> 00:05:17,350
>> So back to this idea of memory as a big array of byte-sized cells.
96
00:05:17,350 --> 00:05:20,871
That's really all it is, it's just a huge array of cells,
97
00:05:20,871 --> 00:05:23,370
just like any other array that you're familiar with in C,
98
00:05:23,370 --> 00:05:26,430
except every element is one byte wide.
99
00:05:26,430 --> 00:05:30,030
And just like an array, every element has an address.
100
00:05:30,030 --> 00:05:32,120
Every element of an array has an index, and we
101
00:05:32,120 --> 00:05:36,302
can use that index to do so-called random access on the array.
102
00:05:36,302 --> 00:05:38,510
We don't have to start at the beginning of the array,
103
00:05:38,510 --> 00:05:40,569
iterate through every single element thereof,
104
00:05:40,569 --> 00:05:41,860
to find what we're looking for.
105
00:05:41,860 --> 00:05:45,790
We can just say, I want to get to the 15th element or the 100th element.
106
00:05:45,790 --> 00:05:49,930
And you can just pass in that number and get the value you're looking for.
107
00:05:49,930 --> 00:05:54,460
>> Similarly every location in memory has an address.
108
00:05:54,460 --> 00:05:57,320
So your memory might look something like this.
109
00:05:57,320 --> 00:06:01,420
Here's a very small chunk of memory, this is 20 bytes of memory.
110
00:06:01,420 --> 00:06:04,060
The first 20 bytes because my addresses there at the bottom
111
00:06:04,060 --> 00:06:08,890
are 0, 1, 2, 3, and so on all the way up to 19.
112
00:06:08,890 --> 00:06:13,190
And when I declare variables and when I start to work with them,
113
00:06:13,190 --> 00:06:15,470
the system is going to set aside some space for me
114
00:06:15,470 --> 00:06:17,595
in this memory to work with my variables.
115
00:06:17,595 --> 00:06:21,610
So I might say, char c equals capital H. And what's going to happen?
116
00:06:21,610 --> 00:06:23,880
Well the system is going to set aside for me one byte.
117
00:06:23,880 --> 00:06:27,870
In this case it chose byte number four, the byte at address four,
118
00:06:27,870 --> 00:06:31,310
and it's going to store the letter capital H in there for me.
119
00:06:31,310 --> 00:06:34,350
If I then say int speed limit equals 65, it's
120
00:06:34,350 --> 00:06:36,806
going to set aside four bytes of memory for me.
121
00:06:36,806 --> 00:06:39,180
And it's going to treat those four bytes as a single unit
122
00:06:39,180 --> 00:06:41,305
because what we're working with is an integer here.
123
00:06:41,305 --> 00:06:44,350
And it's going to store 65 in there.
124
00:06:44,350 --> 00:06:47,000
>> Now already I'm kind of telling you a bit of a lie,
125
00:06:47,000 --> 00:06:50,150
right, because we know that computers work in binary.
126
00:06:50,150 --> 00:06:53,100
They don't understand necessarily what a capital H is
127
00:06:53,100 --> 00:06:57,110
or what a 65 is, they only understand binary, zeros and ones.
128
00:06:57,110 --> 00:06:59,000
And so actually what we're storing in there
129
00:06:59,000 --> 00:07:03,450
is not the letter H and the number 65, but rather the binary representations
130
00:07:03,450 --> 00:07:06,980
thereof, which look a little something like this.
131
00:07:06,980 --> 00:07:10,360
And in particular in the context of the integer variable,
132
00:07:10,360 --> 00:07:13,559
it's not going to just spit it into, it's not going to treat it as one four
133
00:07:13,559 --> 00:07:15,350
byte chunk necessarily, it's actually going
134
00:07:15,350 --> 00:07:19,570
to treat it as four one byte chunks, which might look something like this.
135
00:07:19,570 --> 00:07:22,424
And even this isn't entirely true either,
136
00:07:22,424 --> 00:07:24,840
because of something called endianness, which we're not
137
00:07:24,840 --> 00:07:26,965
going to get into now, but if you're curious about it,
138
00:07:26,965 --> 00:07:29,030
you can read up on little and big endianness.
139
00:07:29,030 --> 00:07:31,640
But for the sake of this argument, for the sake of this video,
140
00:07:31,640 --> 00:07:34,860
let's just assume that is, in fact, how the number 65 would
141
00:07:34,860 --> 00:07:36,970
be represented in memory on every system,
142
00:07:36,970 --> 00:07:38,850
although it's not entirely true.
143
00:07:38,850 --> 00:07:41,700
>> But let's actually just get rid of all binary entirely,
144
00:07:41,700 --> 00:07:44,460
and just think about as H and 65, it's a lot easier
145
00:07:44,460 --> 00:07:47,900
to think about it like that as a human being.
146
00:07:47,900 --> 00:07:51,420
All right, so it also seems maybe a little random that I've- my system
147
00:07:51,420 --> 00:07:55,130
didn't give me bytes 5, 6, 7, and 8 to store the integer.
148
00:07:55,130 --> 00:07:58,580
There's a reason for that, too, which we won't get into right now, but suffice
149
00:07:58,580 --> 00:08:00,496
it to say that what the computer is doing here
150
00:08:00,496 --> 00:08:02,810
is probably a good move on its part.
151
00:08:02,810 --> 00:08:06,020
To not give me memory that's necessarily back to back.
152
00:08:06,020 --> 00:08:10,490
Although it's going to do it now if I want to get another string,
153
00:08:10,490 --> 00:08:13,080
called surname, and I want to put Lloyd in there.
154
00:08:13,080 --> 00:08:18,360
I'm going to need to fit one character, each letter of that's
155
00:08:18,360 --> 00:08:21,330
going to require one character, one byte of memory.
156
00:08:21,330 --> 00:08:26,230
So if I could put Lloyd into my array like this I'm pretty good to go, right?
157
00:08:26,230 --> 00:08:28,870
What's missing?
158
00:08:28,870 --> 00:08:31,840
>> Remember that every string we work with in C ends with backslash zero,
159
00:08:31,840 --> 00:08:33,339
and we can't omit that here, either.
160
00:08:33,339 --> 00:08:36,090
We need to set aside one byte of memory to hold that so we
161
00:08:36,090 --> 00:08:39,130
know when our string has ended.
162
00:08:39,130 --> 00:08:41,049
So again this arrangement of the way things
163
00:08:41,049 --> 00:08:42,799
appear in memory might be a little random,
164
00:08:42,799 --> 00:08:44,870
but it actually is how most systems are designed.
165
00:08:44,870 --> 00:08:48,330
To line them up on multiples of four, for reasons again
166
00:08:48,330 --> 00:08:50,080
that we don't need to get into right now.
167
00:08:50,080 --> 00:08:53,060
But suffice it to say that after these three lines of code,
168
00:08:53,060 --> 00:08:54,810
this is what memory might look like.
169
00:08:54,810 --> 00:08:58,930
If I need memory locations 4, 8, and 12 to hold my data,
170
00:08:58,930 --> 00:09:01,100
this is what my memory might look like.
171
00:09:01,100 --> 00:09:04,062
And just to be particularly pedantic here, when
172
00:09:04,062 --> 00:09:06,020
we're talking about memory addresses we usually
173
00:09:06,020 --> 00:09:08,390
do so using hexadecimal notation.
174
00:09:08,390 --> 00:09:12,030
So why don't we convert all of these from decimal to hexadecimal notation
175
00:09:12,030 --> 00:09:15,010
just because that's generally how we refer to memory.
176
00:09:15,010 --> 00:09:17,880
So instead of being 0 through 19, what we have is
177
00:09:17,880 --> 00:09:20,340
0x0 through 0x13.
178
00:09:20,340 --> 00:09:23,790
Those are the 20 bytes of memory that we have or we're looking at in this image
179
00:09:23,790 --> 00:09:25,540
right here.
180
00:09:25,540 --> 00:09:29,310
>> So all of that being said, let's step away from memory for a second
181
00:09:29,310 --> 00:09:30,490
and back to pointers.
182
00:09:30,490 --> 00:09:32,420
Here is the most important thing to remember
183
00:09:32,420 --> 00:09:34,070
as we start working with pointers.
184
00:09:34,070 --> 00:09:36,314
A pointer is nothing more than an address.
185
00:09:36,314 --> 00:09:38,230
I'll say it again because it's that important,
186
00:09:38,230 --> 00:09:42,730
a pointer is nothing more than an address.
187
00:09:42,730 --> 00:09:47,760
Pointers are addresses to locations in memory where variables live.
188
00:09:47,760 --> 00:09:52,590
Knowing that it becomes hopefully a little bit easier to work with them.
189
00:09:52,590 --> 00:09:54,550
Another thing I like to do is to have sort
190
00:09:54,550 --> 00:09:58,510
of diagrams visually representing what's happening with various lines of code.
191
00:09:58,510 --> 00:10:00,660
And we'll do this a couple of times in pointers,
192
00:10:00,660 --> 00:10:03,354
and when we talk about dynamic memory allocation as well.
193
00:10:03,354 --> 00:10:06,020
Because I think that these diagrams can be particularly helpful.
194
00:10:06,020 --> 00:10:09,540
>> So if I say for example, int k in my code, what is happening?
195
00:10:09,540 --> 00:10:12,524
Well what's basically happening is I'm getting memory set aside for me,
196
00:10:12,524 --> 00:10:14,690
but I don't even like to think about it like that, I
197
00:10:14,690 --> 00:10:16,300
like to think about it like a box.
198
00:10:16,300 --> 00:10:20,090
I have a box and it's colored green because I
199
00:10:20,090 --> 00:10:21,750
can put integers in green boxes.
200
00:10:21,750 --> 00:10:23,666
If it was a character I might have a blue box.
201
00:10:23,666 --> 00:10:27,290
But I always say, if I'm creating a box that can hold integers
202
00:10:27,290 --> 00:10:28,950
that box is colored green.
203
00:10:28,950 --> 00:10:33,020
And I take a permanent marker and I write k on the side of it.
204
00:10:33,020 --> 00:10:37,590
So I have a box called k, into which I can put integers.
205
00:10:37,590 --> 00:10:41,070
So when I say int k, that's what happens in my head.
206
00:10:41,070 --> 00:10:43,140
If I say k equals five, what am I doing?
207
00:10:43,140 --> 00:10:45,110
Well, I'm putting five in the box, right.
208
00:10:45,110 --> 00:10:48,670
This is pretty straightforward, if I say int k, create a box called k.
209
00:10:48,670 --> 00:10:52,040
If I say k equals 5, put five into the box.
210
00:10:52,040 --> 00:10:53,865
Hopefully that's not too much of a leap.
211
00:10:53,865 --> 00:10:55,990
Here's where things get a little interesting though.
212
00:10:55,990 --> 00:11:02,590
If I say int *pk, well even if I don't know what this necessarily means,
213
00:11:02,590 --> 00:11:06,150
it's clearly got something to do with an integer.
214
00:11:06,150 --> 00:11:08,211
So I'm going to color this box green-ish,
215
00:11:08,211 --> 00:11:10,210
I know it's got something to do with an integer,
216
00:11:10,210 --> 00:11:13,400
but it's not an integer itself, because it's an int star.
217
00:11:13,400 --> 00:11:15,390
There's something slightly different about it.
218
00:11:15,390 --> 00:11:17,620
So an integer's involved, but otherwise it's
219
00:11:17,620 --> 00:11:19,830
not too different from what we were talking about.
220
00:11:19,830 --> 00:11:24,240
It's a box, it's got a label, it's wearing a label pk,
221
00:11:24,240 --> 00:11:27,280
and it's capable of holding int stars, whatever those are.
222
00:11:27,280 --> 00:11:29,894
They have something to do with integers, clearly.
223
00:11:29,894 --> 00:11:31,060
Here's the last line though.
224
00:11:31,060 --> 00:11:37,650
If I say pk=&k, whoa, what just happened, right?
225
00:11:37,650 --> 00:11:41,820
So this random number, seemingly random number, gets thrown into the box there.
226
00:11:41,820 --> 00:11:44,930
All that is, is pk gets the address of k.
227
00:11:44,930 --> 00:11:52,867
So I'm sticking where k lives in memory, its address, the address of its bytes.
228
00:11:52,867 --> 00:11:55,200
All I'm doing is I'm saying that value is what I'm going
229
00:11:55,200 --> 00:11:59,430
to put inside of my box called pk.
230
00:11:59,430 --> 00:12:02,080
And because these things are pointers, and because looking
231
00:12:02,080 --> 00:12:04,955
at a string like 0x80c74820
232
00:12:04,955 --> 00:12:07,790
is probably not very meaningful.
233
00:12:07,790 --> 00:12:12,390
When we generally visualize pointers, we actually do so as pointers.
234
00:12:12,390 --> 00:12:17,000
Pk gives us the information we need to find k in memory.
235
00:12:17,000 --> 00:12:19,120
So basically pk has an arrow in it.
236
00:12:19,120 --> 00:12:21,670
And if we walk the length of that arrow, imagine
237
00:12:21,670 --> 00:12:25,280
it's something you can walk on, if we walk along the length of the arrow,
238
00:12:25,280 --> 00:12:29,490
at the very tip of that arrow, we will find the location in memory
239
00:12:29,490 --> 00:12:31,390
where k lives.
240
00:12:31,390 --> 00:12:34,360
And that's really important because once we know where k lives,
241
00:12:34,360 --> 00:12:37,870
we can start to work with the data inside of that memory location.
242
00:12:37,870 --> 00:12:40,780
Though we're getting a teeny bit ahead of ourselves for now.
243
00:12:40,780 --> 00:12:42,240
>> So what is a pointer?
244
00:12:42,240 --> 00:12:45,590
A pointer is a data item whose value is a memory address.
245
00:12:45,590 --> 00:12:49,740
That was that 0x80 stuff going on, that was a memory address.
246
00:12:49,740 --> 00:12:52,060
That was a location in memory.
247
00:12:52,060 --> 00:12:55,080
And the type of a pointer describes the kind
248
00:12:55,080 --> 00:12:56,930
of data you'll find at that memory address.
249
00:12:56,930 --> 00:12:58,810
So there's the int star part right.
250
00:12:58,810 --> 00:13:03,690
If I follow that arrow, it's going to lead me to a location.
251
00:13:03,690 --> 00:13:06,980
And that location, what I will find there in my example,
252
00:13:06,980 --> 00:13:08,240
is a green colored box.
253
00:13:08,240 --> 00:13:12,650
It's an integer, that's what I will find if I go to that address.
254
00:13:12,650 --> 00:13:14,830
The data type of a pointer describes what
255
00:13:14,830 --> 00:13:17,936
you will find at that memory address.
256
00:13:17,936 --> 00:13:19,560
So here's the really cool thing though.
257
00:13:19,560 --> 00:13:25,090
Pointers allow us to pass variables between functions.
258
00:13:25,090 --> 00:13:28,520
And actually pass variables and not pass copies of them.
259
00:13:28,520 --> 00:13:32,879
Because if we know exactly where in memory to find a variable,
260
00:13:32,879 --> 00:13:35,670
we don't need to make a copy of it, we can just go to that location
261
00:13:35,670 --> 00:13:37,844
and work with that variable.
262
00:13:37,844 --> 00:13:40,260
So in essence pointers sort of make a computer environment
263
00:13:40,260 --> 00:13:42,360
a lot more like the real world, right.
264
00:13:42,360 --> 00:13:44,640
>> So here's an analogy.
265
00:13:44,640 --> 00:13:48,080
Let's say that I have a notebook, right, and it's full of notes.
266
00:13:48,080 --> 00:13:50,230
And I would like you to update it.
267
00:13:50,230 --> 00:13:53,960
You are a function that updates notes, right.
268
00:13:53,960 --> 00:13:56,390
In the way we've been working so far, what
269
00:13:56,390 --> 00:14:02,370
happens is you will take my notebook, you'll go to the copy store,
270
00:14:02,370 --> 00:14:06,410
you'll make a Xerox copy of every page of the notebook.
271
00:14:06,410 --> 00:14:09,790
You'll leave my notebook back on my desk when you're done,
272
00:14:09,790 --> 00:14:14,600
you'll go and cross out things in your copy that are out of date or wrong,
273
00:14:14,600 --> 00:14:19,280
and then you'll pass back to me the stack of Xerox pages
274
00:14:19,280 --> 00:14:22,850
that is a replica of my notebook with the changes that you've made to it.
275
00:14:22,850 --> 00:14:27,040
And at that point, it's up to me as the calling function, as the caller,
276
00:14:27,040 --> 00:14:30,582
to decide to take your notes and integrate them back into my notebook.
277
00:14:30,582 --> 00:14:32,540
So there's a lot of steps involved here, right.
278
00:14:32,540 --> 00:14:34,850
Like wouldn't it be better if I just say, hey, can you
279
00:14:34,850 --> 00:14:38,370
update my notebook for me, hand you my notebook,
280
00:14:38,370 --> 00:14:40,440
and you take things and literally cross them out
281
00:14:40,440 --> 00:14:42,810
and update my notes in my notebook.
282
00:14:42,810 --> 00:14:45,140
And then give me my notebook back.
283
00:14:45,140 --> 00:14:47,320
That's kind of what pointers allow us to do,
284
00:14:47,320 --> 00:14:51,320
they make this environment a lot more like how we operate in reality.
285
00:14:51,320 --> 00:14:54,640
>> All right so that's what a pointer is, let's talk
286
00:14:54,640 --> 00:14:58,040
about how pointers work in C, and how we can start to work with them.
287
00:14:58,040 --> 00:15:02,550
So there's a very simple pointer in C called the null pointer.
288
00:15:02,550 --> 00:15:04,830
The null pointer points to nothing.
289
00:15:04,830 --> 00:15:08,310
This probably seems like it's actually not a very useful thing,
290
00:15:08,310 --> 00:15:10,500
but as we'll see a little later on, the fact
291
00:15:10,500 --> 00:15:15,410
that this null pointer exists actually really can come in handy.
292
00:15:15,410 --> 00:15:19,090
And whenever you create a pointer, and you don't set its value immediately-
293
00:15:19,090 --> 00:15:21,060
an example of setting its value immediately
294
00:15:21,060 --> 00:15:25,401
would be a couple of slides back, where I said pk equals &k,
295
00:15:25,401 --> 00:15:28,740
pk gets k's address, as we'll see what that means,
296
00:15:28,740 --> 00:15:32,990
we'll see how to code that shortly- if we don't set its value to something
297
00:15:32,990 --> 00:15:35,380
meaningful immediately, you should always
298
00:15:35,380 --> 00:15:37,480
set your pointer to point to null.
299
00:15:37,480 --> 00:15:40,260
You should set it to point to nothing.
300
00:15:40,260 --> 00:15:43,614
>> That's very different than just leaving the value as it is
301
00:15:43,614 --> 00:15:45,530
and then declaring a pointer and just assuming
302
00:15:45,530 --> 00:15:48,042
it's null because that's rarely true.
303
00:15:48,042 --> 00:15:50,000
So you should always set the value of a pointer
304
00:15:50,000 --> 00:15:55,690
to null if you don't set its value to something meaningful immediately.
305
00:15:55,690 --> 00:15:59,090
You can check whether a pointer's value is null using the equality operator
306
00:15:59,090 --> 00:16:05,450
(==), just like you compare any integer values or character values using (==)
307
00:16:05,450 --> 00:16:06,320
as well.
308
00:16:06,320 --> 00:16:10,994
It's a special sort of constant value that you can use to test.
309
00:16:10,994 --> 00:16:13,160
So that was a very simple pointer, the null pointer.
310
00:16:13,160 --> 00:16:15,320
Another way to create a pointer is to extract
311
00:16:15,320 --> 00:16:18,240
the address of a variable you've already created,
312
00:16:18,240 --> 00:16:22,330
and you do this using the & operator, address extraction.
313
00:16:22,330 --> 00:16:26,720
Which we've already seen previously in the first diagram example I showed.
314
00:16:26,720 --> 00:16:31,450
So if x is a variable that we've already created of type integer,
315
00:16:31,450 --> 00:16:35,110
then &x is a pointer to an integer.
316
00:16:35,110 --> 00:16:39,810
&x is- remember, & is going to extract the address of the thing on the right.
317
00:16:39,810 --> 00:16:45,350
And since a pointer is just an address, then &x is a pointer to an integer
318
00:16:45,350 --> 00:16:48,560
whose value is where in memory x lives.
319
00:16:48,560 --> 00:16:50,460
It's x's address.
320
00:16:50,460 --> 00:16:53,296
So &x is the address of x.
321
00:16:53,296 --> 00:16:55,670
Let's take this one step further and connect to something
322
00:16:55,670 --> 00:16:58,380
I alluded to in a prior video.
323
00:16:58,380 --> 00:17:06,730
If arr is an array of doubles, then &arr[i] is a pointer
324
00:17:06,730 --> 00:17:08,109
to a double.
325
00:17:08,109 --> 00:17:08,970
OK.
326
00:17:08,970 --> 00:17:12,160
arr[i], if arr is an array of doubles,
327
00:17:12,160 --> 00:17:19,069
then arr[i] is the i-th element of that array,
328
00:17:19,069 --> 00:17:29,270
and &arr[i] is where in memory the i-th element of arr exists.
329
00:17:29,270 --> 00:17:31,790
>> So what's the implication here?
330
00:17:31,790 --> 00:17:34,570
An array's name, the implication of this whole thing,
331
00:17:34,570 --> 00:17:39,290
is that an array's name is actually itself a pointer.
332
00:17:39,290 --> 00:17:41,170
You've been working with pointers all along
333
00:17:41,170 --> 00:17:45,290
every time that you've used an array.
334
00:17:45,290 --> 00:17:49,090
Remember from the example on variable scope,
335
00:17:49,090 --> 00:17:53,420
near the end of the video I present an example where we have a function
336
00:17:53,420 --> 00:17:56,890
called set int and a function called set array.
337
00:17:56,890 --> 00:18:00,490
And your challenge was to determine what the
338
00:18:00,490 --> 00:18:03,220
values were that we printed out at the end of the function,
339
00:18:03,220 --> 00:18:05,960
at the end of the main program.
340
00:18:05,960 --> 00:18:08,740
>> If you recall from that example or if you've watched the video,
341
00:18:08,740 --> 00:18:13,080
you know that when you- the call to set int effectively does nothing.
342
00:18:13,080 --> 00:18:16,390
But the call to set array does.
343
00:18:16,390 --> 00:18:19,280
And I sort of glossed over why that was the case at the time.
344
00:18:19,280 --> 00:18:22,363
I just said, well it's an array, it's special, you know, there's a reason.
345
00:18:22,363 --> 00:18:25,020
The reason is that an array's name is really just a pointer,
346
00:18:25,020 --> 00:18:28,740
and there's this special square bracket syntax that
347
00:18:28,740 --> 00:18:30,510
makes things a lot nicer to work with.
348
00:18:30,510 --> 00:18:34,410
And they make the idea of a pointer a lot less intimidating,
349
00:18:34,410 --> 00:18:36,800
and that's why they're sort of presented in that way.
350
00:18:36,800 --> 00:18:38,600
But really arrays are just pointers.
351
00:18:38,600 --> 00:18:41,580
And that's why when we made a change to the array,
352
00:18:41,580 --> 00:18:44,880
when we passed an array as a parameter to a function or as an argument
353
00:18:44,880 --> 00:18:50,110
to a function, the contents of the array actually changed in both the callee
354
00:18:50,110 --> 00:18:51,160
and in the caller.
355
00:18:51,160 --> 00:18:55,846
Which for every other kind of variable we saw was not the case.
356
00:18:55,846 --> 00:18:58,970
So that's just something to keep in mind when you're working with pointers,
357
00:18:58,970 --> 00:19:01,610
is that the name of an array is actually a pointer
358
00:19:01,610 --> 00:19:04,750
to the first element of that array.
359
00:19:04,750 --> 00:19:08,930
>> OK so now we have all these facts, let's keep going, right.
360
00:19:08,930 --> 00:19:11,370
Why do we care about where something lives?
361
00:19:11,370 --> 00:19:14,120
Well like I said, it's pretty useful to know where something lives
362
00:19:14,120 --> 00:19:17,240
so you can go there and change it.
363
00:19:17,240 --> 00:19:19,390
Work with it and actually have the thing that you
364
00:19:19,390 --> 00:19:23,710
want to do to that variable take effect, and not take effect on some copy of it.
365
00:19:23,710 --> 00:19:26,150
This is called dereferencing.
366
00:19:26,150 --> 00:19:28,690
We go to the reference and we change the value there.
367
00:19:28,690 --> 00:19:32,660
So if we have a pointer and it's called pc, and it points to a character,
368
00:19:32,660 --> 00:19:40,610
then we can say *pc and *pc is the name of what we'll find if we go
369
00:19:40,610 --> 00:19:42,910
to the address pc.
370
00:19:42,910 --> 00:19:47,860
What we'll find there is a character and *pc is how we refer to the data at that
371
00:19:47,860 --> 00:19:48,880
location.
372
00:19:48,880 --> 00:19:54,150
So we could say something like *pc = 'D' or something like that,
373
00:19:54,150 --> 00:19:59,280
and that means that whatever was at memory address pc,
374
00:19:59,280 --> 00:20:07,040
whatever character was previously there, is now 'D', if we say *pc = 'D'.
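As a small sketch of that assignment in C: the starting value 'A' and the function name demo_dereference are assumptions for illustration only.

```c
#include <stdio.h>

// Stores 'D' through a pointer and returns the variable it points to.
char demo_dereference(void)
{
    char c = 'A';     // illustrative starting value (an assumption)
    char *pc = &c;    // pc holds the address of c

    *pc = 'D';        // go to that address and store 'D' there

    printf("c is now %c\n", c);
    return c;
}
```

Note the single quotes: 'D' is a character constant, whereas a bare D would be read as a variable name.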
375
00:20:07,040 --> 00:20:10,090
>> So here we go again with some weird C stuff, right.
376
00:20:10,090 --> 00:20:14,560
So we've seen * previously as being somehow part of the data type,
377
00:20:14,560 --> 00:20:17,160
and now it's being used in a slightly different context
378
00:20:17,160 --> 00:20:19,605
to access the data at a location.
379
00:20:19,605 --> 00:20:22,480
I know it's a little confusing and that's actually part of this whole
380
00:20:22,480 --> 00:20:25,740
like, why pointers have this mythology around them as being so complex,
381
00:20:25,740 --> 00:20:28,250
is kind of a syntax problem, honestly.
382
00:20:28,250 --> 00:20:31,810
But * is used in both contexts, both as part of the type name,
383
00:20:31,810 --> 00:20:34,100
and we'll see a little later something else, too.
384
00:20:34,100 --> 00:20:36,490
And right now it's the dereference operator.
385
00:20:36,490 --> 00:20:38,760
So it goes to the reference, it accesses the data
386
00:20:38,760 --> 00:20:43,000
at the location of the pointer, and allows you to manipulate it at will.
387
00:20:43,000 --> 00:20:45,900
>> Now this is very similar to visiting your neighbor, right.
388
00:20:45,900 --> 00:20:48,710
If you know where your neighbor lives, you're
389
00:20:48,710 --> 00:20:50,730
not hanging out with your neighbor.
390
00:20:50,730 --> 00:20:53,510
You know you happen to know where they live,
391
00:20:53,510 --> 00:20:56,870
but that doesn't mean that by virtue of having that knowledge
392
00:20:56,870 --> 00:20:59,170
you are interacting with them.
393
00:20:59,170 --> 00:21:01,920
If you want to interact with them, you have to go to their house,
394
00:21:01,920 --> 00:21:03,760
you have to go to where they live.
395
00:21:03,760 --> 00:21:07,440
And once you do that, then you can interact
396
00:21:07,440 --> 00:21:09,420
with them just like you'd want to.
397
00:21:09,420 --> 00:21:12,730
And similarly with variables, you need to go to their address
398
00:21:12,730 --> 00:21:15,320
if you want to interact with them, you can't just know the address.
399
00:21:15,320 --> 00:21:21,495
And the way you go to the address is to use *, the dereference operator.
400
00:21:21,495 --> 00:21:23,620
What do you think happens if we try and dereference
401
00:21:23,620 --> 00:21:25,260
a pointer whose value is null?
402
00:21:25,260 --> 00:21:28,470
Recall that the null pointer points to nothing.
403
00:21:28,470 --> 00:21:34,110
So if you try and dereference nothing or go to an address nothing,
404
00:21:34,110 --> 00:21:36,800
what do you think happens?
405
00:21:36,800 --> 00:21:39,630
Well if you guessed segmentation fault, you'd be right.
406
00:21:39,630 --> 00:21:41,390
If you try and dereference a null pointer,
407
00:21:41,390 --> 00:21:43,140
you suffer a segmentation fault. But wait,
408
00:21:43,140 --> 00:21:45,820
didn't I tell you, that if you're not going
409
00:21:45,820 --> 00:21:49,220
to set the value of your pointer to something meaningful,
410
00:21:49,220 --> 00:21:51,000
you should set it to null?
411
00:21:51,000 --> 00:21:55,290
I did and actually the segmentation fault is kind of a good behavior.
412
00:21:55,290 --> 00:21:58,680
>> Have you ever declared a variable and not assigned its value immediately?
413
00:21:58,680 --> 00:22:02,680
So you just say int x; you don't actually assign it to anything
414
00:22:02,680 --> 00:22:05,340
and then later on in your code, you print out the value of x,
415
00:22:05,340 --> 00:22:07,650
having still not assigned it to anything.
416
00:22:07,650 --> 00:22:10,370
Frequently you'll get zero, but sometimes you
417
00:22:10,370 --> 00:22:15,000
might get some random number, and you have no idea where it came from.
418
00:22:15,000 --> 00:22:16,750
Similar things can happen with pointers.
419
00:22:16,750 --> 00:22:20,110
When you declare a pointer int*pk for example,
420
00:22:20,110 --> 00:22:23,490
and you don't assign it to a value, you get four bytes of memory.
421
00:22:23,490 --> 00:22:25,950
Whatever four bytes of memory the system can
422
00:22:25,950 --> 00:22:28,970
find that have some meaningful value.
423
00:22:28,970 --> 00:22:31,760
And there might have been something already there that
424
00:22:31,760 --> 00:22:34,190
is no longer needed by another function, so you just have
425
00:22:34,190 --> 00:22:35,900
whatever data was there.
426
00:22:35,900 --> 00:22:40,570
>> What if you tried to dereference some address that you don't- there were
427
00:22:40,570 --> 00:22:43,410
already bytes and information in there, that's now in your pointer.
428
00:22:43,410 --> 00:22:47,470
If you try and dereference that pointer, you might be messing with some memory
429
00:22:47,470 --> 00:22:49,390
that you didn't intend to mess with at all.
430
00:22:49,390 --> 00:22:51,639
And in fact you could do something really devastating,
431
00:22:51,639 --> 00:22:54,880
like break another program, or break another function,
432
00:22:54,880 --> 00:22:58,289
or do something malicious that you didn't intend to do at all.
433
00:22:58,289 --> 00:23:00,080
And so that's why it's actually a good idea
434
00:23:00,080 --> 00:23:04,030
to set your pointers to null if you don't set them to something meaningful.
435
00:23:04,030 --> 00:23:06,760
It's probably better at the end of the day for your program
436
00:23:06,760 --> 00:23:09,840
to crash than for it to do something that screws up
437
00:23:09,840 --> 00:23:12,400
another program or another function.
438
00:23:12,400 --> 00:23:15,207
That behavior is probably even less ideal than just crashing.
439
00:23:15,207 --> 00:23:17,040
And so that's why it's actually a good habit
440
00:23:17,040 --> 00:23:20,920
to get into to set your pointers to null if you don't set them
441
00:23:20,920 --> 00:23:24,540
to a meaningful value immediately, a value that you know
442
00:23:24,540 --> 00:23:27,260
and that you can safely dereference.
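One way to sketch that habit in C is shown here. The helper read_or_default and the values used are hypothetical. The point is only that a pointer set to NULL can be tested before it is dereferenced, which is not possible with a pointer holding leftover garbage.

```c
#include <stddef.h>

// Hypothetical helper: returns the int p points to, or fallback if p is NULL.
int read_or_default(const int *p, int fallback)
{
    if (p == NULL)
    {
        return fallback;   // nothing to dereference
    }
    return *p;             // safe: p holds a real address
}

// Returns 1 if the NULL pointer was caught and the valid one was read.
int demo_null_check(void)
{
    int *pk = NULL;                        // no meaningful address yet
    int before = read_or_default(pk, -1);  // falls back to -1

    int k = 7;
    pk = &k;                               // now pk is safe to dereference
    int after = read_or_default(pk, -1);   // reads 7

    return before == -1 && after == 7;
}
```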
443
00:23:27,260 --> 00:23:32,240
>> So let's come back now and take a look at the overall syntax of the situation.
444
00:23:32,240 --> 00:23:37,400
If I say int *p;, what have I just done?
445
00:23:37,400 --> 00:23:38,530
What I've done is this.
446
00:23:38,530 --> 00:23:43,290
I know the value of p is an address because all pointers are just
447
00:23:43,290 --> 00:23:44,660
addresses.
448
00:23:44,660 --> 00:23:47,750
I can dereference p using the * operator.
449
00:23:47,750 --> 00:23:51,250
In this context here, at the very top recall the * is part of the type.
450
00:23:51,250 --> 00:23:53,510
Int * is the data type.
451
00:23:53,510 --> 00:23:56,150
But I can dereference p using the * operator,
452
00:23:56,150 --> 00:24:01,897
and if I do so, if I go to that address, what will I find at that address?
453
00:24:01,897 --> 00:24:02,855
I will find an integer.
454
00:24:02,855 --> 00:24:05,910
So int*p is basically saying, p is an address.
455
00:24:05,910 --> 00:24:09,500
I can dereference p and if I do, I will find an integer
456
00:24:09,500 --> 00:24:11,920
at that memory location.
457
00:24:11,920 --> 00:24:14,260
>> OK so I said there was another annoying thing with stars
458
00:24:14,260 --> 00:24:17,060
and here's where that annoying thing with stars is.
459
00:24:17,060 --> 00:24:21,640
Have you ever tried to declare multiple variables of the same type
460
00:24:21,640 --> 00:24:24,409
on the same line of code?
461
00:24:24,409 --> 00:24:27,700
So for a second, pretend that the line, the code I actually have there in green
462
00:24:27,700 --> 00:24:29,366
isn't there and it just says int x,y,z;.
463
00:24:29,366 --> 00:24:31,634
464
00:24:31,634 --> 00:24:34,550
What that would do is actually create three integer variables for you,
465
00:24:34,550 --> 00:24:36,930
one called x, one called y, and one called z.
466
00:24:36,930 --> 00:24:41,510
It's a way to do it without having to split onto three lines.
467
00:24:41,510 --> 00:24:43,890
>> Here's where stars get annoying again though,
468
00:24:43,890 --> 00:24:49,200
because the * is actually part of both the type name and part
469
00:24:49,200 --> 00:24:50,320
of the variable name.
470
00:24:50,320 --> 00:24:56,430
And so if I say int *px,py,pz, what I actually get is a pointer to an integer
471
00:24:56,430 --> 00:25:01,650
called px and two integers, py and pz.
472
00:25:01,650 --> 00:25:04,950
And that's probably not what we want, that's not good.
473
00:25:04,950 --> 00:25:09,290
>> So if I want to create multiple pointers on the same line, of the same type,
474
00:25:09,290 --> 00:25:12,140
and stars, what I actually need to do is say int *pa,*pb,*pc.
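Here is a rough sketch of the difference, assuming a C11 compiler. The macro IS_INT_POINTER and both function names are made up for this illustration. _Generic selects a branch from the declared type of its argument, so it can report which variables are really pointers.

```c
#include <stdio.h>

// C11 _Generic picks a branch based on the static type of x.
#define IS_INT_POINTER(x) _Generic((x), int *: 1, default: 0)

// Counts how many of the three variables are pointers when only
// the first name carries a star.
int count_pointers_mixed(void)
{
    int *px = NULL, py = 0, pz = 0;   // only px is an int *
    return IS_INT_POINTER(px) + IS_INT_POINTER(py) + IS_INT_POINTER(pz);
}

// Counts how many are pointers when every name carries a star.
int count_pointers_starred(void)
{
    int *pa = NULL, *pb = NULL, *pc = NULL;   // all three are int *
    return IS_INT_POINTER(pa) + IS_INT_POINTER(pb) + IS_INT_POINTER(pc);
}
```

The first function should return 1 and the second 3.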
475
00:25:12,140 --> 00:25:17,330
476
00:25:17,330 --> 00:25:20,300
Now having just said that and now telling you this,
477
00:25:20,300 --> 00:25:22,170
you probably will never do this.
478
00:25:22,170 --> 00:25:25,170
And it's probably a good thing honestly, because you might inadvertently
479
00:25:25,170 --> 00:25:26,544
omit a star, something like that.
480
00:25:26,544 --> 00:25:29,290
It's probably best to maybe declare pointers on individual lines,
481
00:25:29,290 --> 00:25:31,373
but it's just another one of those annoying syntax
482
00:25:31,373 --> 00:25:35,310
things with stars that make pointers so difficult to work with.
483
00:25:35,310 --> 00:25:39,480
Because it's just this syntactic mess you have to work through.
484
00:25:39,480 --> 00:25:41,600
With practice it does really become second nature.
485
00:25:41,600 --> 00:25:45,410
I still make mistakes with it after programming for 10 years,
486
00:25:45,410 --> 00:25:49,630
so don't be upset if something happens to you, it's pretty common honestly.
487
00:25:49,630 --> 00:25:52,850
It's really kind of a flaw of the syntax.
488
00:25:52,850 --> 00:25:54,900
>> OK so I kind of promised that we would revisit
489
00:25:54,900 --> 00:25:59,370
the concept of how large is a string.
490
00:25:59,370 --> 00:26:02,750
Well if I told you that a string, we've really kind of
491
00:26:02,750 --> 00:26:04,140
been lying to you the whole time.
492
00:26:04,140 --> 00:26:06,181
There's no data type called string, and in fact I
493
00:26:06,181 --> 00:26:09,730
mentioned this in one of our earliest videos on data types,
494
00:26:09,730 --> 00:26:13,820
that string was a data type that was created for you in cs50.h.
495
00:26:13,820 --> 00:26:17,050
You have to #include <cs50.h> in order to use it.
496
00:26:17,050 --> 00:26:19,250
>> Well string is really just an alias for something
497
00:26:19,250 --> 00:26:23,600
called the char *, a pointer to a character.
498
00:26:23,600 --> 00:26:26,010
Well pointers, recall, are just addresses.
499
00:26:26,010 --> 00:26:28,780
So what is the size in bytes of a string?
500
00:26:28,780 --> 00:26:29,796
Well it's four or eight.
501
00:26:29,796 --> 00:26:32,170
And the reason I say four or eight is because it actually
502
00:26:32,170 --> 00:26:36,730
depends on the system. If you're using CS50 IDE, the size of a char
503
00:26:36,730 --> 00:26:39,340
* is eight, it's a 64-bit system.
504
00:26:39,340 --> 00:26:43,850
Every address in memory is 64 bits long.
505
00:26:43,850 --> 00:26:48,270
If you're using CS50 appliance or using any 32-bit machine,
506
00:26:48,270 --> 00:26:51,640
and you've heard that term 32-bit machine, what is a 32-bit machine?
507
00:26:51,640 --> 00:26:56,090
Well it just means that every address in memory is 32 bits long.
508
00:26:56,090 --> 00:26:59,140
And so 32 bits is four bytes.
509
00:26:59,140 --> 00:27:02,710
So a char * is four or eight bytes depending on your system.
510
00:27:02,710 --> 00:27:06,100
And indeed a pointer to any data
511
00:27:06,100 --> 00:27:12,030
type, since all pointers are just addresses, are four or eight bytes.
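A quick way to check this on your own machine is sketched below. The function name pointer_sizes_match is hypothetical. The code assumes a typical desktop system, where every data pointer has the same size. The C standard itself does not strictly promise that.

```c
#include <stdio.h>

// Prints the size of several pointer types and returns 1 if they all match.
int pointer_sizes_match(void)
{
    printf("sizeof(char *)   = %zu\n", sizeof(char *));
    printf("sizeof(int *)    = %zu\n", sizeof(int *));
    printf("sizeof(double *) = %zu\n", sizeof(double *));

    return sizeof(char *) == sizeof(int *)
        && sizeof(int *) == sizeof(double *);
}
```

On a 64-bit system each line would typically show 8, and on a 32-bit system 4.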
512
00:27:12,030 --> 00:27:14,030
So let's revisit this diagram and let's conclude
513
00:27:14,030 --> 00:27:18,130
this video with a little exercise here.
514
00:27:18,130 --> 00:27:21,600
So here's the diagram we left off with at the very beginning of the video.
515
00:27:21,600 --> 00:27:23,110
So what happens now if I say *pk=35?
516
00:27:23,110 --> 00:27:26,370
517
00:27:26,370 --> 00:27:30,530
So what does it mean when I say, *pk=35?
518
00:27:30,530 --> 00:27:32,420
Take a second.
519
00:27:32,420 --> 00:27:34,990
*pk.
520
00:27:34,990 --> 00:27:39,890
In this context, * is the dereference operator.
521
00:27:39,890 --> 00:27:42,110
So when the dereference operator is used,
522
00:27:42,110 --> 00:27:48,520
we go to the address pointed to by pk, and we change what we find.
523
00:27:48,520 --> 00:27:55,270
So *pk=35 effectively does this to the picture.
524
00:27:55,270 --> 00:27:58,110
So it's basically equivalent to having said k=35.
525
00:27:58,110 --> 00:28:00,740
526
00:28:00,740 --> 00:28:01,930
>> One more.
527
00:28:01,930 --> 00:28:05,510
If I say int m, I create a new variable called m.
528
00:28:05,510 --> 00:28:08,260
A new box, it's a green box because it's going to hold an integer,
529
00:28:08,260 --> 00:28:09,840
and it's labeled m.
530
00:28:09,840 --> 00:28:14,960
If I say m=4, I put an integer into that box.
531
00:28:14,960 --> 00:28:20,290
If I say pk=&m, how does this diagram change?
532
00:28:20,290 --> 00:28:28,760
Pk=&m, do you recall what the & operator does or is called?
533
00:28:28,760 --> 00:28:34,430
Remember that & followed by some variable name is the address of that variable.
534
00:28:34,430 --> 00:28:38,740
So what we're saying is pk gets the address of m.
535
00:28:38,740 --> 00:28:42,010
And so effectively what happens to the diagram is that pk no longer points
536
00:28:42,010 --> 00:28:46,420
to k, but points to m.
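The whole exercise can be sketched in C as follows. The function name demo_diagram and its out-parameters are hypothetical, and the starting value of k is an assumption chosen for illustration.

```c
#include <stdio.h>

// Walks through the diagram steps, reports the final k and m through
// the out-parameters, and returns the value pk points to at the end.
int demo_diagram(int *k_out, int *m_out)
{
    int k = 5;        // assumed starting value for k
    int *pk = &k;     // pk points to k

    *pk = 35;         // go to k through pk and store 35 there

    int m = 4;        // a new box labeled m holding 4
    pk = &m;          // pk no longer points to k, it points to m

    printf("k = %d, m = %d, *pk = %d\n", k, m, *pk);

    *k_out = k;
    *m_out = m;
    return *pk;
}
```

After the last step, k still holds 35, m holds 4, and *pk reads 4 because pk now refers to m.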
537
00:28:46,420 --> 00:28:48,470
>> Again pointers are very tricky to work with
538
00:28:48,470 --> 00:28:50,620
and they take a lot of practice, but because
539
00:28:50,620 --> 00:28:54,150
of their ability to allow you to pass data between functions
540
00:28:54,150 --> 00:28:56,945
and actually have those changes take effect,
541
00:28:56,945 --> 00:28:58,820
getting your head around them is really important.
542
00:28:58,820 --> 00:29:02,590
It probably is the most complicated topic we discuss in CS50,
543
00:29:02,590 --> 00:29:05,910
but the value that you get from using pointers
544
00:29:05,910 --> 00:29:09,200
far outweighs the complications that come from learning them.
545
00:29:09,200 --> 00:29:12,690
So I wish you the best of luck learning about pointers.
546
00:29:12,690 --> 00:29:15,760
I'm Doug Lloyd, this is CS50.
547
00:29:15,760 --> 00:29:17,447