All language subtitles for lecture4(1337)-720p-en

[MUSIC PLAYING]

DAVID J. MALAN: Well, this is CS50, and already this is week four. Recall that last week, week three, we began to explore the inside of a computer's memory a bit more. We talked about arrays, which were just chunks of memory back to back to back that really lay things out left to right, top to bottom. This is actually a pretty common paradigm, even if you're new to programming, and certainly new to C. You've seen this approach of just using memory in some way to lay things out, like images, for instance.

So for instance, here is a photo taken of last week's front row, and this is an opportunity to explore exactly what happens if we start to zoom in and zoom in and zoom in. It seems like most any TV show like CSI, or whatever, or any movie that explores forensic information might have the investigators zoom in on an image like this to see the glint in someone's eye, because that glint reveals the license plate number of someone who just drove past. That's a little over the top, but there's an opportunity here to speak to why it is so unrealistic.
For instance, let's zoom in on this puppet's eye, and let's zoom in a little more to see what might be reflected. Let's zoom in a little more, and that's it. There's only a finite amount of information if you have an image represented in this way. We're using pixels (these dots on the screen, as rows and columns), and if you're only using a finite amount of memory, then at the end of the day you can only store a finite amount of information. At least, I don't really see in this grid any glint of a license plate or anything like what you might otherwise see in Hollywood.

So today we'll explore these kinds of representations, how you might use memory in new and interesting ways to represent now very familiar things, but also start to explore some of the limitations of this representation. But consider, after all, that an image doesn't even need to be as high resolution, with as many pixels, as this other image. You can imagine just doing something silly with Post-It notes, like this. If you think of an image as just having rows and columns (the rows otherwise known as scan lines, something we'll explore in the coming week), you could make this fun smiley face by using just two different values, maybe a zero and a one.
42 00:03:22,111 --> 00:03:26,141 Or yellow and purple, or vice versa, just to make something come to life. 43 00:03:26,141 --> 00:03:30,331 Now in practice, recall we talked about storing not just a zero or one, 44 00:03:30,331 --> 00:03:37,414 but maybe an R, a G, and a B value-- like 24 bits, or three bytes in total-- 45 00:03:37,414 --> 00:03:38,581 but we'll come back to that. 46 00:03:38,581 --> 00:03:40,289 That would just be a more involved image. 47 00:03:40,289 --> 00:03:46,111 But for fun, if today you want to tackle something passively in the background, 48 00:03:46,111 --> 00:03:49,531 if you go to this URL here, we've put together an opportunity 49 00:03:49,531 --> 00:03:52,201 to do a bit of pixel art. 50 00:03:52,201 --> 00:03:55,801 If you go to this URL here, that'll redirect you to a Google Spreadsheet. 51 00:03:55,801 --> 00:03:58,141 If you have a laptop with you today that'll 52 00:03:58,141 --> 00:04:01,541 look a little something like this, which we've organized in rows and columns. 53 00:04:01,541 --> 00:04:05,881 So if you'd like to go ahead and use Google Spreadsheet's colorization 54 00:04:05,881 --> 00:04:09,331 feature to color in those individual squares if you'd like, 55 00:04:09,331 --> 00:04:12,751 see if you can't make something a little creative and then email it to Carter 56 00:04:12,751 --> 00:04:16,841 and we'll exhibit some of the best or favorites on the website thereafter. 57 00:04:16,841 --> 00:04:20,064 So let's transition then to something a little more familiar-- images. 58 00:04:20,064 --> 00:04:22,231 And not all of you have used, presumably, Photoshop, 59 00:04:22,231 --> 00:04:25,481 but you're probably generally familiar with Photoshop as a program for editing 60 00:04:25,481 --> 00:04:27,701 and creating images or photos or the like. 
And here is a screenshot of Photoshop's color picker, through which you can change what color you're going to draw with the paint brush, or what color you're going to fill in with the paint bucket. It's representative of any kind of graphical tool. There's a lot of information in here, but there are perhaps some familiar terms now: R, G, and B. In fact, right now this is Photoshop's way of saying you're about to fill in your background or foreground with the color black, and that appears to be represented with an R, a G, and a B value of zero, zero, zero. Or alternatively, using a hash symbol and then six zeros: #000000.

If some of you have already made web pages before and you know a little bit of HTML and CSS, you're probably familiar with this kind of syntax: a hash symbol and then six, or sometimes three, digits thereafter. If we look at a few different colors here, for instance, here might be the representation of white. Now the R, the G, and the B values went way up from 0 to 255, 255, 255. Or alternatively, it looks like Photoshop, and in turn web browsers, could represent that same color white with #FFFFFF. And let's just do a few others.
Here is red, and it turns out that red is a whole lot of red, 255, but no green, no blue, a.k.a. #FF0000. So there's perhaps a pattern emerging here. Here is green: zero, 255, zero, a.k.a. #00FF00. Or lastly, here is blue, which is no red, no green, but apparently a lot of blue, 255 again, a.k.a. #0000FF.

Now some of you, again, might have seen this notation before, these zeros and these F's and all of the numbers and letters in between, but this is another form of notation. In fact, we'll explore it today; it's really just a precondition for talking about some other concepts. But the ideas, ultimately, are really no different. What we're about to see is a different base system: not just binary, not just decimal, but something we're about to call hexadecimal.

But first, recall that with RGB we previously did the following. Any RGB value (red, green, blue) just combines some amount of red, green, and blue. So here we have 72, 73, 33, which in the context of an email or text, of course, said what, a couple of weeks back?
Just HI with an exclamation point. But in the context of a Photoshop-like program, this might instead represent, collectively, this shade of yellow, for instance, when you combine that much red, that much green, and that much blue. So here is the same idea. If you've got a lot of red, no green, no blue, together that's going to give us red. If you've got no red, a lot of green, no blue, that's going to give us, of course, green. If you've got no red, no green, a lot of blue, that, of course, is going to give us blue. So there's a pattern emerging here where apparently 00 is none, as always, and FF is apparently a lot, and it's maybe somehow equated with 255, at least per that Photoshop screenshot. Meanwhile, if we combine one last one, a lot of red, a lot of green, a lot of blue, that's actually going to give us a single white pixel like this.

All right, so think back. In the world of binary you had just two digits, zero and one. It could have been anything else, A or B, X or Y, but the world standardized on the numerals zero and one. In our world's decimal system, of course, you have zero through nine.
As of today, though, we're going to start using hexadecimal sometimes, in the context of images and also files, just because it's a convention and there are some conveniences to it. Now you're going to be able to count up to F in a notation called hexadecimal: from zero through nine, then you keep going to A, B, C, D, E, F. The idea is that each of these, even though it's weirdly a letter of the English alphabet, is still just a single symbol. It's not "1 0" for 10 or "1 1" for 11. All 16 of these values, these digits, so to speak, are still just single symbols, and that's a characteristic of using this other notational system.

So how do we get from 00 and FF to something like 0 and 255, respectively? Well, this hexadecimal system, a.k.a. base 16, just does the math from week zero (and really, grade school) a little bit differently. For instance, if you have a number that's got two digits, or hexadecimal digits as of today, the columns are just a little different. Instead of powers of two or powers of 10, which we saw for binary and decimal respectively, it's powers of 16. So if we just do the math out, that's the ones column, this is the 16s column, and so forth.
Things actually get pretty big pretty quickly in this system. But now let's just consider how we would represent familiar numbers. If you've got two hexadecimal digits, for which these hashes are just placeholders, zero zero is going to mathematically equal the decimal number you and I know, of course, as zero. Why? Same thing as week zero: 16 times zero plus one times zero is the number you and I know as zero.

And we can count up from here. This, in hexadecimal, would be how a computer represents the number we know as one: it would be 01 in this case. This would be two, three, four, five, six, seven, eight, nine. In decimal, we're about to go to 10. But in hexadecimal, to be clear, what comes next? So, apparently A: 0A, 0B, and so on through 0F, which are now 10, 11, 12, 13, 14, 15. So hexadecimal is just an interesting way of using single symbols, zero through F, to count from zero through 15.

And we'll see why it's 15 in a moment, but as soon as we get to F, does anyone want to conjecture how, in hexadecimal (a.k.a. hex), we now count up one position higher? What comes after 0F in hexadecimal?
So, one zero. It's the same kind of thing: once you're at the highest digit possible, F (or in our decimal world, nine), and you add one more, nine wraps around to zero, or in this case, F wraps around to zero. You carry the one and voila, now we're representing the number you and I know as 16. And we could keep going forever, literally. This could be 17, 18, 19, 20 in decimal, but let's just wave our hands at it and count as high as we can, dot, dot, dot. The highest we could count in hexadecimal with two digits, just logically, would be what, in hexadecimal? Something, something.

FF, I heard. So yes, F is the biggest digit possible, so FF is what we have. So how high can you count in hexadecimal if you've got just two of these digits? Well, it's the same math as always: 16 times F, a.k.a. 15, plus one times F, or one times 15. That gives us 240 plus 15 in decimal, the result of which, of course, is 255.
So this hexadecimal system, which you may have seen in the world of web pages (and if you haven't, we'll get to that in this class in a few weeks), or which we just saw in the context of Photoshop, has this shorthand notation of counting as high as 255 but just calling it FF. Now it's marginal, but that's a savings of one digit out of three when counting as high as 255, because in decimal, of course, 255 is three digits, while in hexadecimal you can count as high using just two. And that difference is going to get magnified the bigger our numbers get. Let me stipulate for now that you're going to get more and more savings, in terms of how many symbols you need on the screen, to represent bigger and bigger numbers.

All right, let me pause here just to see if there are any questions thus far on what we've called hexadecimal, which again just gives us zero through nine as well as A through F. Any questions or confusion? And if it feels like we're lingering a bit much on arithmetic, we're not really going to see other notations besides these moving forward. These are the go-to three in a programmer's world, typically, but there are some others. Yeah.

AUDIENCE: Does the hexadecimal symbol take more storage than the decimal system?
218 00:12:11,251 --> 00:12:12,501 DAVID J. MALAN: Good question. 219 00:12:12,501 --> 00:12:16,611 Does hexadecimal require more storage or less storage than the decimal system? 220 00:12:16,611 --> 00:12:20,841 Theoretically no, because this is just a way of representing information 221 00:12:20,841 --> 00:12:23,721 and we'll see in a concrete example in a moment. 222 00:12:23,721 --> 00:12:27,111 But inside of the computer, at the end of the day, you're still storing bits. 223 00:12:27,111 --> 00:12:30,228 And using hexadecimal is not using more or fewer bits, 224 00:12:30,228 --> 00:12:32,061 think of this as how you might write it down 225 00:12:32,061 --> 00:12:34,971 on a piece of paper, just how many digits you're going to write 226 00:12:34,971 --> 00:12:37,941 or on a computer screen, how many digits you're going to see at once, 227 00:12:37,941 --> 00:12:41,211 but it doesn't change how the computer is representing information 228 00:12:41,211 --> 00:12:44,331 because all they're representing at the end of the day is zeros and ones. 229 00:12:44,331 --> 00:12:45,621 So in fact, let's go there. 230 00:12:45,621 --> 00:12:49,851 If this-- a moment ago FF I claimed was 255-- 231 00:12:49,851 --> 00:12:51,891 let's just rewind to week zero and if we wanted 232 00:12:51,891 --> 00:12:56,391 to count to 255 in binary, that's as high as you can count, recall, 233 00:12:56,391 --> 00:12:57,411 with eight bits. 234 00:12:57,411 --> 00:12:59,244 And there's only a few of these numbers that 235 00:12:59,244 --> 00:13:03,081 are useful to memorize, like 255 is as high as you can count with eight bits 236 00:13:03,081 --> 00:13:06,981 if you start at zero, because two to the eighth is 256, but if you start at zero 237 00:13:06,981 --> 00:13:09,471 it's zero through 255. 
So in binary, recall, if you have eight bits, all of which are ones (and I won't do out the math pedantically here), if I do this plus this plus this, dot, dot, dot, that's also going to give me 255. So this is what's interesting here about hexadecimal. It turns out that an upside of storing values in hexadecimal is that the first F represents the left half of all these bits, and the second F in this case represents the rightmost four of these bits. So it turns out hexadecimal is very useful when you want to treat data in units of four bits. It's not quite eight, but units of four, and that's not bad. That's why, if you use two digits like I have thus far, 00 or FF or anything in between, that's actually a convenient way of representing eight bits in total: one hex digit for the first four bits, one hex digit for the second four. And again, there's nothing new intellectually here per se; it's just a different way of telling the same story as before, zeros and ones.

So in what context do we see this? Well, we talked about memory last week, and we're going to talk more about it this week.
If this is my computer's RAM, random access memory, you can again think of each byte as having a number associated with it: its address, or location. This might be zero, this might be 2 billion, and so in the past I've described these using decimal numbers: here's byte zero, one, two, three, four, five, six, seven, and so on, and 15, 16 would be here, and so forth. But it turns out that in the world of memory, and thus today in programming, people tend to number memory bytes using hexadecimal. Partly that's just by convention, but partly it's because it's a little more succinct, and again, each digit represents four bits.

So what comes after F here? Well, if I think about the computer's memory, after F, which is 15, I'd normally write 16. But instead it's one zero, one one, one two, one three. These are not 10, 11, 12, 13, because I claim I'm in the context of hexadecimal now. As per the previous slide, we already started going into A's through F's, so you immediately see a possible problem here. Why is this now worrisome, if all of a sudden you're seeing seemingly familiar numbers like 10, 11, 12, 13?
We didn't really stumble across this problem when it was all zeros and ones before. Yeah.

AUDIENCE: Try to do math [INAUDIBLE].

DAVID J. MALAN: Yeah, so if you're writing some code in C that's doing some math, you might accidentally (or the computer might accidentally) confuse hexadecimal with decimal if, in some context, they look the same. Any number on the board that doesn't have a letter is ambiguously hexadecimal or decimal at this point, so how might we resolve this? Well, it turns out that what computers typically do is this. By convention, any time you see 0x and then a number, that's a human convention for signaling to the reader that this is in fact a hexadecimal number. So 0x10 is not the number 10; it is the hexadecimal number one zero, which, recall, we said earlier is how you count up to 16. And again, these are not the kinds of things to memorize; it's really just the system for how you think about these things. So henceforth today, we're going to start seeing hexadecimal in a bunch of contexts.
When you write code, you might even write code using some hexadecimal, but again, it's just a different way of representing numbers, and humans have different conventions for different contexts. All right, so with that said, any questions now on this building block? From here on out, we'll start using it in some actual code. Any questions? Nothing so far? All right.

So let's go ahead and consider a maybe familiar example involving code, where I initialize a variable like n to a value like 50, in this case. And then let's start to tinker around with what's going on inside of the computer's memory. In a moment, I'm going to load up VS Code on my computer and whip up a program that very simply assigns a value like the number 50 to a variable called n. But today, keep in mind that that variable n and that value 50 are going to be stored somewhere in my computer's memory, and it turns out that today we'll introduce a bit more syntax so you can actually see where things are being stored. So let me click over to VS Code here.
I'm going to create a program called address.c just to explore a computer's addresses today. I'm going to do #include <stdio.h> and int main(void), as usual; no command-line arguments for now. I'm going to declare that variable, n equals 50, and then I'm just going to go ahead and print it out. So nothing very interesting, but I'll use %i, backslash n, and then comma n to print out that value. Nothing here should be very interesting to compile or run, but I'll do it just to make sure I didn't make any mistakes. As expected, it simply prints out the number 50, like this.

But let's consider, then, what this code is doing underneath the hood when it's actually run on your machine. So here we have that grid of memory. That variable n is an int, and if you think back, how many bytes do we typically use for an int? Yeah. Four, so four bytes, or 32 bits. So if each of these squares represents one byte, then my computer, somewhere in my memory, or RAM, is using four of these squares. Maybe it ends up over here just because there's other stuff being used elsewhere, for instance.
Though I don't really know, and frankly I don't really care, where it ends up, just that it ends up somewhere. So the value 50 is stored here in a variable called n. Even though I've written it as decimal, just like in my code, let me remind you again that this is 32 zeros and ones representing that 50. It's just going to be very tedious if we start writing everything in binary, so I'll use the more comfortable human decimal system. So that's what's going on inside of the computer's memory.

So what if I actually wanted to start tinkering with its location, or maybe just knowing its location? Well, this variable n indeed has a name, n (that's a label of sorts for it), but at the end of the day that 50 is technically at a specific address, and I'm going to make one up: 0x123. It's 123 because I really don't care what it is; I just want an address for the sake of discussion. So way over here off screen might be byte zero, and way down here is byte 0x123. It's in hexadecimal notation just by convention. So how can I actually see where my variables are ending up in memory if I'm curious to do so? Well, let me go back to my code here and let me actually change this just a little bit.
368 00:19:44,081 --> 00:19:49,381 Let me go ahead and introduce, for instance, another symbol 369 00:19:49,381 --> 00:19:53,581 here and another topic altogether, namely pointers. 370 00:19:53,581 --> 00:19:59,111 So a pointer is a variable that stores the address of some value-- 371 00:19:59,111 --> 00:20:02,371 the location of some value, or more specifically, 372 00:20:02,371 --> 00:20:05,681 the specific byte in which that value is stored. 373 00:20:05,681 --> 00:20:08,941 So again, if you think of your memory as being a whole bunch of bytes-- 374 00:20:08,941 --> 00:20:11,701 zero at top left, 2 billion or whatever at bottom right, 375 00:20:11,701 --> 00:20:13,201 depending on how much RAM you have-- 376 00:20:13,201 --> 00:20:15,481 each of those things has a location, or an address. 377 00:20:15,481 --> 00:20:19,571 A pointer is just a variable storing one such address. 378 00:20:19,571 --> 00:20:24,751 So it turns out that in the world of C, there are a couple of new symbols 379 00:20:24,751 --> 00:20:29,111 we can use if we want to see what it is we're talking about here, 380 00:20:29,111 --> 00:20:32,041 and those two operators, as of today, are these. 381 00:20:32,041 --> 00:20:35,831 You can use the ampersand operator in C in a couple of ways. 382 00:20:35,831 --> 00:20:38,761 We already saw it very briefly to do ampersand ampersand-- 383 00:20:38,761 --> 00:20:42,271 to kind of AND two Boolean expressions together 384 00:20:42,271 --> 00:20:43,811 in the context of a conditional. 385 00:20:43,811 --> 00:20:44,821 This is different. 386 00:20:44,821 --> 00:20:48,631 A single ampersand is the address-of operator. 387 00:20:48,631 --> 00:20:52,651 So literally, in your code, if you've got a variable like n or anything else 388 00:20:52,651 --> 00:20:57,901 and you write &n, C is going to figure out for you what is the address of that 389 00:20:57,901 --> 00:21:00,371 variable n in the computer's memory.
390 00:21:00,371 --> 00:21:06,001 And it's going to give you a number, otherwise known as the address of that. 391 00:21:06,001 --> 00:21:09,781 If you want to store that address in a variable, 392 00:21:09,781 --> 00:21:15,841 even though yes, it's a number like 0x123, you have to tell C in advance 393 00:21:15,841 --> 00:21:21,721 that you want to store not an int per se, but the address of an int. 394 00:21:21,721 --> 00:21:25,351 And the syntax for doing that-- somewhat nonobviously-- is 395 00:21:25,351 --> 00:21:29,071 to use an asterisk here, a star operator, and you 396 00:21:29,071 --> 00:21:30,871 say this when creating the variable. 397 00:21:30,871 --> 00:21:35,371 If you want p to be a pointer, that is, the address of some other variable, 398 00:21:35,371 --> 00:21:37,051 you do int star p. 399 00:21:37,051 --> 00:21:41,191 And the star just tells the computer, this is not an integer per se, 400 00:21:41,191 --> 00:21:44,641 this is the address of something that yes, is an int, 401 00:21:44,641 --> 00:21:46,401 but we're just being more precise. 402 00:21:46,401 --> 00:21:49,301 So on the right-hand side you have the address-of operator. 403 00:21:49,301 --> 00:21:52,281 As always with the equal sign, you copy from right to left. 404 00:21:52,281 --> 00:21:56,231 Because &n is by definition the address of something, you have to store it 405 00:21:56,231 --> 00:22:01,781 in a pointer, and the way to declare a pointer is to specify the type of value 406 00:22:01,781 --> 00:22:05,831 whose address you're storing, and then use the star to indicate that this is 407 00:22:05,831 --> 00:22:09,341 indeed a pointer and not just a regular old int. 408 00:22:09,341 --> 00:22:10,811 So let's see this in practice. 409 00:22:10,811 --> 00:22:13,871 Let me go back to my own source code here and let 410 00:22:13,871 --> 00:22:15,881 me make just a couple of tweaks.
411 00:22:15,881 --> 00:22:18,221 I'm going to leave n alone here but I'm going 412 00:22:18,221 --> 00:22:22,761 to go ahead and initially just do this. 413 00:22:22,761 --> 00:22:27,341 Let me say int star p equals ampersand n, 414 00:22:27,341 --> 00:22:31,961 and then down here, I'm going to print out not n this time, but p-- 415 00:22:31,961 --> 00:22:33,401 the variable p. 416 00:22:33,401 --> 00:22:38,171 And then even though yes, it's just a number and therefore I could use %i 417 00:22:38,171 --> 00:22:42,311 for integers, there's actually a special format code in printf for printing 418 00:22:42,311 --> 00:22:45,521 pointers or addresses, and that's %p. 419 00:22:45,521 --> 00:22:48,821 So now let's go ahead and recompile this, make address-- 420 00:22:48,821 --> 00:22:53,871 so far so good-- ./address, Enter, and a little weirdly, 421 00:22:53,871 --> 00:22:58,511 but perhaps understandably now, the address in my computer's memory 422 00:22:58,511 --> 00:23:02,381 at which the variable n happened to be stored was not quite as simple 423 00:23:02,381 --> 00:23:03,881 as 0x123. 424 00:23:03,881 --> 00:23:06,431 This computer has a lot more memory so technically, 425 00:23:06,431 --> 00:23:12,491 it was stored at 0x7FFCB4578E5C. 426 00:23:12,491 --> 00:23:14,651 Now that has no special significance to me. 427 00:23:14,651 --> 00:23:16,881 It could have ended up somewhere else altogether, 428 00:23:16,881 --> 00:23:20,381 but this is just where, in my computer-- or technically the cloud 429 00:23:20,381 --> 00:23:22,901 server to which I'm connected using VS Code here-- 430 00:23:22,901 --> 00:23:25,498 that just happens to be where n ended up. 431 00:23:25,498 --> 00:23:28,331 And strictly speaking, I don't even need to introduce this variable. 432 00:23:28,331 --> 00:23:31,181 I could get rid of p and I could just say 433 00:23:31,181 --> 00:23:34,901 print not just n, but the address of n and achieve the same thing. 
434 00:23:34,901 --> 00:23:37,361 You don't need to temporarily store it in a variable. 435 00:23:37,361 --> 00:23:40,341 Let me just do make address again, ./address, 436 00:23:40,341 --> 00:23:42,921 and now I see this address here. 437 00:23:42,921 --> 00:23:46,466 And notice if I keep running the program, it's actually moving around. 438 00:23:46,466 --> 00:23:49,091 There's other stuff presumably going on inside of the computer. 439 00:23:49,091 --> 00:23:52,501 Maybe it's actually randomizing it so it's not always at the same location. 440 00:23:52,501 --> 00:23:55,001 That can actually be a security feature underneath the hood, 441 00:23:55,001 --> 00:24:00,521 but this happens to be at that moment in time where that value is in memory, 442 00:24:00,521 --> 00:24:03,491 quite like our picture a moment ago. 443 00:24:03,491 --> 00:24:06,641 All right, so let me pause here to see if there's now 444 00:24:06,641 --> 00:24:08,171 any questions on what we just did. 445 00:24:08,171 --> 00:24:10,171 Yeah? 446 00:24:10,171 --> 00:24:12,391 AUDIENCE: Is there any way to control where 447 00:24:12,391 --> 00:24:15,551 you are storing something in memory? 448 00:24:15,551 --> 00:24:18,746 Does it even matter if it works, or does it just 449 00:24:18,746 --> 00:24:21,271 matter that you could go in and locate where something is? 450 00:24:21,271 --> 00:24:22,813 DAVID J. MALAN: Really good question. 451 00:24:22,813 --> 00:24:25,381 Is there any way to control where something is in memory? 
452 00:25:25,381 --> 00:25:28,338 Short answer is yes, and this is both the power and the danger of C, 453 00:25:28,338 --> 00:25:31,171 and we're going to do this today and make a few deliberate mistakes, 454 00:25:31,171 --> 00:25:36,241 because with this power of going to or getting the address of any variable, 455 00:25:36,241 --> 00:25:38,341 I could just arbitrarily right now write code 456 00:25:38,341 --> 00:25:42,611 that stores a value at byte 2 billion, or zero, or anything in between. 457 00:25:42,611 --> 00:25:46,771 But that also means potentially, I could start creepily looking 458 00:25:46,771 --> 00:25:50,831 around at all of the computer's memory, even at things that I didn't put there. 459 00:25:50,831 --> 00:25:53,371 Maybe other programs, maybe other parts of programs, 460 00:25:53,371 --> 00:25:55,621 and indeed, this is a potential security threat, 461 00:25:55,621 --> 00:25:57,984 if suddenly you're able to just look anywhere 462 00:25:57,984 --> 00:25:59,401 you want in the computer's memory. 463 00:25:59,401 --> 00:26:04,021 Now, I'm overselling it a little bit because nowadays, in this decade, 464 00:26:04,021 --> 00:26:06,571 there are some defenses in place in compilers 465 00:26:06,571 --> 00:26:09,941 and in our operating systems that do hedge against this a little bit. 466 00:26:09,941 --> 00:26:12,391 But this is still a very frequent source of problems, 467 00:26:12,391 --> 00:26:14,791 and later today we'll talk briefly about things 468 00:26:14,791 --> 00:26:17,651 called stack overflow, which is not just a website, 469 00:26:17,651 --> 00:26:19,831 it is a problem that you can encounter. 470 00:26:19,831 --> 00:26:22,351 Heap overflow, and more generally buffer overflows-- 471 00:26:22,351 --> 00:26:25,801 there are just so many things that can go wrong using this language called C, 472 00:26:25,801 --> 00:26:29,401 and have any of you encountered a segmentation fault yet?
473 00:25:29,401 --> 00:25:31,321 I think we saw a few hands for that already. 474 00:25:31,321 --> 00:25:33,901 You touched memory that you shouldn't have 475 00:25:33,901 --> 00:25:38,611 and odds are you did it most recently by going too far in an array. 476 00:25:38,611 --> 00:25:42,001 Going to the left, or negative in an array, or somehow looking at memory 477 00:25:42,001 --> 00:25:42,841 you shouldn't have. 478 00:25:42,841 --> 00:25:47,051 And we'll explain today why it is you were able to do that. 479 00:25:47,051 --> 00:25:49,531 Other questions on these primitives so far? 480 00:25:49,531 --> 00:25:51,623 Yeah, from Carter? 481 00:25:51,623 --> 00:25:54,748 AUDIENCE: [INAUDIBLE] pointer star p, but then we used p later in the code. 482 00:25:54,748 --> 00:25:56,031 Is it called star p or p? 483 00:25:56,031 --> 00:25:57,281 DAVID J. MALAN: Good question. 484 00:25:57,281 --> 00:25:58,571 Earlier, we used star p. 485 00:25:58,571 --> 00:26:01,061 Let me rewind in time to the previous version of this code, 486 00:26:01,061 --> 00:26:03,341 where I actually had a variable called p. 487 00:26:03,341 --> 00:26:07,151 Just like with variable declarations in the past, 488 00:26:07,151 --> 00:26:12,621 once you've declared a variable to be an int, a char, a bool, or an int 489 00:26:12,621 --> 00:26:15,761 star, a.k.a. a pointer, you don't thereafter 490 00:26:15,761 --> 00:26:18,671 keep using the word int or now, the star. 491 00:26:18,671 --> 00:26:20,471 Once you've declared it, that's it. 492 00:26:20,471 --> 00:26:21,921 You only refer to it by name. 493 00:26:21,921 --> 00:26:26,111 And so it's very deliberate what I did here, 494 00:26:26,111 --> 00:26:28,661 saying that the type here is int star-- 495 00:26:28,661 --> 00:26:30,671 that is a pointer to an int-- 496 00:26:30,671 --> 00:26:33,611 but here I just said the name of the variable, as always. 497 00:26:33,611 --> 00:26:36,311 I didn't repeat int, and I also didn't repeat star. 
498 00:26:36,311 --> 00:26:39,191 But at the risk of bending your minds a little bit, there 499 00:26:39,191 --> 00:26:45,441 is unfortunately one other use for the star operator, and that's as follows. 500 00:26:45,441 --> 00:26:49,181 If you want to print out not the address of something, 501 00:26:49,181 --> 00:26:54,261 but what is at a specific address, you can actually do this. 502 00:26:54,261 --> 00:26:59,621 If I want to print out the integer via %i, that is at that address, 503 00:26:59,621 --> 00:27:04,061 I can actually use the star here, which technically contradicts what I just 504 00:27:04,061 --> 00:27:07,161 said, but it has a different function here-- a different purpose. 505 00:27:07,161 --> 00:27:09,561 So let me go ahead and do this in two different ways. 506 00:27:09,561 --> 00:27:11,366 I'm going to leave this line of code as is, 507 00:27:11,366 --> 00:27:13,241 but I'm going to add another line of code now 508 00:27:13,241 --> 00:27:17,201 that prints out what apparently will be an integer, in a moment. 509 00:27:17,201 --> 00:27:21,124 So %i backslash n, and I could say-- and let me just do n for now. 510 00:27:21,124 --> 00:27:23,291 So there's really nothing special happening now, I'm 511 00:27:23,291 --> 00:27:25,301 just adding a sort of mindless printing of n. 512 00:27:25,301 --> 00:27:28,041 So make address, ./address-- 513 00:27:28,041 --> 00:27:31,601 there's the current address of n and there's the value of n. 514 00:27:31,601 --> 00:27:34,571 But what's kind of cool about C here, too, 515 00:27:34,571 --> 00:27:38,861 is if you know that a value is at a specific address like p, 516 00:27:38,861 --> 00:27:42,591 there's one other use for this star operator, the asterisk. 517 00:27:42,591 --> 00:27:46,221 You can use it as the so-called dereference operator, 518 00:27:46,221 --> 00:27:49,071 which means go to that address.
519 00:27:49,071 --> 00:27:54,701 And so here what we actually have is an example of a pointer p, 520 00:27:54,701 --> 00:27:59,631 which is an address like 0x123 or 0x7FF and so forth. 521 00:27:59,631 --> 00:28:03,191 But if you say star p now, you're not redeclaring the variable 522 00:28:03,191 --> 00:28:04,631 because I didn't mention int-- 523 00:28:04,631 --> 00:28:07,391 you're going to that address in p. 524 00:28:07,391 --> 00:28:09,071 So let me recompile this now. 525 00:28:09,071 --> 00:28:15,191 Make address, ./address, and just to be clear-- 526 00:28:15,191 --> 00:28:16,721 what should I see? 527 00:28:16,721 --> 00:28:20,231 I'm first going to see the pointer itself, 0x something. 528 00:28:20,231 --> 00:28:23,096 What's the second line of output I should presumably see now? 529 00:28:25,801 --> 00:28:27,591 Shout a little louder. 530 00:28:27,591 --> 00:28:31,911 So I'm hearing 50, and that's true because if you figure out the address 531 00:28:31,911 --> 00:28:38,151 of n and print it in line seven, but then go to the address of n, a.k.a. p, 532 00:28:38,151 --> 00:28:41,331 that's indeed going to just show you the number n-- 533 00:28:41,331 --> 00:28:44,121 the value of n again. 534 00:28:44,121 --> 00:28:47,028 All right, any questions now on this syntax-- and I will concede, 535 00:28:47,028 --> 00:28:48,861 I think this is confusing-- the fact that we 536 00:28:48,861 --> 00:28:51,051 use the star for multiplication, the fact 537 00:28:51,051 --> 00:28:53,361 that we use the star to declare a pointer, 538 00:28:53,361 --> 00:28:56,601 but then we use a star in a third way to dereference the pointer 539 00:28:56,601 --> 00:28:57,651 and go to the pointer. 540 00:28:57,651 --> 00:29:01,251 It's just too confusing, honestly, but with practice comes comfort. 541 00:29:01,251 --> 00:29:02,681 Yeah. 542 00:29:02,681 --> 00:29:12,501 AUDIENCE: [INAUDIBLE] 543 00:29:12,501 --> 00:29:13,751 DAVID J. MALAN: Good question. 
544 00:29:13,751 --> 00:29:17,321 Do you-- when you are using the ampersand operator 545 00:29:17,321 --> 00:29:19,271 to get the address of something, the onus 546 00:29:19,271 --> 00:29:23,411 is on you at the moment to know what you are getting the address of. 547 00:29:23,411 --> 00:29:24,341 Is it a string? 548 00:29:24,341 --> 00:29:25,181 Is it a char? 549 00:29:25,181 --> 00:29:25,901 Is it a bool? 550 00:29:25,901 --> 00:29:26,681 Is it an int? 551 00:29:26,681 --> 00:29:30,041 I wrote this code so I know in line six that I'm 552 00:29:30,041 --> 00:29:33,131 trying to get the address of what is an integer. 553 00:29:33,131 --> 00:29:35,271 AUDIENCE: What about line eight? 554 00:29:35,271 --> 00:29:38,991 DAVID J. MALAN: In line eight you don't have 555 00:29:38,991 --> 00:29:40,821 to worry about that-- good question. 556 00:29:40,821 --> 00:29:44,851 Notice in line eight, I didn't tell the computer, other than the %i, 557 00:29:44,851 --> 00:29:49,551 what kind of address I'm going to, but I did already in line six. 558 00:29:49,551 --> 00:29:52,581 I told the compiler that p, now and forever, 559 00:29:52,581 --> 00:29:55,041 is going to be the address of an int. 560 00:29:55,041 --> 00:29:59,961 That's enough information in advance so that printf, or really the language C, 561 00:29:59,961 --> 00:30:03,951 still knows on line eight that p is a pointer to an int, 562 00:30:03,951 --> 00:30:07,371 and that way it will print out all four bytes at that address, 563 00:30:07,371 --> 00:30:11,288 not just part of it, and not more than those four bytes. 564 00:30:11,288 --> 00:30:11,871 Good question. 565 00:30:11,871 --> 00:30:13,801 Yeah, next to you. 566 00:30:13,801 --> 00:30:15,301 AUDIENCE: Do pointers have pointers? 567 00:30:15,301 --> 00:30:16,601 DAVID J. MALAN: Do pointers have pointers? 568 00:30:16,601 --> 00:30:17,101 Yes. 
569 00:30:17,101 --> 00:30:20,731 We won't do this today by having pointers to pointers, 570 00:30:20,731 --> 00:30:24,421 but yes, you can use star star, and then things get-- 571 00:30:24,421 --> 00:30:26,311 I'm sorry. 572 00:30:26,311 --> 00:30:28,501 We won't do that today and we won't do that often. 573 00:30:28,501 --> 00:30:31,051 In fact, Python, another language, is just a couple of weeks 574 00:30:31,051 --> 00:30:32,221 away, so hang in there. 575 00:30:32,221 --> 00:30:32,921 Almost there. 576 00:30:32,921 --> 00:30:34,561 A question back here? 577 00:30:34,561 --> 00:30:36,331 Was there? 578 00:30:36,331 --> 00:30:38,191 That was-- more verbal feedback like that 579 00:30:38,191 --> 00:30:40,871 is helpful as we forge into the more complicated stuff. 580 00:30:40,871 --> 00:30:41,551 Other questions? 581 00:30:41,551 --> 00:30:42,909 Yeah. 582 00:30:42,909 --> 00:30:44,785 AUDIENCE: What's the point of [INAUDIBLE]? 583 00:30:48,071 --> 00:30:51,161 DAVID J. MALAN: What's the point of printing the address? 584 00:30:51,161 --> 00:30:54,451 AUDIENCE: Like, using the address to [INAUDIBLE]. 585 00:30:54,451 --> 00:30:55,381 DAVID J. MALAN: Sure. 586 00:30:55,381 --> 00:30:56,521 What's the point of doing this? 587 00:30:56,521 --> 00:30:58,771 If you don't mind, let me-- let's get there in a moment. 588 00:30:58,771 --> 00:31:01,471 This is not the common use case, just printing out the address-- 589 00:31:01,471 --> 00:31:02,821 who really cares? 590 00:31:02,821 --> 00:31:05,401 At the moment we care only for the sake of discussion. 591 00:31:05,401 --> 00:31:07,453 We're soon going to start using these addresses. 592 00:31:07,453 --> 00:31:09,661 So hang in there just a little bit for that one, too, 593 00:31:09,661 --> 00:31:13,621 but it will solve some problems for us before long.
594 00:31:13,621 --> 00:31:17,311 So let's actually just now depict what was going on inside of the computer's 595 00:31:17,311 --> 00:31:19,691 memory just a moment ago. 596 00:31:19,691 --> 00:31:23,971 So if I toggle back here, let me redraw my computer's memory, 597 00:31:23,971 --> 00:31:27,421 now let me plop into the memory n, which is storing in this program 598 00:31:27,421 --> 00:31:28,471 the number 50. 599 00:31:28,471 --> 00:31:30,631 Where is p in my computer's memory? 600 00:31:30,631 --> 00:31:33,691 Specifically, I don't know, and apparently it moves around each time I 601 00:31:33,691 --> 00:31:35,741 run the program, so for the sake of discussion, 602 00:31:35,741 --> 00:31:40,711 let's just propose that if 50 ended up at address 0x123, I don't know-- 603 00:31:40,711 --> 00:31:43,471 p ends up over here, at address-- 604 00:31:43,471 --> 00:31:46,661 whoops-- at whatever address this is here. 605 00:31:46,661 --> 00:31:49,111 But notice a couple of curiosities now. 606 00:31:49,111 --> 00:31:52,621 If p is a pointer, it's the address of something. 607 00:31:52,621 --> 00:31:57,961 So the value in p should be an address, and I've indeed written it as such-- 608 00:31:57,961 --> 00:32:02,071 0x123, and technically there's not an x there, there's not a zero there, 609 00:32:02,071 --> 00:32:04,471 there's not even a 123 there per se-- there's 610 00:32:04,471 --> 00:32:08,011 a pattern of bits that represents the address 0x123. 611 00:32:08,011 --> 00:32:11,681 But again, that's week zero-- we don't care about binary day-to-day. 612 00:32:11,681 --> 00:32:17,761 So if this is p, and this I claimed was n, why is p so much bigger? 613 00:32:17,761 --> 00:32:20,231 Can someone conjecture here?
614 00:32:20,231 --> 00:32:25,061 Because it turns out that whether n is an int or a char or a bool, 615 00:32:25,061 --> 00:32:27,701 which are different types-- heck, even a long-- 616 00:32:27,701 --> 00:32:31,871 p is always going to take up eight squares on the board, 617 00:32:31,871 --> 00:32:33,951 but why might that be? 618 00:32:33,951 --> 00:32:35,261 What might explain that? 619 00:32:39,591 --> 00:32:41,507 Yeah, thoughts? 620 00:32:41,507 --> 00:32:45,451 AUDIENCE: Perhaps it allocates eight bytes, 621 00:32:45,451 --> 00:32:48,959 but it doesn't know the type of the data [INAUDIBLE]. 622 00:32:48,959 --> 00:32:50,001 DAVID J. MALAN: OK, fair. 623 00:32:50,001 --> 00:32:52,191 Maybe it's allocating eight bytes because it doesn't know the type. 624 00:32:52,191 --> 00:32:54,711 Turns out that's OK because an address is an address. 625 00:32:54,711 --> 00:32:58,281 It's really up to the programmer to use it as a string or a char or a bool. 626 00:32:58,281 --> 00:33:00,381 Other thoughts? 627 00:33:00,381 --> 00:33:05,443 AUDIENCE: Maybe the first four for the actual number and the last four 628 00:33:05,443 --> 00:33:11,033 is some null that [INAUDIBLE] where the pointer ends. 629 00:33:11,033 --> 00:33:12,241 DAVID J. MALAN: OK, possibly. 630 00:33:12,241 --> 00:33:15,211 It could be that pointers have some complexity like a backslash 0 631 00:33:15,211 --> 00:33:18,091 or something curious like that, like we talked about for strings. 632 00:33:18,091 --> 00:33:19,751 Turns out that's not the case. 633 00:33:19,751 --> 00:33:23,281 It turns out that pointers nowadays typically are, but not 634 00:33:23,281 --> 00:33:25,921 always, eight bytes, a.k.a. 635 00:33:25,921 --> 00:33:29,101 64 bits, because you and I-- our Macs, our PCs, 636 00:33:29,101 --> 00:33:32,911 heck-- even our phones have a lot more memory than they did years ago.
637 00:33:32,911 --> 00:33:34,801 Back in the day, a pointer might have only 638 00:33:34,801 --> 00:33:38,701 been 32 bits, or even only eight bits way back in the day. 639 00:33:38,701 --> 00:33:41,551 Let's consider 32 bits, because that was the norm for some time. 640 00:33:41,551 --> 00:33:45,091 How high can you count, roughly, if you've got 32 bits? 641 00:33:45,091 --> 00:33:47,901 What's the number we keep rattling off? 642 00:33:47,901 --> 00:33:53,061 32 bits gives you 2 to the 32 values, so it's roughly 4 billion, 643 00:33:53,061 --> 00:33:57,271 and I keep saying it's 2 billion if you do negative, but in the world of memory 644 00:33:57,271 --> 00:34:00,531 there's a reason I keep saying 2 billion bytes, two gigabytes, 645 00:34:00,531 --> 00:34:03,591 because for a very long time that was the maximum amount of memory 646 00:34:03,591 --> 00:34:04,621 a computer could have. 647 00:34:04,621 --> 00:34:05,121 Why? 648 00:34:05,121 --> 00:34:07,491 Because the pointers that the computers were using 649 00:34:07,491 --> 00:34:09,531 were only, for instance, 32 bits. 650 00:34:09,531 --> 00:34:12,591 And with 32 bits, depending on whether you allow for negatives or not, 651 00:34:12,591 --> 00:34:15,621 you can count as high as 2 billion, roughly, or maybe 4 billion, 652 00:34:15,621 --> 00:34:17,961 but you know what-- your Mac, your PC, your phone 653 00:34:17,961 --> 00:34:22,441 could not have had five gigabytes of memory, or 5 billion bytes of memory. 654 00:34:22,441 --> 00:34:25,191 You certainly couldn't have had what computers nowadays come with, 655 00:34:25,191 --> 00:34:27,171 which might be 8 gigabytes of memory-- 656 00:34:27,171 --> 00:34:28,561 16 gigabytes of memory. 657 00:34:28,561 --> 00:34:29,211 Why?
658 00:34:29,211 --> 00:34:33,501 Because with 4 bytes, or 32 bits, you literally, physically, 659 00:34:33,501 --> 00:34:37,611 can't count that high, which means if I drew a picture of all of the memory we 660 00:34:37,611 --> 00:34:41,301 would run out of numbers to describe them, which means most of my memory 661 00:34:41,301 --> 00:34:42,631 would just be unusable. 662 00:34:42,631 --> 00:34:45,771 So pointers nowadays are 64 bits, or eight bytes. 663 00:34:45,771 --> 00:34:46,521 That's really big. 664 00:34:46,521 --> 00:34:48,438 I can't even pronounce how big that number is, 665 00:34:48,438 --> 00:34:51,051 but it's plenty for the next many years, and so 666 00:34:51,051 --> 00:34:52,881 we've drawn it that way on the board here. 667 00:34:52,881 --> 00:34:54,501 Now let's just abstract this away. 668 00:34:54,501 --> 00:34:56,209 Let's get rid of all the other bytes that 669 00:34:56,209 --> 00:34:58,911 are storing something or nothing else, and let's now 670 00:34:58,911 --> 00:35:02,241 start to abstract away this complexity because the reality is, 671 00:35:02,241 --> 00:35:04,131 to your question earlier-- 672 00:35:04,131 --> 00:35:06,441 what is this useful for, or what do we-- do we actually 673 00:35:06,441 --> 00:35:07,971 care about these addresses? 674 00:35:07,971 --> 00:35:08,961 Generally, no. 675 00:35:08,961 --> 00:35:11,061 We're doing this so that you see there's no magic. 676 00:35:11,061 --> 00:35:13,951 We're just moving things around and poking around in memory. 677 00:35:13,951 --> 00:35:16,791 But what a person would typically do when talking about pointers 678 00:35:16,791 --> 00:35:19,401 would literally be to just point at something. 
679 00:35:19,401 --> 00:35:21,951 I really don't care what address n is at, 680 00:35:21,951 --> 00:35:25,131 so in general it suffices, when drawing pictures on a whiteboard or 681 00:35:25,131 --> 00:35:27,021 having a discussion with another programmer, 682 00:35:27,021 --> 00:35:31,341 to just draw an arrow from the pointer to the value in question, 683 00:35:31,341 --> 00:35:36,470 because neither you nor I probably care about the specifics of 0x whatever. 684 00:35:36,470 --> 00:35:39,813 There's your pointer-- it's literally an arrow, and we can see this. 685 00:35:39,813 --> 00:35:42,021 So it turns out that these pointers, these addresses, 686 00:35:42,021 --> 00:35:45,831 are not that dissimilar to what we've done for hundreds of years 687 00:35:45,831 --> 00:35:48,381 in the form of a postal system. 688 00:35:48,381 --> 00:35:50,121 For instance, here is a post office-- 689 00:35:50,121 --> 00:35:52,731 here, no-- here is a mailbox, and suppose 690 00:35:52,731 --> 00:35:55,431 that this is a mailbox labeled p. 691 00:35:55,431 --> 00:35:58,191 It's a pointer, and suppose there's another mailbox 692 00:35:58,191 --> 00:36:02,041 way over there, which is just another byte of my computer's memory. 693 00:36:02,041 --> 00:36:03,831 What are we really talking about? 694 00:36:03,831 --> 00:36:07,881 Well, you store in a computer's memory values like the number 50, 695 00:36:07,881 --> 00:36:11,841 or the word "hi" inside of your computer's memory at some location. 696 00:36:11,841 --> 00:36:15,921 But today we can also use those same memory locations 697 00:36:15,921 --> 00:36:17,551 to store the address of things.
698 00:36:17,551 --> 00:36:21,351 For instance, if I open this up here and I 699 00:36:21,351 --> 00:36:25,071 see OK, the value inside of this mailbox is not a number like 50, 700 00:36:25,071 --> 00:36:26,361 it's actually an address-- 701 00:36:26,361 --> 00:36:30,861 0x123-- that's like a pointer, a breadcrumb leading 702 00:36:30,861 --> 00:36:32,661 from one location in memory to another. 703 00:36:32,661 --> 00:36:35,161 And in fact, would someone who's seated roughly over there-- 704 00:36:35,161 --> 00:36:37,761 do you mind getting the mail over there? 705 00:36:37,761 --> 00:36:40,581 Any volunteers over in this section? 706 00:36:40,581 --> 00:36:42,931 Just need you to get to the mailbox before I do. 707 00:36:42,931 --> 00:36:44,781 Who's being volunteered? 708 00:36:44,781 --> 00:36:45,471 Oh yes, please. 709 00:36:45,471 --> 00:36:50,926 Whoever is gesturing most wildly, come on down. 710 00:36:50,926 --> 00:36:51,426 Sure. 711 00:36:57,861 --> 00:36:59,315 What's your name? 712 00:36:59,315 --> 00:37:00,078 AUDIENCE: Anfoo. 713 00:37:00,078 --> 00:37:01,161 DAVID J. MALAN: Say again? 714 00:37:01,161 --> 00:37:01,851 AUDIENCE: Anfoo. 715 00:37:01,851 --> 00:37:03,201 DAVID J. MALAN: Anfoo? 716 00:37:03,201 --> 00:37:06,081 OK, come on up to the edge of the stage there and just to be clear-- 717 00:37:06,081 --> 00:37:09,801 if this is p, that is apparently n, but to make clear 718 00:37:09,801 --> 00:37:12,621 what we're talking about when we're storing 0x whatever values-- 719 00:37:12,621 --> 00:37:15,771 like 0x123, that's essentially equivalent to my 720 00:37:15,771 --> 00:37:18,501 maybe pulling out something like this and just 721 00:37:18,501 --> 00:37:21,051 abstractly pointing to your mailbox there, 722 00:37:21,051 --> 00:37:25,311 or if you prefer, pointing to the mailbox-- 723 00:37:25,311 --> 00:37:26,271 OK, all right. 724 00:37:28,951 --> 00:37:29,451 Thank you. 725 00:37:29,451 --> 00:37:29,951 All right. 
726 00:37:32,661 --> 00:37:34,821 This is akin to me pointing at your mailbox, 727 00:37:34,821 --> 00:37:36,863 and if you want to go ahead and open your mailbox 728 00:37:36,863 --> 00:37:43,201 and reveal to the crowd what's inside your mailbox labeled n. 729 00:37:43,201 --> 00:37:43,981 All right. 730 00:37:46,501 --> 00:37:48,601 Thank you. 731 00:37:48,601 --> 00:37:51,221 We have a little CS50 stress ball for your trouble. 732 00:37:51,221 --> 00:37:52,553 Thank you for coming up. 733 00:37:52,553 --> 00:37:55,261 So that's just to put a visual on what it is we're talking about, 734 00:37:55,261 --> 00:37:58,171 because it can get very abstract, very cryptic quickly when we're 735 00:37:58,171 --> 00:38:01,391 talking about addresses and memory and drawing it like these little squares. 736 00:38:01,391 --> 00:38:04,308 But if you think about just walking into a post office or an apartment 737 00:38:04,308 --> 00:38:07,261 complex that's got a lot of mailboxes, those mailboxes 738 00:38:07,261 --> 00:38:10,231 essentially are a big chunk of memory and each 739 00:38:10,231 --> 00:38:12,091 of those mailboxes has an address-- 740 00:38:12,091 --> 00:38:14,821 this is apartment one, two, three-- apartment 2 billion. 741 00:38:14,821 --> 00:38:18,091 And inside of those mailboxes can go anything 742 00:38:18,091 --> 00:38:20,261 that can be represented as information. 743 00:38:20,261 --> 00:38:23,341 It could be a number like n, or 50, or if you 744 00:38:23,341 --> 00:38:25,741 prefer it could be a number that represents 745 00:38:25,741 --> 00:38:27,631 the address of another mailbox. 746 00:38:27,631 --> 00:38:30,811 And this is akin, really, if you've ever had an apartment or you 747 00:38:30,811 --> 00:38:33,631 and your parents have moved, to having a forwarding address. 
748 00:38:33,631 --> 00:38:36,001 It's like having the Post Office in the US 749 00:38:36,001 --> 00:38:39,481 put some kind of piece of paper in your old mailbox saying, 750 00:38:39,481 --> 00:38:41,911 actually forward it to that other mailbox. 751 00:38:41,911 --> 00:38:44,281 That really is all a pointer is doing. 752 00:38:44,281 --> 00:38:45,991 At the end of the day, it's just a number, 753 00:38:45,991 --> 00:38:48,331 but it's a number being used in a different way, 754 00:38:48,331 --> 00:38:50,461 and it's the syntax that we've introduced, 755 00:38:50,461 --> 00:38:54,271 not just int but int star, that tells the computer how 756 00:38:54,271 --> 00:38:58,741 to treat that number in this slightly different way. 757 00:38:58,741 --> 00:39:01,841 Are there any questions, then, on this? 758 00:39:01,841 --> 00:39:03,962 Yeah, in back. 759 00:39:03,962 --> 00:39:06,379 AUDIENCE: If you had a variable, like int c, [INAUDIBLE]. 760 00:39:10,711 --> 00:39:12,691 DAVID J. MALAN: If I did int c and-- 761 00:39:12,691 --> 00:39:14,841 say the code again? 762 00:39:14,841 --> 00:39:17,011 Once more? 763 00:39:17,011 --> 00:39:19,141 Equal to n, so let me actually type it out. 764 00:39:19,141 --> 00:39:21,271 If I give myself another line of code, tell me 765 00:39:21,271 --> 00:39:27,251 one last time what to type. int c equals n, like this? 766 00:39:27,251 --> 00:39:31,951 So this is OK, and I can't draw it quite quickly enough on the board here, 767 00:39:31,951 --> 00:39:36,181 but this would be like creating another four bytes somewhere in memory, maybe 768 00:39:36,181 --> 00:39:40,231 down here, that stores an identical copy of 50, 769 00:39:40,231 --> 00:39:43,381 because the assignment operator from right to left copies one value 770 00:39:43,381 --> 00:39:44,201 to another. 771 00:39:44,201 --> 00:39:47,671 So that would just add one more rectangle of size four 772 00:39:47,671 --> 00:39:50,391 to this particular picture.
773 00:39:50,391 --> 00:39:52,371 If I'm answering your question as intended. 774 00:39:52,371 --> 00:39:57,231 OK, so that is week one style use of assignment operators before pointers. 775 00:39:57,231 --> 00:40:00,051 I could, though, start copying pointers but again, we'll 776 00:40:00,051 --> 00:40:01,881 come back to some of that complexity. 777 00:40:01,881 --> 00:40:03,421 Any other questions here? 778 00:40:03,421 --> 00:40:04,921 AUDIENCE: That was a great question. 779 00:40:04,921 --> 00:40:06,841 Does the pointer point-- 780 00:40:06,841 --> 00:40:10,084 does the same pointer point to the new replica as well? 781 00:40:10,084 --> 00:40:11,501 DAVID J. MALAN: Ah, good question. 782 00:40:11,501 --> 00:40:12,406 Short answer, no. 783 00:40:12,406 --> 00:40:17,101 And to repeat for the camera, if I create a second variable like this, 784 00:40:17,101 --> 00:40:21,271 int c equals n, and I claim without actually drawing it on the board 785 00:40:21,271 --> 00:40:25,191 that this gives me another rectangle, the value of which is also 50, 786 00:40:25,191 --> 00:40:26,681 p does not get touched. 787 00:40:26,681 --> 00:40:29,041 And this is what's important and really characteristic 788 00:40:29,041 --> 00:40:33,001 of C. Nothing happens automatically for you. 789 00:40:33,001 --> 00:40:36,581 p is not going to be updated unless you update p in some way, 790 00:40:36,581 --> 00:40:39,121 so creating a third variable called c-- even 791 00:40:39,121 --> 00:40:41,521 if you're copying its value from right to left, 792 00:40:41,521 --> 00:40:44,701 that has no effect on anything else in the program. 793 00:40:44,701 --> 00:40:46,031 A good question. 794 00:40:46,031 --> 00:40:52,201 So what have we seen that's perhaps now a little more explainable? 
795 00:40:52,201 --> 00:40:56,221 Well, recall that we talked quite a bit last week about strings, and just 796 00:40:56,221 --> 00:41:02,101 to recap in layperson's terms, what is a string as you now understand it? 797 00:41:02,101 --> 00:41:04,191 So say-- well, let me take a specific hand here. 798 00:41:04,191 --> 00:41:05,091 What's a string? 799 00:41:05,091 --> 00:41:06,926 How about over here. 800 00:41:06,926 --> 00:41:08,301 AUDIENCE: An array of characters. 801 00:41:08,301 --> 00:41:08,811 DAVID J. MALAN: OK, sure. 802 00:41:08,811 --> 00:41:09,728 Both of you are right. 803 00:41:09,728 --> 00:41:10,971 An array of characters. 804 00:41:10,971 --> 00:41:13,761 An array of characters, and we-- 805 00:41:13,761 --> 00:41:16,881 I claimed-- or revealed last week that string is not technically 806 00:41:16,881 --> 00:41:20,151 a feature built into C. It's not an official data type 807 00:41:20,151 --> 00:41:22,401 but every programmer in most any language 808 00:41:22,401 --> 00:41:25,641 refers to sequences of characters-- words, letters, 809 00:41:25,641 --> 00:41:27,451 paragraphs-- as strings. 810 00:41:27,451 --> 00:41:30,771 So the vernacular exists but the data type doesn't 811 00:41:30,771 --> 00:41:34,111 exist per se in C. So what we're about to do, if you will, 812 00:41:34,111 --> 00:41:36,951 for dramatic effect, is take off some training wheels today. 813 00:41:36,951 --> 00:41:41,451 The CS50 library, implemented in the form of the header file cs50.h-- 814 00:41:41,451 --> 00:41:43,581 we claim has had a bunch of things in it. 815 00:41:43,581 --> 00:41:46,761 Prototypes for get_string, prototypes for get_int, 816 00:41:46,761 --> 00:41:49,281 and all of those other functions, but it turns out 817 00:41:49,281 --> 00:41:53,481 it also is what defines the word "string" in such a way 818 00:41:53,481 --> 00:41:55,981 that you all can use it these past several weeks.
819 00:41:55,981 --> 00:41:58,641 So let's take a look at an example of a string in use. 820 00:41:58,641 --> 00:42:00,681 Here, for instance, is a tiny bit of code 821 00:42:00,681 --> 00:42:05,421 that uses the word "string," creating a variable called s 822 00:42:05,421 --> 00:42:08,083 and then storing quote unquote, hi, exclamation point. 823 00:42:08,083 --> 00:42:10,791 Let's consider what this looks like now in the computer's memory. 824 00:42:10,791 --> 00:42:13,541 I don't care about all the other bytes, let's just focus on these, 825 00:42:13,541 --> 00:42:16,551 and this per last week is how "hi" might be stored. 826 00:42:16,551 --> 00:42:19,311 h-i exclamation point and then one more, as someone already 827 00:42:19,311 --> 00:42:23,151 observed, that sentinel value-- that null character which 828 00:42:23,151 --> 00:42:26,558 just means eight zero bits to demarcate the end of that string 829 00:42:26,558 --> 00:42:28,641 just in case there's something to the right of it, 830 00:42:28,641 --> 00:42:31,801 the computer can now distinguish one string from another. 831 00:42:31,801 --> 00:42:35,004 So last week we introduced this new syntax. 832 00:42:35,004 --> 00:42:36,921 Well, if strings are just arrays of characters 833 00:42:36,921 --> 00:42:39,831 you can then very cleverly use that square bracket notation 834 00:42:39,831 --> 00:42:44,631 and go to location zero or one or two, which are like addresses, 835 00:42:44,631 --> 00:42:46,431 but they're relative to the string. 836 00:42:46,431 --> 00:42:51,381 This could be at 0x123 or 0x456, but with this bracket notation 837 00:42:51,381 --> 00:42:54,381 zero is always the beginning of the string, one is the next, 838 00:42:54,381 --> 00:42:55,801 two is the next, and so forth. 839 00:42:55,801 --> 00:43:00,561 So that was our array syntax for indexing into an array. 
840 00:43:00,561 --> 00:43:03,471 But technically speaking, we can go a little deeper today-- 841 00:43:03,471 --> 00:43:09,741 technically speaking, if the h in hi starts at the address 0x123, then 842 00:43:09,741 --> 00:43:15,711 it stands to reason that i is at 0x124, the exclamation point is at 0x125, 843 00:43:15,711 --> 00:43:18,711 and the null is at 0x126. 844 00:43:18,711 --> 00:43:23,331 Now, I don't care about 123 per se, but even though this is hexadecimal, 845 00:43:23,331 --> 00:43:24,591 this is correct math. 846 00:43:24,591 --> 00:43:28,101 Even in hex, if you just add one at a time starting at 0x123, 847 00:43:28,101 --> 00:43:30,456 the last digit becomes four, then five, then six. 848 00:43:30,456 --> 00:43:32,331 I don't have to worry about A's, B's, and C's 849 00:43:32,331 --> 00:43:35,341 because I'm not counting that high in this example. 850 00:43:35,341 --> 00:43:39,531 So if that's the case, and my computer is actually 851 00:43:39,531 --> 00:43:47,271 laying out the word hi in memory like that, well, what exactly is s? 852 00:43:47,271 --> 00:43:50,001 What exactly is s if, at the end of the day, 853 00:43:50,001 --> 00:43:56,031 H-I exclamation point null is stored at these addresses? 854 00:43:56,031 --> 00:43:57,006 Where is s? 855 00:43:57,006 --> 00:43:58,881 Now that I've taken off those training wheels 856 00:43:58,881 --> 00:44:02,481 and shown you where H-I exclamation point null actually are, 857 00:44:02,481 --> 00:44:04,221 what happened to s? 858 00:44:04,221 --> 00:44:08,211 Well s, as always, is actually a variable. 859 00:44:08,211 --> 00:44:10,251 Even in the code I proposed a moment ago, 860 00:44:10,251 --> 00:44:13,551 string is apparently a data type that, yes, doesn't come with C, 861 00:44:13,551 --> 00:44:16,101 but CS50's library makes it exist. 862 00:44:16,101 --> 00:44:21,471 s is a variable of type string, so where is s in this picture?
863 00:44:21,471 --> 00:44:25,431 Well, it turns out that s might be up here. 864 00:44:25,431 --> 00:44:28,971 Again, I'm just drawing it anywhere for the sake of discussion, 865 00:44:28,971 --> 00:44:33,141 but s is a variable per that line of code. 866 00:44:33,141 --> 00:44:36,978 What s is storing, apparently, I claim, is 0x123. 867 00:44:36,978 --> 00:44:40,311 I actually don't really care about these addresses, so let's abstract that away. 868 00:44:40,311 --> 00:44:45,591 s is apparently, as of now, today, one week later, just a pointer 869 00:44:45,591 --> 00:44:46,761 to a character. 870 00:44:46,761 --> 00:44:49,311 Specifically, the first character in s. 871 00:44:49,311 --> 00:44:51,411 And this is the last piece of the puzzle. 872 00:44:51,411 --> 00:44:54,981 Last week we had this clever way of demarcating the end of a string. 873 00:44:54,981 --> 00:44:59,901 Well, it turns out that strings are represented in the computer's memory 874 00:44:59,901 --> 00:45:03,861 as a variable that is a pointer, inside of which 875 00:45:03,861 --> 00:45:06,901 is the address of the first character in the string. 876 00:45:06,901 --> 00:45:09,951 So if s points at the first character and you 877 00:45:09,951 --> 00:45:12,501 can trust that backslash zero is at the end of the string, 878 00:45:12,501 --> 00:45:18,091 that's literally all you need to figure out where a string begins and ends. 879 00:45:18,091 --> 00:45:19,531 So what do I mean by this? 880 00:45:19,531 --> 00:45:21,141 Well, let's be a little more concrete. 881 00:45:21,141 --> 00:45:24,801 In terms of this picture, if I've started with this line of code here, 882 00:45:24,801 --> 00:45:29,961 it turns out all this time since week 1, that the word string has just 883 00:45:29,961 --> 00:45:36,871 semi-secretly been an alias for char star. 884 00:45:36,871 --> 00:45:39,391 I know, so char star. 885 00:45:39,391 --> 00:45:40,841 So why does this make sense? 
886 00:45:40,841 --> 00:45:44,081 It's a little weird still, but if in our previous example 887 00:45:44,081 --> 00:45:47,671 we were able to store the address of an integer by declaring a variable 888 00:45:47,671 --> 00:45:49,831 called p, as int star p-- 889 00:45:49,831 --> 00:45:52,681 well, if as of now strings are just the address 890 00:45:52,681 --> 00:45:58,111 of the first character in a string, then probably a string is just a char star 891 00:45:58,111 --> 00:46:01,861 because that means s is the address of a character, the very 892 00:46:01,861 --> 00:46:03,461 first character in the string. 893 00:46:03,461 --> 00:46:07,441 Now, the string might have three letters like it did, or four, or even a hundred 894 00:46:07,441 --> 00:46:09,571 if it's a long paragraph, but that's fine 895 00:46:09,571 --> 00:46:11,488 because you can trust that there's going to be 896 00:46:11,488 --> 00:46:13,181 that null character at the very end. 897 00:46:13,181 --> 00:46:16,921 So this is a general purpose way of representing strings 898 00:46:16,921 --> 00:46:20,041 using this new mechanism in C. 899 00:46:20,041 --> 00:46:23,221 So in fact, let me go ahead here and introduce maybe 900 00:46:23,221 --> 00:46:25,061 a couple of manipulations of this. 901 00:46:25,061 --> 00:46:28,831 Let me go back to my code here, and let's get rid of this integer stuff, 902 00:46:28,831 --> 00:46:32,381 and let's instead now do, for instance, this. 903 00:46:32,381 --> 00:46:37,383 Let me add in the CS50 library, so we'll include cs50.h for now. 904 00:46:37,383 --> 00:46:39,091 I'm going to go ahead and inside of main, 905 00:46:39,091 --> 00:46:41,971 give myself a string s equals hi exclamation point. 906 00:46:41,971 --> 00:46:43,621 I don't type the backslash zero. 907 00:46:43,621 --> 00:46:48,228 C does that for me automatically by using my double quotes like this. 908 00:46:48,228 --> 00:46:49,811 Now let me just go ahead and print it.
909 00:46:49,811 --> 00:46:52,981 So this again is week 1 style stuff where I'm just printing a string. 910 00:46:52,981 --> 00:46:54,611 No pointers yet. 911 00:46:54,611 --> 00:46:59,761 So let me do make address, Enter, ./address, and hopefully I see hi, 912 00:46:59,761 --> 00:47:01,391 so nothing new there. 913 00:47:01,391 --> 00:47:05,341 But let's start to peel back some of these layers here. 914 00:47:05,341 --> 00:47:09,361 Let me first of all, get rid of the CS50 library for a moment 915 00:47:09,361 --> 00:47:13,651 and let me change string to char star. 916 00:47:13,651 --> 00:47:15,901 And it's a little bit weird but yes, the convention 917 00:47:15,901 --> 00:47:19,899 is to say char, a space, then the star, and then immediately thereafter 918 00:47:19,899 --> 00:47:20,941 the name of the variable. 919 00:47:20,941 --> 00:47:23,691 Strictly speaking though, you might see textbooks or websites that 920 00:47:23,691 --> 00:47:26,671 do it like this or like this, but the canonical way 921 00:47:26,671 --> 00:47:28,451 is typically to do it like that. 922 00:47:28,451 --> 00:47:31,311 So now no more CS50 library, no more training wheels, if you will. 923 00:47:31,311 --> 00:47:33,821 I'm just treating strings for what they really are. 924 00:47:33,821 --> 00:47:37,021 Let me go ahead and do make address, Enter-- 925 00:47:37,021 --> 00:47:39,181 so far so good-- ./address-- 926 00:47:39,181 --> 00:47:40,651 and that, too, still works. 927 00:47:40,651 --> 00:47:44,851 So %s is a thing that comes with printf because the word string is programmer 928 00:47:44,851 --> 00:47:48,901 terminology but strictly speaking C doesn't have a string data type. 929 00:47:48,901 --> 00:47:53,221 It's always been char star, so what this means now is I 930 00:47:53,221 --> 00:47:56,761 can start to have some fun with these basic ideas, 931 00:47:56,761 --> 00:47:59,891 even though this is not purposeful other than for the sake of discussion. 
932 00:47:59,891 --> 00:48:03,901 But if s is this-- let me go back and give myself the CS50 library. 933 00:48:03,901 --> 00:48:06,391 Let's put those training wheels back on for just a moment 934 00:48:06,391 --> 00:48:09,221 so that I can do one manipulation at a time. 935 00:48:09,221 --> 00:48:12,131 Here's my string s, as before. 936 00:48:12,131 --> 00:48:15,181 Well, let me go ahead and declare a char called c, 937 00:48:15,181 --> 00:48:20,221 and let me store the first character in the string there, which is 938 00:48:20,221 --> 00:48:22,891 s bracket zero, and that should give me h. 939 00:48:22,891 --> 00:48:25,951 And then just for kicks, let me go ahead and do char star-- 940 00:48:25,951 --> 00:48:33,061 whoops-- let me go ahead and do char star p equals ampersand c, 941 00:48:33,061 --> 00:48:35,491 and see what this actually prints for me. 942 00:48:35,491 --> 00:48:38,861 Let me go ahead and print out what p is here. 943 00:48:38,861 --> 00:48:40,091 So we're just playing around. 944 00:48:40,091 --> 00:48:43,681 So make address-- so far so good-- ./address. 945 00:48:43,681 --> 00:48:46,021 All right, so what have I just done? 946 00:48:46,021 --> 00:48:51,151 I've just created a char c and stored in it the letter h, which 947 00:48:51,151 --> 00:48:55,531 is the same thing as s bracket zero, then I'm saying, what's the address of c, 948 00:48:55,531 --> 00:48:58,391 and that's apparently 0x7FF whatever. 949 00:48:58,391 --> 00:48:59,641 So that's the address. 950 00:48:59,641 --> 00:49:01,841 But I technically didn't have to do that. 951 00:49:01,841 --> 00:49:03,641 Let me go ahead and do two things now. 952 00:49:03,641 --> 00:49:12,001 Instead of just printing p, let me go ahead and print out maybe s itself. 953 00:49:12,001 --> 00:49:14,461 Let me go ahead and do make address, Enter-- 954 00:49:14,461 --> 00:49:17,611 so far so good-- ./address and-- 955 00:49:17,611 --> 00:49:20,371 damn it, what did I do wrong.
956 00:49:20,371 --> 00:49:22,201 Oh shoot, I didn't want to do that. 957 00:49:22,201 --> 00:49:25,781 Oh, I really made a mess of this. 958 00:49:25,781 --> 00:49:28,561 What did I want to do here? 959 00:49:28,561 --> 00:49:31,831 That was supposed to be impressive but it was the opposite. 960 00:49:31,831 --> 00:49:35,321 So let me turn it around. 961 00:49:35,321 --> 00:49:39,181 So if I intended to do this, why are lines nine and 10 962 00:49:39,181 --> 00:49:41,461 printing different values? 963 00:49:41,461 --> 00:49:44,641 Didn't really intend to go here, but let me try to save this. 964 00:49:44,641 --> 00:49:51,991 Why are we seeing different addresses, namely this address 402004 for s, 965 00:49:51,991 --> 00:49:57,031 and then 0x7FF for p? 966 00:49:57,031 --> 00:49:57,991 Any thoughts? 967 00:49:57,991 --> 00:50:00,121 Yeah, over here. 968 00:50:00,121 --> 00:50:02,571 AUDIENCE: [INAUDIBLE] is the character c is 969 00:50:02,571 --> 00:50:07,471 its own sort of location of the [INAUDIBLE],, 970 00:50:07,471 --> 00:50:09,513 and it's taking off just the values [INAUDIBLE].. 971 00:50:09,513 --> 00:50:10,513 DAVID J. MALAN: Correct. 972 00:50:10,513 --> 00:50:12,684 So if I really wanted to weasel my way out of this, 973 00:50:12,684 --> 00:50:15,351 this is a great answer to the previous question which was about, 974 00:50:15,351 --> 00:50:20,091 what if I introduce another variable, c, that's a copy of the value, 975 00:50:20,091 --> 00:50:22,791 and not in this case an int, but an actual char. 976 00:50:22,791 --> 00:50:28,281 Here, I've made c be a copy of the character that's at the beginning of s, 977 00:50:28,281 --> 00:50:29,381 but that's indeed a copy. 978 00:50:29,381 --> 00:50:31,131 So if I were to draw it on the screen that 979 00:50:31,131 --> 00:50:35,271 would give me a different rectangle in which this copy of h 980 00:50:35,271 --> 00:50:36,681 would actually be stored. 
981 00:50:36,681 --> 00:50:38,631 So I didn't intend to do this, but what you're 982 00:50:38,631 --> 00:50:40,618 seeing is yes, the address of s-- 983 00:50:40,618 --> 00:50:42,951 and apparently that's at a pretty low address by default 984 00:50:42,951 --> 00:50:44,961 here-- then you're seeing the address of c. 985 00:50:44,961 --> 00:50:47,841 But even though each of them is h, I claim 986 00:50:47,841 --> 00:50:49,803 one is at a different address in memory. 987 00:50:49,803 --> 00:50:51,261 And this has always been happening. 988 00:50:51,261 --> 00:50:53,991 Any time you created one variable or another it was ending up here, 989 00:50:53,991 --> 00:50:55,908 or here, or here, or somewhere else in memory. 990 00:50:55,908 --> 00:50:58,911 Now for the first time all we're doing is actually just poking around 991 00:50:58,911 --> 00:51:02,371 the computer's memory to see what is actually there. 992 00:51:02,371 --> 00:51:06,021 So let me actually back this up a little bit 993 00:51:06,021 --> 00:51:09,391 and do what I intended to do here, which was something like this. 994 00:51:09,391 --> 00:51:13,551 So if string s equals quote unquote, hi, let's go ahead 995 00:51:13,551 --> 00:51:23,051 and give myself a pointer, called p, to the first character in s. 996 00:51:23,051 --> 00:51:26,891 All right, so now let me go ahead and print out the value of this pointer, 997 00:51:26,891 --> 00:51:29,034 %p, printing out p. 998 00:51:29,034 --> 00:51:30,951 So we're just going to do one thing at a time. 999 00:51:30,951 --> 00:51:33,761 So make address, Enter, ./address. 1000 00:51:33,761 --> 00:51:38,861 There, at the moment, is the address of the first character in s. 1001 00:51:38,861 --> 00:51:40,781 What I meant to do now, was this. 1002 00:51:40,781 --> 00:51:43,721 If I want to print out two things this time, 1003 00:51:43,721 --> 00:51:49,391 let me print out not only what p is, but also what s itself originally is. 
1004 00:51:49,391 --> 00:51:53,411 Because if I claim that everyone from last week should be comfortable with 1005 00:51:53,411 --> 00:51:56,381 s bracket zero just representing the first character in s, 1006 00:51:56,381 --> 00:51:59,621 by definition of strings being arrays of characters, 1007 00:51:59,621 --> 00:52:05,871 then s, as of today, is itself the address of a character, 1008 00:52:05,871 --> 00:52:06,761 the first one in s. 1009 00:52:06,761 --> 00:52:10,721 So if I now do make address, and do ./address, 1010 00:52:10,721 --> 00:52:13,481 this time I see the same exact things. 1011 00:52:13,481 --> 00:52:14,081 Thank you. 1012 00:52:18,228 --> 00:52:20,811 This is really the lamest sort of thing to be applauding over, 1013 00:52:20,811 --> 00:52:26,571 but what we're demonstrating here is that s is by definition the address 1014 00:52:26,571 --> 00:52:28,261 of the first character in s. 1015 00:52:28,261 --> 00:52:30,931 So if we borrow some of our mental model from last week-- 1016 00:52:30,931 --> 00:52:35,811 well, if s bracket zero is the first character in s, doing the ampersand on 1017 00:52:35,811 --> 00:52:38,351 that expression should be the same as s. 1018 00:52:38,351 --> 00:52:40,851 Now this isn't to say that we would jump through these hoops 1019 00:52:40,851 --> 00:52:45,051 all the time with this much syntax, but this is just to do proof by example 1020 00:52:45,051 --> 00:52:51,171 that s is in fact, as I claimed a moment ago, just the address of a character. 1021 00:52:51,171 --> 00:52:54,651 Not even multiple characters, it's the address of a single character, 1022 00:52:54,651 --> 00:52:58,581 but the key thing is it's the address of the first character in the string, 1023 00:52:58,581 --> 00:53:01,821 and per last week we trust that C is going 1024 00:53:01,821 --> 00:53:04,881 to look for that null character at the very end just 1025 00:53:04,881 --> 00:53:08,721 to make sure it knows where the string actually ends.
1026 00:53:08,721 --> 00:53:12,317 All right, a question came up over here. 1027 00:53:12,317 --> 00:53:25,581 AUDIENCE: [INAUDIBLE] 1028 00:53:25,581 --> 00:53:26,581 DAVID J. MALAN: Correct. 1029 00:53:26,581 --> 00:53:30,181 To summarize, on line eight, when I am using %p-- 1030 00:53:30,181 --> 00:53:33,181 that just means print a pointer value, so 0x something-- 1031 00:53:33,181 --> 00:53:35,581 I'm passing it s. 1032 00:53:35,581 --> 00:53:41,281 Previously, when we used %s, printf knew to print not just the first character 1033 00:53:41,281 --> 00:53:45,481 of s, but h, i, exclamation point, and then stop when it hits the backslash 1034 00:53:45,481 --> 00:53:46,621 zero. 1035 00:53:46,621 --> 00:53:51,841 %p is different. It tells the computer not to go to that address-- 1036 00:53:51,841 --> 00:53:56,711 but simply to print that address on the screen. 1037 00:53:56,711 --> 00:53:59,761 So this is where %s all this time has been powerful. 1038 00:53:59,761 --> 00:54:03,961 The reason printf worked in weeks 1, 2, and 3 1039 00:54:03,961 --> 00:54:07,261 was because printf was designed by some human years ago 1040 00:54:07,261 --> 00:54:10,291 to go to the address that's being passed in-- for instance, 1041 00:54:10,291 --> 00:54:12,631 s-- and print out character after character 1042 00:54:12,631 --> 00:54:16,291 after character until it sees the null character backslash zero, 1043 00:54:16,291 --> 00:54:17,891 and then stop printing. 1044 00:54:17,891 --> 00:54:21,481 So that's-- you're getting a lot of functionality for free from %s. 1045 00:54:21,481 --> 00:54:23,911 Today we're using something much simpler, %p, 1046 00:54:23,911 --> 00:54:27,211 which just literally prints what s is.
1047 00:54:27,211 --> 00:54:28,951 And the reason we don't do this in week 1 1048 00:54:28,951 --> 00:54:31,021 is just because this is like way too much 1049 00:54:31,021 --> 00:54:33,021 to be interesting when all you want to print out 1050 00:54:33,021 --> 00:54:34,541 is hi or hello, world, or the like. 1051 00:54:34,541 --> 00:54:36,511 But now what we're really doing is revealing 1052 00:54:36,511 --> 00:54:38,941 what's been going on this whole time. 1053 00:54:38,941 --> 00:54:40,678 And let me make one other example here. 1054 00:54:40,678 --> 00:54:42,511 Let me go ahead and get rid of this variable 1055 00:54:42,511 --> 00:54:45,901 here and let me just print out a few things to make the same point. 1056 00:54:45,901 --> 00:54:50,131 I'm going to print out not just s like I did here, but let's go ahead 1057 00:54:50,131 --> 00:54:51,181 and print out every-- 1058 00:54:51,181 --> 00:54:53,071 the address of every character in s. 1059 00:54:53,071 --> 00:54:57,353 So let's get the first letter in s and get its address, 1060 00:54:57,353 --> 00:54:59,311 and I'm going to do copy paste for time's sake, 1061 00:54:59,311 --> 00:55:02,521 but not something I would do frequently. 1062 00:55:02,521 --> 00:55:06,034 So let me print out the address of the first character, the second character, 1063 00:55:06,034 --> 00:55:07,951 the third, and actually even the fourth, which 1064 00:55:07,951 --> 00:55:11,321 is the backslash zero, by doing this. 1065 00:55:11,321 --> 00:55:15,931 So when I compiled this program-- make address, ./address-- 1066 00:55:15,931 --> 00:55:19,441 I should see two identical values and then 1067 00:55:19,441 --> 00:55:21,931 additional values that are one byte away. 1068 00:55:21,931 --> 00:55:27,571 In my diagram a moment ago, my addresses were arbitrarily 0x123, 124, 125, 126. 1069 00:55:27,571 --> 00:55:33,841 Now it starts at, by chance, 0x402004, which is s. 
1070 00:55:33,841 --> 00:55:37,381 0x402004 is the same thing as s because I'm just 1071 00:55:37,381 --> 00:55:39,991 saying go to the first character and then get its address. 1072 00:55:39,991 --> 00:55:41,491 Those are one and the same now. 1073 00:55:41,491 --> 00:55:47,401 And then after that is 0x402005, 006, 007, 1074 00:55:47,401 --> 00:55:49,181 because that is just like the diagram. 1075 00:55:49,181 --> 00:55:52,981 Go to the i, to the exclamation point, and to the null character. 1076 00:55:52,981 --> 00:55:55,891 So all I'm doing now is using my newfound understanding of what 1077 00:55:55,891 --> 00:55:59,251 the ampersand does and what the star does to play around. 1078 00:55:59,251 --> 00:56:02,149 I'm poking around in the computer's memory. 1079 00:56:02,149 --> 00:56:03,691 This is just to demonstrate there's no magic. 1080 00:56:03,691 --> 00:56:06,661 It's all there very deliberately because I or printf or someone 1081 00:56:06,661 --> 00:56:07,441 else put it there. 1082 00:56:07,441 --> 00:56:09,166 Yeah. 1083 00:56:09,166 --> 00:56:15,894 AUDIENCE: [INAUDIBLE] 1084 00:56:15,894 --> 00:56:17,561 DAVID J. MALAN: Really good observation. 1085 00:56:17,561 --> 00:56:21,071 So it's indeed the case that hi, unlike 50, 1086 00:56:21,071 --> 00:56:26,291 is ending up at a very low address, not at the 0x7FF address we saw earlier. 1087 00:56:26,291 --> 00:56:29,261 That's actually because, long story short, strings 1088 00:56:29,261 --> 00:56:32,231 written in double quotes are often stored in a different part of the computer's memory-- 1089 00:56:32,231 --> 00:56:34,331 more on that later today-- for efficiency.
1090 00:56:34,331 --> 00:56:37,541 There's actually only going to be one copy of the word "hi" and exclamation 1091 00:56:37,541 --> 00:56:40,821 point, and the computer is going to tuck it near the beginning of my memory, 1092 00:56:40,821 --> 00:56:43,751 but other values like ints and floats and the 1093 00:56:43,751 --> 00:56:46,391 like-- they end up elsewhere by convention, at much higher addresses like that 0x7FF one. 1094 00:56:46,391 --> 00:56:49,641 But a good observation, because that is consistent here. 1095 00:56:49,641 --> 00:56:53,111 All right, so a couple of final details then, on what's been going on here. 1096 00:56:53,111 --> 00:56:58,691 Let me go ahead and claim that we implemented char star-- 1097 00:56:58,691 --> 00:57:01,391 or rather, string as a char star as follows. 1098 00:57:01,391 --> 00:57:03,731 As of last week we were writing this code. 1099 00:57:03,731 --> 00:57:07,961 As of this week, we can now start writing this code because string, 1100 00:57:07,961 --> 00:57:11,541 specifically, is something we invented in the CS50 library. 1101 00:57:11,541 --> 00:57:14,891 But it turns out you've seen a way of inventing your own data types. 1102 00:57:14,891 --> 00:57:16,631 Recall this thing here. 1103 00:57:16,631 --> 00:57:20,861 We played around last time with data structures, or the struct keyword in C, 1104 00:57:20,861 --> 00:57:24,641 and briefly the typedef keyword, which defines a type for you. 1105 00:57:24,641 --> 00:57:26,651 And if I highlight what's interesting here, 1106 00:57:26,651 --> 00:57:30,341 the way we invented a person data type last time 1107 00:57:30,341 --> 00:57:33,401 was to define a person as having two variables inside of it-- 1108 00:57:33,401 --> 00:57:38,598 a structure that encapsulates a name and encapsulates a number. 1109 00:57:38,598 --> 00:57:41,681 Now even though the syntax is a little different today because of the star 1110 00:57:41,681 --> 00:57:47,771 thing, notice that this could be a similar application of that idea.
1111 00:57:47,771 --> 00:57:52,061 If I want to create a type called string, highlighted in yellow here, 1112 00:57:52,061 --> 00:57:56,231 then I use typedef to define it to be char star. 1113 00:57:56,231 --> 00:57:59,951 So this is literally all that has ever been in cs50.h, 1114 00:57:59,951 --> 00:58:02,771 in addition to those prototypes of functions we've talked about. 1115 00:58:02,771 --> 00:58:05,831 typedef char star string is a single line of code 1116 00:58:05,831 --> 00:58:10,558 that brings the word string as a data type into existence, 1117 00:58:10,558 --> 00:58:12,141 and that's all that's ever been there. 1118 00:58:12,141 --> 00:58:15,281 But the star, the char star, is just too much in week 1. 1119 00:58:15,281 --> 00:58:18,671 We wait until this point to peel back that layer. 1120 00:58:18,671 --> 00:58:21,161 Are there any questions, then, on what a string is? 1121 00:58:21,161 --> 00:58:23,741 What the star or the ampersand are doing? 1122 00:58:23,741 --> 00:58:25,511 Yeah. 1123 00:58:25,511 --> 00:58:28,608 AUDIENCE: [INAUDIBLE] 1124 00:58:28,608 --> 00:58:29,691 DAVID J. MALAN: Oh my God. 1125 00:58:29,691 --> 00:58:31,071 Massive spoiler, but yes. 1126 00:58:31,071 --> 00:58:34,671 Is that why, when you compare two strings as I briefly 1127 00:58:34,671 --> 00:58:38,671 did, or almost did, problems arise? 1128 00:58:38,671 --> 00:58:40,971 And in fact yes, last week we used str compare-- 1129 00:58:40,971 --> 00:58:45,351 STRCMP-- for a very deliberate reason because yes, the spoiler is that I 1130 00:58:45,351 --> 00:58:49,941 would accidentally have compared two addresses in memory, not the strings 1131 00:58:49,941 --> 00:58:52,111 at those addresses. 1132 00:58:52,111 --> 00:58:53,251 Other questions here.
1135 00:58:59,401 --> 00:59:02,191 If anyone wants to come on up and play with this big stack of Post-Its, 1136 00:59:02,191 --> 00:59:04,201 if you want to make your own eight by eight grid of something 1137 00:59:04,201 --> 00:59:07,261 to share with the class if you're artistically inclined, come on up. 1138 00:59:07,261 --> 00:59:09,991 Otherwise, let's take 10 minutes and we'll return after 10. 1139 00:59:09,991 --> 00:59:14,911 All right, so let's come back to this question of how 1140 00:59:14,911 --> 00:59:17,881 we can start to use these pointers and these addresses, ultimately 1141 00:59:17,881 --> 00:59:18,971 in an interesting way. 1142 00:59:18,971 --> 00:59:21,211 The goal ultimately next week is going to be 1143 00:59:21,211 --> 00:59:24,931 to use these addresses to really stitch together more complicated data 1144 00:59:24,931 --> 00:59:28,261 structures than just persons, like last week, or candidates 1145 00:59:28,261 --> 00:59:30,061 in the context of an electoral algorithm, 1146 00:59:30,061 --> 00:59:33,631 if you will, and actually really use our memory in the most versatile way 1147 00:59:33,631 --> 00:59:36,691 to represent not just images but maybe videos 1148 00:59:36,691 --> 00:59:39,191 and other two-dimensional structures as well. 1149 00:59:39,191 --> 00:59:41,581 But for now, let's come back to this address example, 1150 00:59:41,581 --> 00:59:46,561 whittle it down to just a hi initially, and see what's going on again, here 1151 00:59:46,561 --> 00:59:47,461 underneath the hood. 1152 00:59:47,461 --> 00:59:50,401 So let me re-add the CS50 library just so we 1153 00:59:50,401 --> 00:59:54,031 use our synonym for a moment, that is the word string, 1154 00:59:54,031 --> 00:59:56,161 and I'll redefine s as a string. 1155 00:59:56,161 --> 00:59:58,831 And what I didn't mention before is that these double quotes 1156 00:59:58,831 --> 01:00:01,681 that you've been using for some time are actually a little special.
1157 01:00:01,681 --> 01:00:04,921 The double quotes are a clue to the compiler 1158 01:00:04,921 --> 01:00:09,311 that what is between them is in fact a string as we now know it, 1159 01:00:09,311 --> 01:00:12,571 which means the compiler will do all the work of figuring out 1160 01:00:12,571 --> 01:00:15,331 where to put the h, the i, the exclamation point, 1161 01:00:15,331 --> 01:00:18,361 and even adding for you automatically a backslash zero. 1162 01:00:18,361 --> 01:00:20,581 And what the compiler will do for you, too, 1163 01:00:20,581 --> 01:00:23,461 is figure out what address all four of those chars 1164 01:00:23,461 --> 01:00:27,331 ended up at and store it for you in the variable s. 1165 01:00:27,331 --> 01:00:31,531 So that's why it just happens with strings without using ampersands 1166 01:00:31,531 --> 01:00:35,911 or even stars explicitly, but the star at least has been there because again, 1167 01:00:35,911 --> 01:00:38,401 string is just synonymous now with char star. 1168 01:00:38,401 --> 01:00:42,371 It's not really as readable, but it is now the same idea. 1169 01:00:42,371 --> 01:00:44,911 So I'll leave string in place just to do something week 1170 01:00:44,911 --> 01:00:48,581 1 style here for a moment, and let's go ahead and print out a few characters. 1171 01:00:48,581 --> 01:00:54,031 So I'm going to use %c this time, and I'm going to print out s bracket zero 1172 01:00:54,031 --> 01:00:59,161 and then I'm going to print out s bracket one and s bracket two, 1173 01:00:59,161 --> 01:01:03,091 literally doing week three style from last week-- 1174 01:01:03,091 --> 01:01:07,921 a printing of every character in s as though it were an array. 1175 01:01:07,921 --> 01:01:11,221 So ./address should give me h-i exclamation point. 
1176 01:01:11,221 --> 01:01:14,461 And if I really want to get curious, technically speaking, 1177 01:01:14,461 --> 01:01:18,691 I could print out one more location, and let me go ahead and recompile, 1178 01:01:18,691 --> 01:01:24,211 make address, ./address, and there is, it would seem, the backslash zero. 1179 01:01:24,211 --> 01:01:29,641 I'm not seeing a zero because it's not literally the zero char in ASCII; 1180 01:01:29,641 --> 01:01:33,331 it's literally eight zero bits which are technically unprintable, 1181 01:01:33,331 --> 01:01:34,961 if you will, in printf speak. 1182 01:01:34,961 --> 01:01:37,351 And so what I'm seeing here is like a blank symbol. 1183 01:01:37,351 --> 01:01:39,541 That just means there is something else there-- 1184 01:01:39,541 --> 01:01:43,801 it's apparently all eight zero bits, but they are there 1185 01:01:43,801 --> 01:01:46,571 even though we're not seeing them literally right now. 1186 01:01:46,571 --> 01:01:49,211 Well, let's go ahead and peel back one of these layers 1187 01:01:49,211 --> 01:01:53,131 and let me go ahead and get rid of the CS50 library and get rid of, 1188 01:01:53,131 --> 01:01:56,551 therefore, the word string because again, henceforth it's just char star. 1189 01:01:56,551 --> 01:01:57,901 Nothing else is different. 1190 01:01:57,901 --> 01:02:00,781 I'm going to now do make address, ./address, 1191 01:02:00,781 --> 01:02:02,251 and it's the same exact thing. 1192 01:02:02,251 --> 01:02:05,621 And now, let's just focus on the hi rather than even worry about that. 1193 01:02:05,621 --> 01:02:10,411 So I'm going to recompile one last time and now I have h-i exclamation point. 1194 01:02:10,411 --> 01:02:15,001 Well, it turns out that the array notation we used last week 1195 01:02:15,001 --> 01:02:17,611 was technically some of this syntactic sugar.
1196 01:02:17,611 --> 01:02:20,821 Sort of a neat way to use syntax in a useful way, 1197 01:02:20,821 --> 01:02:26,431 but we can see more explicitly today what the square brackets for a string 1198 01:02:26,431 --> 01:02:28,061 are actually doing. 1199 01:02:28,061 --> 01:02:29,801 Let me go ahead and do this. 1200 01:02:29,801 --> 01:02:35,041 Let me adventurously say I want to print out not s bracket 1201 01:02:35,041 --> 01:02:40,831 zero, but I want to print out whatever the first character of s is. 1202 01:02:40,831 --> 01:02:43,081 So to be clear, what is s now? 1203 01:02:43,081 --> 01:02:44,431 It's the address of a string. 1204 01:02:44,431 --> 01:02:45,931 OK, but what is s, really? 1205 01:02:45,931 --> 01:02:49,441 s is the address of the first char in a string 1206 01:02:49,441 --> 01:02:52,441 and again, that's sufficient for defining a string because eventually 1207 01:02:52,441 --> 01:02:55,361 the computer will see that there's a backslash zero at the end of it. 1208 01:02:55,361 --> 01:03:01,241 So s is specifically the address of the first character in a string. 1209 01:03:01,241 --> 01:03:04,291 So that means, using my new syntax, if I want 1210 01:03:04,291 --> 01:03:07,583 to print out that first character I can print out star 1211 01:03:07,583 --> 01:03:11,473 s, because recall that star is the dereference operator when you don't 1212 01:03:11,473 --> 01:03:13,681 repeat the word char or the word int-- 1213 01:03:13,681 --> 01:03:15,301 you just use the star here. 1214 01:03:15,301 --> 01:03:17,821 That means go to that address. 1215 01:03:17,821 --> 01:03:22,651 Similarly, if I, in my newfound knowledge of how strings work, 1216 01:03:22,651 --> 01:03:26,281 know that the h comes first, then the i right after it, 1217 01:03:26,281 --> 01:03:30,151 then the exclamation point, then the backslash zero, contiguously 1218 01:03:30,151 --> 01:03:33,931 one byte apart, I could start to do some arithmetic.
1219 01:03:33,931 --> 01:03:39,571 I could go to s plus 1 byte and print out the second character, 1220 01:03:39,571 --> 01:03:43,321 and I could print out whatever is at s plus 2-- 1221 01:03:43,321 --> 01:03:46,591 in fact, doing what's generally known as pointer arithmetic. 1222 01:03:46,591 --> 01:03:49,591 Literally treating pointers as the numbers they are-- 1223 01:03:49,591 --> 01:03:52,831 hexadecimal or decimal, doesn't really matter-- they're still just numbers. 1224 01:03:52,831 --> 01:03:55,661 And go ahead and add one byte or two bytes 1225 01:03:55,661 --> 01:03:58,151 to them to start at the beginning of a string 1226 01:03:58,151 --> 01:04:00,831 and just poke around from left to right. 1227 01:04:00,831 --> 01:04:04,901 So this now is equivalent to what we did last week using square bracket 1228 01:04:04,901 --> 01:04:09,671 notation, but now I'm reimplementing that same idea with this lower-level 1229 01:04:09,671 --> 01:04:13,821 plumbing, understanding ampersands and stars now a little bit more, 1230 01:04:13,821 --> 01:04:16,601 so if I remake this program and do ./address, 1231 01:04:16,601 --> 01:04:19,128 I should still see h-i exclamation point. 1232 01:04:19,128 --> 01:04:21,461 But what I'm really doing is just kind of demonstrating, 1233 01:04:21,461 --> 01:04:24,851 hopefully, my understanding of what really 1234 01:04:24,851 --> 01:04:26,711 is going on in the computer's memory. 1235 01:04:26,711 --> 01:04:29,231 Now, programmers who are maybe trying to show off 1236 01:04:29,231 --> 01:04:30,611 might actually write this syntax. 1237 01:04:30,611 --> 01:04:33,236 I think the more common syntax would be what we did last week-- 1238 01:04:33,236 --> 01:04:34,971 s bracket zero, s bracket one. 1239 01:04:34,971 --> 01:04:35,471 Why? 1240 01:04:35,471 --> 01:04:37,346 It's just a little more readable and we don't 1241 01:04:37,346 --> 01:04:41,531 need to brag about or care about this underlying representation.
1242 01:04:41,531 --> 01:04:44,411 The square brackets last week were an abstraction, if you will, 1243 01:04:44,411 --> 01:04:46,721 on top of what is lower-level math. 1244 01:04:46,721 --> 01:04:49,361 But that's all that's going on underneath the hood. 1245 01:04:49,361 --> 01:04:52,811 We're poking around from byte to byte to byte. 1246 01:04:52,811 --> 01:04:58,221 All right, let me pause here, see if there are any questions on that one. 1247 01:04:58,221 --> 01:05:00,931 Any questions on this? 1248 01:05:00,931 --> 01:05:03,651 Let's do one more then, just to demonstrate that this is not 1249 01:05:03,651 --> 01:05:05,171 even specific to strings. 1250 01:05:05,171 --> 01:05:07,161 Let me go ahead and get rid of all of this 1251 01:05:07,161 --> 01:05:11,541 and let me give myself an array of numbers like I did last week. 1252 01:05:11,541 --> 01:05:13,821 So if I'm going to declare all the numbers 1253 01:05:13,821 --> 01:05:16,521 at once using this funky curly brace notation, 1254 01:05:16,521 --> 01:05:19,971 I can do like 4, 6, 8, 2, 7, 5, 0. 1255 01:05:19,971 --> 01:05:24,051 So seven different numbers inside of an array that's automatically 1256 01:05:24,051 --> 01:05:25,071 initialized like this. 1257 01:05:25,071 --> 01:05:27,131 I don't, strictly speaking, need to say seven. 1258 01:05:27,131 --> 01:05:28,881 The compiler is smart enough to figure out 1259 01:05:28,881 --> 01:05:31,251 how many numbers I put with commas between them, 1260 01:05:31,251 --> 01:05:35,751 and that just gives me an array containing 4, 6, 8, 2, 7, 5, 0. 1261 01:05:35,751 --> 01:05:39,201 So it turns out I can print each of these numbers in the familiar way. 1262 01:05:39,201 --> 01:05:45,021 I can do a printf of %i backslash n, and I can print numbers bracket zero, 1263 01:05:45,021 --> 01:05:49,041 and let me just do some quick copy/paste just to print the first three of these. 1264 01:05:49,041 --> 01:05:53,881 Theoretically, that should print out 4, 6, 8, and so forth.
1265 01:05:53,881 --> 01:05:57,021 But I can do the same sort of manipulation, understanding 1266 01:05:57,021 --> 01:05:59,931 what pointers now are, using pointer arithmetic. 1267 01:05:59,931 --> 01:06:03,741 So let me actually unwind this and just go back to one printf, 1268 01:06:03,741 --> 01:06:07,191 and instead of printing numbers bracket zero like I might have last week, 1269 01:06:07,191 --> 01:06:11,361 let me just go and print out whatever is at that address-- 1270 01:06:11,361 --> 01:06:13,431 so asterisk numbers. 1271 01:06:13,431 --> 01:06:15,861 Let me then print out the second digit, which 1272 01:06:15,861 --> 01:06:21,051 is going to be whatever is at numbers plus 1, and then let me do this further 1273 01:06:21,051 --> 01:06:25,021 and do whatever is at numbers plus 2, and if I really want to repeat this, 1274 01:06:25,021 --> 01:06:27,261 let me do it four more times and do what's 1275 01:06:27,261 --> 01:06:31,881 at location three, four, five, and six. 1276 01:06:31,881 --> 01:06:35,631 And that's seven total numbers because I started counting at zero. 1277 01:06:35,631 --> 01:06:37,201 So let me just quickly run this. 1278 01:06:37,201 --> 01:06:39,651 make address, ./address. 1279 01:06:39,651 --> 01:06:42,381 There are those seven digits being printed. 1280 01:06:42,381 --> 01:06:46,401 But there's something subtle but also useful here. 1281 01:06:46,401 --> 01:06:47,541 Each of these digits-- 1282 01:06:47,541 --> 01:06:49,341 4, 6, 8, 2, 7, 5, 0-- 1283 01:06:49,341 --> 01:06:49,891 is an int. 1284 01:06:49,891 --> 01:06:50,391 Why? 1285 01:06:50,391 --> 01:06:52,531 Because I made an array of integers. 1286 01:06:52,531 --> 01:06:57,181 But think back-- how big is a typical integer, have we claimed? 1287 01:06:57,181 --> 01:07:02,821 Four bytes, or 32 bits, so it's worth noting that I don't really 1288 01:07:02,821 --> 01:07:04,841 need to worry about that detail.
1289 01:07:04,841 --> 01:07:10,119 Notice that I did not do plus 4, plus 8, plus 12, plus 16, plus 20. 1290 01:07:10,119 --> 01:07:11,911 I, the programmer, strictly speaking, don't 1291 01:07:11,911 --> 01:07:14,191 need to worry about how big the data type is. 1292 01:07:14,191 --> 01:07:16,291 This is the power of pointer arithmetic. 1293 01:07:16,291 --> 01:07:21,931 The compiler is smart enough to know that if you add 1 to this pointer, 1294 01:07:21,931 --> 01:07:26,441 that is the same as saying go one more piece of data-- 1295 01:07:26,441 --> 01:07:27,481 not just one byte-- 1296 01:07:27,481 --> 01:07:29,251 so if it's one int, move four bytes. 1297 01:07:29,251 --> 01:07:30,871 If it's two ints, move eight. 1298 01:07:30,871 --> 01:07:32,601 If it's three ints, move 12. 1299 01:07:32,601 --> 01:07:35,821 Pointer arithmetic handles that annoying arithmetic for you 1300 01:07:35,821 --> 01:07:38,461 so you can just think of this as a number after a number 1301 01:07:38,461 --> 01:07:41,821 after a number that are back to back to back but not one byte apart, 1302 01:07:41,821 --> 01:07:43,171 but four bytes apart. 1303 01:07:43,171 --> 01:07:47,201 Which is only to say plus 1, plus 2, plus 3 works no matter the data type. 1304 01:07:47,201 --> 01:07:47,701 Why? 1305 01:07:47,701 --> 01:07:53,121 Because the compiler knows what type of data you're talking about. 1306 01:07:53,121 --> 01:07:56,511 Now, there's one other detail I should reveal here 1307 01:07:56,511 --> 01:07:58,671 that I've taken for granted. 1308 01:07:58,671 --> 01:08:01,641 In the past I was using double quotes to represent strings, 1309 01:08:01,641 --> 01:08:04,371 and I claimed that the compiler's smart enough to realize that oh, 1310 01:08:04,371 --> 01:08:08,911 if I have double quote hi, that means it's an array of h-i exclamation point, 1311 01:08:08,911 --> 01:08:10,431 and then the backslash zero. 1312 01:08:10,431 --> 01:08:12,801 Notice this usefulness.
1313 01:08:12,801 --> 01:08:18,561 It turns out that you can actually treat arrays as though the name of the array 1314 01:08:18,561 --> 01:08:20,781 is itself a pointer, and this is actually 1315 01:08:20,781 --> 01:08:23,151 going to be something useful in upcoming problems 1316 01:08:23,151 --> 01:08:26,721 when we want to pass arrays around in the computer's memory. 1317 01:08:26,721 --> 01:08:30,463 Notice that strictly speaking on line five, there are no pointers going on. 1318 01:08:30,463 --> 01:08:32,421 There's no star, there's no ampersand-- there's 1319 01:08:32,421 --> 01:08:35,661 nothing new there, and yet instantly on line seven 1320 01:08:35,661 --> 01:08:40,491 I'm pretending that it is the address, and this is actually OK. 1321 01:08:40,491 --> 01:08:44,391 It turns out that an array really can be treated 1322 01:08:44,391 --> 01:08:47,881 as the address of the first element in that array. 1323 01:08:47,881 --> 01:08:52,079 The difference is that there's no secret backslash zero anywhere. 1324 01:08:52,079 --> 01:08:53,871 This is just part of the phone number here, 1325 01:08:53,871 --> 01:08:56,691 the one ending in zero-- that's not like a special backslash zero. 1326 01:08:56,691 --> 01:08:59,721 So this is something we're going to take advantage of too, before long. 1327 01:08:59,721 --> 01:09:03,441 There's this interrelationship between addresses and arrays 1328 01:09:03,441 --> 01:09:08,121 that just generally allows you to treat one as though it is the other, 1329 01:09:08,121 --> 01:09:10,521 but the math is taken care of for you. 1330 01:09:10,521 --> 01:09:14,961 Are there any questions, then, on this before we start to solve some bigger problems? 1331 01:09:14,961 --> 01:09:16,761 Yeah. 1332 01:09:16,761 --> 01:09:23,784 AUDIENCE: [INAUDIBLE] 1333 01:09:23,784 --> 01:09:24,951 DAVID J. MALAN: Potentially. 1334 01:09:24,951 --> 01:09:28,911 If you go beyond the end of an array, you might get a segmentation fault.
1335 01:09:28,911 --> 01:09:32,181 The problem is that that symptom is sometimes nondeterministic, 1336 01:09:32,181 --> 01:09:35,181 which means that sometimes it will happen, sometimes it won't. 1337 01:09:35,181 --> 01:09:39,141 It often depends on how far off the end of the array you actually go. 1338 01:09:39,141 --> 01:09:41,631 You'll often not induce the segmentation fault 1339 01:09:41,631 --> 01:09:44,421 if you just poke a little too far, but if you go way too far 1340 01:09:44,421 --> 01:09:45,831 it quite likely will. 1341 01:09:45,831 --> 01:09:49,161 But we'll give you a tool today actually for detecting and solving 1342 01:09:49,161 --> 01:09:51,181 exactly that kind of situation. 1343 01:09:51,181 --> 01:09:54,091 So let's go ahead now and do something a little different in code, 1344 01:09:54,091 --> 01:09:56,601 but that actually comes back to that spoiler from earlier. 1345 01:09:56,601 --> 01:10:01,471 Let me go ahead and create a program called compare.c, and in this program 1346 01:10:01,471 --> 01:10:04,641 I'm going to go ahead and allow myself the CS50 library, 1347 01:10:04,641 --> 01:10:08,121 not so much for string but so that I can actually use get_int still, 1348 01:10:08,121 --> 01:10:12,440 which is way easier than the way we'll see that C normally lets you get input.
1349 01:10:12,440 --> 01:10:15,471 Let me give myself stdio.h, do an int main(void), 1350 01:10:15,471 --> 01:10:18,381 not worrying about command-line arguments today, and let me go ahead 1351 01:10:18,381 --> 01:10:22,701 and get an int i using get_int, and ask the human for the value of i, 1352 01:10:22,701 --> 01:10:28,461 then let me give myself an int j, ask the user for another int, calling it j, 1353 01:10:28,461 --> 01:10:32,631 and then let me go ahead and kind of naively, but to your point earlier, 1354 01:10:32,631 --> 01:10:36,051 if i equals equals j, then let's go ahead 1355 01:10:36,051 --> 01:10:41,121 and print out something like "same" backslash n, else let's go ahead 1356 01:10:41,121 --> 01:10:44,791 and print out "different" if they are not, in fact, the same. 1357 01:10:44,791 --> 01:10:48,951 So that would seem to be a program that compares the value of two integers. 1358 01:10:48,951 --> 01:10:51,261 All right, so let's go ahead and run make compare-- 1359 01:10:51,261 --> 01:10:53,451 so far so good-- ./compare. 1360 01:10:53,451 --> 01:10:56,991 OK, i will be 50, j will be 50-- 1361 01:10:56,991 --> 01:10:58,041 they're the same. 1362 01:10:58,041 --> 01:10:59,221 Let's do it once more. 1363 01:10:59,221 --> 01:11:02,239 i will be 50, j will be 42. 1364 01:11:02,239 --> 01:11:03,031 They are different. 1365 01:11:03,031 --> 01:11:07,341 So far, so good in this first version of comparison. 1366 01:11:07,341 --> 01:11:10,411 But as you might see where I'm going with this, 1367 01:11:10,411 --> 01:11:14,151 let's move away from integers and let's actually change these things to char-- 1368 01:11:14,151 --> 01:11:15,301 to strings. 1369 01:11:15,301 --> 01:11:17,901 So I could do string s over here-- 1370 01:11:17,901 --> 01:11:20,481 get_string for s over here. 1371 01:11:20,481 --> 01:11:27,351 Then I could do string t over here, and get_string over here, 1372 01:11:27,351 --> 01:11:30,081 asking the user for t this time, here.
1373 01:11:30,081 --> 01:11:31,611 And then I can compare the two. 1374 01:11:31,611 --> 01:11:33,458 If s equals equals t-- 1375 01:11:33,458 --> 01:11:34,791 and this is a common convention. 1376 01:11:34,791 --> 01:11:37,821 If you've used s for string already, you can use t for the next one, at least 1377 01:11:37,821 --> 01:11:39,441 for simple demonstrations like this. 1378 01:11:39,441 --> 01:11:42,566 I'm going to compare the two, just like I did for ints, which worked great. 1379 01:11:42,566 --> 01:11:46,521 make compare-- so far so good-- ./address-- 1380 01:11:46,521 --> 01:11:47,361 oh, sorry. 1381 01:11:47,361 --> 01:11:49,221 Wrong program-- ./compare. 1382 01:11:49,221 --> 01:11:52,431 Let me go ahead and type in something like 1383 01:11:52,431 --> 01:11:57,401 hi, exclamation point and bye, exclamation point, which of course 1384 01:11:57,401 --> 01:11:59,301 should definitely be different. 1385 01:11:59,301 --> 01:12:05,121 Let me run it again with hi, exclamation point and hi, exclamation point. 1386 01:12:05,121 --> 01:12:07,071 Different-- maybe I messed up. 1387 01:12:07,071 --> 01:12:10,181 Let's maybe do it lowercase, maybe that'll fix it. 1388 01:12:10,181 --> 01:12:12,501 But no, those two are different. 1389 01:12:12,501 --> 01:12:16,481 So to come back to what I described as a spoiler earlier, what's 1390 01:12:16,481 --> 01:12:20,659 the fundamental issue here, to be clear? 1391 01:12:20,659 --> 01:12:22,701 Why is it saying different even though I'm pretty 1392 01:12:22,701 --> 01:12:24,118 sure I typed the same thing twice? 1393 01:12:24,118 --> 01:12:26,181 Yeah.
1394 01:12:26,181 --> 01:12:29,601 Yeah, this is where it's now useful to know that string has been 1395 01:12:29,601 --> 01:12:33,063 an abstraction-- a training wheel, if you will-- and if we take that away-- 1396 01:12:33,063 --> 01:12:35,271 still use get_string because that's still convenient-- 1397 01:12:35,271 --> 01:12:38,061 but if I change string to be char star, it's 1398 01:12:38,061 --> 01:12:44,301 a little more explicit as to what s and what t are. s is a pointer to a char, 1399 01:12:44,301 --> 01:12:46,761 that is the address of a char. t is a pointer 1400 01:12:46,761 --> 01:12:48,921 to a char, that is the address of a char. 1401 01:12:48,921 --> 01:12:52,071 Specifically, the first character in s and the first character 1402 01:12:52,071 --> 01:12:53,851 in t, respectively. 1403 01:12:53,851 --> 01:12:56,076 So if I'm comparing these two, it should stand 1404 01:12:56,076 --> 01:12:57,951 to reason that they're going to be different. 1405 01:12:57,951 --> 01:12:58,451 Why? 1406 01:12:58,451 --> 01:13:02,061 Because s might end up here in memory and t might end up here in memory. 1407 01:13:02,061 --> 01:13:05,181 Each time I call get_string, it is not smart enough or advanced enough 1408 01:13:05,181 --> 01:13:07,171 to know that, wait a minute-- you typed the same thing. 1409 01:13:07,171 --> 01:13:08,691 I'm just going to hand you back the same address. 1410 01:13:08,691 --> 01:13:11,511 That doesn't happen because we did not design get_string that way. 1411 01:13:11,511 --> 01:13:15,141 Each time I call get_string, it returns, apparently, 1412 01:13:15,141 --> 01:13:17,901 a different copy of the string that was typed in. 1413 01:13:17,901 --> 01:13:20,211 A hi over here and a hi over here. 1414 01:13:20,211 --> 01:13:22,791 They might look the same to the human, but to the computer 1415 01:13:22,791 --> 01:13:26,691 they are different chunks of memory, and therefore at different addresses.
1416 01:13:26,691 --> 01:13:30,181 And here, too, we can reveal what get_string is returning. 1417 01:13:30,181 --> 01:13:34,161 Well, up until today it was returning a string, so to speak. 1418 01:13:34,161 --> 01:13:35,661 That's not really a thing. 1419 01:13:35,661 --> 01:13:38,001 Technically, what get_string has always been 1420 01:13:38,001 --> 01:13:43,371 doing is returning the address of the first char in a string 1421 01:13:43,371 --> 01:13:47,181 and trusting that we put a backslash zero at the end of whatever the human 1422 01:13:47,181 --> 01:13:51,411 typed in, and that's enough now for printf, for strlen, for you 1423 01:13:51,411 --> 01:13:53,961 to know where a string begins and ends. 1424 01:13:53,961 --> 01:13:57,711 So get_string has actually always returned a pointer. 1425 01:13:57,711 --> 01:14:01,101 It has not returned a quote unquote string per se, 1426 01:14:01,101 --> 01:14:04,401 but there are functions that can solve this comparison for us. 1427 01:14:04,401 --> 01:14:07,501 Recall that I could do something like this. 1428 01:14:07,501 --> 01:14:10,431 I could actually go in here and I could-- 1429 01:14:10,431 --> 01:14:11,641 let's see, where was it? 1430 01:14:11,641 --> 01:14:18,981 So if I call str compare here and use it to pass in two values, s and t, 1431 01:14:18,981 --> 01:14:22,701 let's see now what happens when I make compare. 1432 01:14:22,701 --> 01:14:26,211 Implicitly declaring library function strcmp with type int-- 1433 01:14:26,211 --> 01:14:27,321 and well, there's a star. 1434 01:14:27,321 --> 01:14:30,801 So you might have seen this error before and you might have ignored most of it, 1435 01:14:30,801 --> 01:14:35,281 but there's some evidence of stars or pointers going on here. 1436 01:14:35,281 --> 01:14:37,771 It looks like I didn't include the string.h header file, 1437 01:14:37,771 --> 01:14:38,961 so that's an easy fix.
1438 01:14:38,961 --> 01:14:43,551 Include string.h, which, despite its name, does not create a data type 1439 01:14:43,551 --> 01:14:46,431 called string; it just has string-related functions in it 1440 01:14:46,431 --> 01:14:47,511 like str compare. 1441 01:14:47,511 --> 01:14:49,161 Let's make compare again. 1442 01:14:49,161 --> 01:14:51,231 Now it compiles, ./compare. 1443 01:14:51,231 --> 01:14:55,011 Now let's type in hi, exclamation point and even the same thing again. 1444 01:14:55,011 --> 01:14:58,641 These are now-- oh, I used it wrong. 1445 01:14:58,641 --> 01:15:00,364 OK, user error. 1446 01:15:00,364 --> 01:15:02,781 That was supposed to be impressive, but it's the opposite. 1447 01:15:02,781 --> 01:15:05,101 What did I do wrong? 1448 01:15:05,101 --> 01:15:06,201 What did I do wrong here? 1449 01:15:06,201 --> 01:15:07,463 Yeah. 1450 01:15:07,463 --> 01:15:08,951 Yeah. 1451 01:15:08,951 --> 01:15:12,258 AUDIENCE: [INAUDIBLE] 1452 01:15:12,258 --> 01:15:14,591 DAVID J. MALAN: Yeah, it returns three different kinds of values. 1453 01:15:14,591 --> 01:15:18,371 Zero if they're the same, positive if the first comes after the second, 1454 01:15:18,371 --> 01:15:20,061 negative if the opposite is true. 1455 01:15:20,061 --> 01:15:23,261 I just forgot that, so like I did last week correctly, 1456 01:15:23,261 --> 01:15:26,741 if I want to compare them for equality per the manual page, 1457 01:15:26,741 --> 01:15:29,421 I should be checking for zero as the return value. 1458 01:15:29,421 --> 01:15:32,591 Now make compare, ./compare, Enter. 1459 01:15:32,591 --> 01:15:35,261 Let's try it one last time-- hi and hi. 1460 01:15:35,261 --> 01:15:36,821 OK now, they're in fact the same. 1461 01:15:36,821 --> 01:15:38,231 And Justin, thank you. 1462 01:15:41,871 --> 01:15:44,751 And indeed, it's not just returning same all the time. 1463 01:15:44,751 --> 01:15:46,971 If I type in hi and then bye, it's indeed 1464 01:15:46,971 --> 01:15:49,261 noticing that difference as well.
1465 01:15:49,261 --> 01:15:53,251 Well, let me go ahead and do one other thing here. 1466 01:15:53,251 --> 01:15:55,501 Let's do one other thing. 1467 01:15:55,501 --> 01:15:59,001 Let me go ahead now and just reveal more pictorially what's going on. 1468 01:15:59,001 --> 01:16:02,331 Let's get rid of the string comparison and let's just print these things out. 1469 01:16:02,331 --> 01:16:06,111 The simple way to print this out would be with %s and again, %s is special-- 1470 01:16:06,111 --> 01:16:07,161 printf knows-- 1471 01:16:07,161 --> 01:16:10,341 to take an address, start there, and print every character up 1472 01:16:10,341 --> 01:16:13,741 until the backslash zero, so let's just hand it s and do that. 1473 01:16:13,741 --> 01:16:16,911 And then let's do one more, %s with t. 1474 01:16:16,911 --> 01:16:21,751 This is, again, sort of a mix of week 1 and this week 1475 01:16:21,751 --> 01:16:23,571 because I got rid of the word string. 1476 01:16:23,571 --> 01:16:28,711 I'm using char star, but I'm still using printf and %s in the same way. 1477 01:16:28,711 --> 01:16:32,331 Let me go ahead and run compare now, and if I type hi and hi, 1478 01:16:32,331 --> 01:16:34,291 I should see the same thing twice. 1479 01:16:34,291 --> 01:16:37,911 So they look the same, but here now we have the syntax today 1480 01:16:37,911 --> 01:16:40,291 to print out the actual addresses of these things. 1481 01:16:40,291 --> 01:16:44,721 So let me just change the s to a p, because p means don't go to the address 1482 01:16:44,721 --> 01:16:48,651 and print what's there; it means just print the address as a pointer. 1483 01:16:48,651 --> 01:16:53,421 So make compare, ./compare, and now let's type in hi, and once more, 1484 01:16:53,421 --> 01:16:57,831 and I should see, indeed, two slightly different addresses given 1485 01:16:57,831 --> 01:16:58,641 in hexadecimal.
1486 01:16:58,641 --> 01:17:00,951 One's got a B at the end, one's got an F at the end, 1487 01:17:00,951 --> 01:17:03,481 and they are indeed a few bytes apart. 1488 01:17:03,481 --> 01:17:06,706 So this is just confirming what our suspicions have actually been. 1489 01:17:06,706 --> 01:17:09,081 So what does this mean, perhaps in the computer's memory? 1490 01:17:09,081 --> 01:17:10,581 Well, let's take a look. 1491 01:17:10,581 --> 01:17:14,511 I've zoomed out so I have a few more squares to look at at once. 1492 01:17:14,511 --> 01:17:20,901 Here might be s in memory when I do string s equals, or char star s equals. 1493 01:17:20,901 --> 01:17:24,381 I get a variable that's of size 1, 2, 3, 4, 5, 6, 7, 8, because I 1494 01:17:24,381 --> 01:17:27,951 claimed earlier that on modern systems, pointers are generally eight bytes 1495 01:17:27,951 --> 01:17:30,261 nowadays so they can count even higher. 1496 01:17:30,261 --> 01:17:33,246 And inside of the computer's memory, also, might be hi. 1497 01:17:33,246 --> 01:17:35,871 And I don't know where it ends up, so for the sake of discussion 1498 01:17:35,871 --> 01:17:36,801 it ended up down here. 1499 01:17:36,801 --> 01:17:39,761 That's what was free when I ran the program. 1500 01:17:39,761 --> 01:17:41,601 h-i exclamation point, backslash zero. 1501 01:17:41,601 --> 01:17:46,761 Maybe it ended up, for the sake of discussion, at 0x123, 0x124, 0x125, and 0x126. 1502 01:17:46,761 --> 01:17:51,801 So to be clear, what is s storing once the assignment 1503 01:17:51,801 --> 01:17:54,711 operator copies from right to left? 1504 01:17:54,711 --> 01:17:59,331 What is s storing if I advance one more slide? 1505 01:17:59,331 --> 01:18:01,451 Yeah. 1506 01:18:01,451 --> 01:18:05,261 0x123, the presumption being that if a string is 1507 01:18:05,261 --> 01:18:09,236 defined by the address of its first char and the address of its first char 1508 01:18:09,236 --> 01:18:13,691 is 0x123, then that's indeed what should be in the variable s.
1509 01:18:13,691 --> 01:18:16,751 And so technically, that's what's been happening with that assignment 1510 01:18:16,751 --> 01:18:18,251 operator from right to left. 1511 01:18:18,251 --> 01:18:21,401 get_string indeed returns a string, so to speak, 1512 01:18:21,401 --> 01:18:25,241 but more properly it returns the address of a char. 1513 01:18:25,241 --> 01:18:28,721 What's then been copied from right to left using that assignment operator 1514 01:18:28,721 --> 01:18:31,601 all these weeks is indeed that address. 1515 01:18:31,601 --> 01:18:36,101 Now technically, we don't really need to care about where these addresses are. 1516 01:18:36,101 --> 01:18:38,951 It suffices to just think about them referentially, but let's 1517 01:18:38,951 --> 01:18:42,791 first consider where t might be. t is just another variable that I 1518 01:18:42,791 --> 01:18:44,441 created on my second line of code. 1519 01:18:44,441 --> 01:18:46,061 Maybe it ends up there, maybe somewhere else. 1520 01:18:46,061 --> 01:18:48,353 For the sake of discussion, I'll draw it left and right. 1521 01:18:48,353 --> 01:18:51,771 Where did the second word that I typed in end up? 1522 01:18:51,771 --> 01:18:57,671 Well, suppose the second copy of hi ended up at 0x456, 0x457, 0x458, and 0x459. 1523 01:18:57,671 --> 01:18:58,961 What ended up in t? 1524 01:18:58,961 --> 01:19:00,551 I'll pluck this one off myself. 1525 01:19:00,551 --> 01:19:02,621 0x456, presumably. 1526 01:19:02,621 --> 01:19:06,071 And so this is now a pictorial representation of why, 1527 01:19:06,071 --> 01:19:07,751 and let's abstract away everything else. 1528 01:19:07,751 --> 01:19:13,061 When I compared s against t using equals equals, based on the picture, 1529 01:19:13,061 --> 01:19:14,591 they're obviously not the same. 1530 01:19:14,591 --> 01:19:16,751 One is over here, one is over here. 1531 01:19:16,751 --> 01:19:21,281 And per a moment ago, one is 0x123, the other is 0x456.
1532 01:19:21,281 --> 01:19:24,491 Yes, technically they're pointing at something that's the same, 1533 01:19:24,491 --> 01:19:27,971 but that just reveals how str compare works. 1534 01:19:27,971 --> 01:19:30,641 str compare is apparently a function that 1535 01:19:30,641 --> 01:19:33,881 takes in the address of a string as its argument 1536 01:19:33,881 --> 01:19:36,401 and the address of another string as its argument, 1537 01:19:36,401 --> 01:19:41,321 it goes to the first character in each of those strings, respectively, 1538 01:19:41,321 --> 01:19:43,511 and probably has a for loop or a while loop 1539 01:19:43,511 --> 01:19:46,421 and just goes from left to right, comparing, looking 1540 01:19:46,421 --> 01:19:50,141 for the same chars left and right, and if it doesn't notice any differences, 1541 01:19:50,141 --> 01:19:52,121 boom-- it returns zero. 1542 01:19:52,121 --> 01:19:56,481 If it does notice a difference it returns a positive or a negative value. 1543 01:19:56,481 --> 01:20:00,321 And that's very similar, recall, to how we implemented string length ourselves 1544 01:20:00,321 --> 01:20:00,821 last week. 1545 01:20:00,821 --> 01:20:03,731 I used a for loop, I was looking for a backslash zero. 1546 01:20:03,731 --> 01:20:09,521 str compare is probably a little similar in spirit, looping from left to right 1547 01:20:09,521 --> 01:20:13,001 but comparing, this time not just counting. 1548 01:20:13,001 --> 01:20:15,731 Are any questions then, on string comparison 1549 01:20:15,731 --> 01:20:18,821 and why it is that we use str compare and not equals equals? 1550 01:20:18,821 --> 01:20:20,013 Yeah. 1551 01:20:20,013 --> 01:20:22,249 AUDIENCE: Do pointers have addresses? 1552 01:20:22,249 --> 01:20:24,041 DAVID J. MALAN: Do pointers have addresses? 1553 01:20:24,041 --> 01:20:24,541 Yes. 1554 01:20:24,541 --> 01:20:29,291 So we won't do that today, but I could actually use the ampersand operator 1555 01:20:29,291 --> 01:20:30,821 on s or on t. 
1556 01:20:30,821 --> 01:20:34,421 That would give me the equivalent of a char star star 1557 01:20:34,421 --> 01:20:36,606 that itself could be stored elsewhere in memory. 1558 01:20:36,606 --> 01:20:37,481 That's where it ends. 1559 01:20:37,481 --> 01:20:39,671 We don't do that recursively forever. 1560 01:20:39,671 --> 01:20:42,611 There's star and there's star star, but yes, that is a thing 1561 01:20:42,611 --> 01:20:45,911 and it's very often useful in the context of two-dimensional arrays, 1562 01:20:45,911 --> 01:20:49,181 which we haven't really talked about, but that is a feature of the language, 1563 01:20:49,181 --> 01:20:49,681 too. 1564 01:20:49,681 --> 01:20:50,711 But not today. 1565 01:20:50,711 --> 01:20:52,221 Good question. 1566 01:20:52,221 --> 01:20:55,271 All right, so what might we now do to take things up a notch? 1567 01:20:55,271 --> 01:20:57,791 Well, let's go ahead and implement a different program here 1568 01:20:57,791 --> 01:21:01,341 that maybe tries copying some values, just to demonstrate this. 1569 01:21:01,341 --> 01:21:05,081 Let me open up a file called, how about copy.c, 1570 01:21:05,081 --> 01:21:07,511 and I'm going to start off with a few includes. 1571 01:21:07,511 --> 01:21:11,291 So let's include the CS50 library just so we have a way of getting user input. 1572 01:21:11,291 --> 01:21:15,941 Let's include-- how about stdio as always, let's preemptively 1573 01:21:15,941 --> 01:21:18,711 include string.h and maybe one other in a moment. 1574 01:21:18,711 --> 01:21:21,711 Let's do int main(void) as before. 1575 01:21:21,711 --> 01:21:25,241 And then in here, let's get a string from the user and just 1576 01:21:25,241 --> 01:21:27,671 call it s for simplicity. 1577 01:21:27,671 --> 01:21:31,361 And heck, we can actually just call this char star if we want, 1578 01:21:31,361 --> 01:21:33,474 or string, since we're using the CS50 library. 1579 01:21:33,474 --> 01:21:34,641 But we'll come back to that.
1580 01:21:34,641 --> 01:21:38,231 Let's now make a copy of s and do s equals t, 1581 01:21:38,231 --> 01:21:42,891 using a single assignment operator and then let's check something like this. 1582 01:21:42,891 --> 01:21:47,831 Let's go into the first character of t, which is t bracket zero, 1583 01:21:47,831 --> 01:21:50,231 and then let's uppercase it using that function 1584 01:21:50,231 --> 01:21:55,571 that we've used in the past, toupper of t bracket zero, semicolon. 1585 01:21:55,571 --> 01:21:57,231 And actually, I should go back up here. 1586 01:21:57,231 --> 01:22:01,468 If I'm using toupper or if you use tolower or isupper or islower-- 1587 01:22:01,468 --> 01:22:04,301 I might not remember this offhand, but it was in another header file 1588 01:22:04,301 --> 01:22:06,161 called ctype.h. 1589 01:22:06,161 --> 01:22:09,291 There were a bunch of helpful functions in that library as well. 1590 01:22:09,291 --> 01:22:14,096 Now at the very last line of the program, let's just print out what both s and t 1591 01:22:14,096 --> 01:22:21,521 are by simply printing out %s for each of them, and t is %s also, not %t, 1592 01:22:21,521 --> 01:22:24,681 of course, and let's see what happens here. 1593 01:22:24,681 --> 01:22:26,471 So let me make copy-- 1594 01:22:26,471 --> 01:22:27,881 oh my God, so many mistakes. 1595 01:22:27,881 --> 01:22:29,271 What did I do wrong? 1596 01:22:29,271 --> 01:22:30,221 Oh. 1597 01:22:30,221 --> 01:22:31,301 OK, that was unintended. 1598 01:22:31,301 --> 01:22:34,851 String t equals s, sorry, so I'm creating two variables, 1599 01:22:34,851 --> 01:22:37,781 s and t respectively, and I'm copying s into t. 1600 01:22:37,781 --> 01:22:39,461 Make copy, Enter. 1601 01:22:39,461 --> 01:22:44,651 There we go. ./copy, and let's now type in, for instance, 1602 01:22:44,651 --> 01:22:48,521 how about hi exclamation point in all lowercase this time, 1603 01:22:48,521 --> 01:22:52,091 and now what gets printed?
1604 01:22:52,091 --> 01:22:56,201 I don't think that's what I intended, so to speak, here. 1605 01:22:56,201 --> 01:23:00,021 Because notice that I got s from the user, so that checks out. 1606 01:23:00,021 --> 01:23:03,703 I then copied s into t, which looks correct. 1607 01:23:03,703 --> 01:23:05,411 That's what we always use assignment for. 1608 01:23:05,411 --> 01:23:09,191 Then I uppercased the first letter in t, but not s-- 1609 01:23:09,191 --> 01:23:10,331 at least in my code-- 1610 01:23:10,331 --> 01:23:14,051 then I printed s and t and then noticed, apparently, both s 1611 01:23:14,051 --> 01:23:17,921 and t got capitalized. 1612 01:23:17,921 --> 01:23:20,521 So if you're starting to get a little comfortable with what's 1613 01:23:20,521 --> 01:23:24,421 going on underneath the hood, what's the fundamental problem here? 1614 01:23:24,421 --> 01:23:28,223 Why did both get capitalized? 1615 01:23:28,223 --> 01:23:29,431 Why did both get capitalized? 1616 01:23:29,431 --> 01:23:30,121 Yeah, over here. 1617 01:23:30,121 --> 01:23:32,601 AUDIENCE: Could it be they're referencing the same address? 1618 01:23:32,601 --> 01:23:34,011 DAVID J. MALAN: Yeah, they're referencing the same address. 1619 01:23:34,011 --> 01:23:35,871 So C is really literal. 1620 01:23:35,871 --> 01:23:39,261 If you create another variable called t and you assign it the value of s, 1621 01:23:39,261 --> 01:23:41,871 you are literally assigning it the value in s, 1622 01:23:41,871 --> 01:23:44,761 which is 0x123 or something like that. 1623 01:23:44,761 --> 01:23:48,381 And so at that point in the story both s and t presumably 1624 01:23:48,381 --> 01:23:51,951 have a value of 0x123, which means they technically 1625 01:23:51,951 --> 01:23:56,061 point to the same h-i exclamation point in memory. 1626 01:23:56,061 --> 01:24:00,891 Nowhere did I tell the computer to give me a copy of an h-i exclamation point 1627 01:24:00,891 --> 01:24:04,131 per se; I literally said just copy s.
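A minimal sketch of the bug just described, assuming a modifiable char array stands in for the string that GetString would return: `t = s` copies only the address, so capitalizing through `t` also changes what `s` shows. `alias_and_capitalize` is a hypothetical name for illustration.

```c
#include <ctype.h>

// Hypothetical helper that reproduces the lecture's bug: "copying" a string
// by assigning the pointer, then capitalizing through the copy.
char *alias_and_capitalize(char *s)
{
    char *t = s;                            // copies the address only, not "hi!"
    t[0] = toupper((unsigned char) t[0]);   // writes into the one shared array
    return t;
}
```

After the call, `s[0]` is `'H'` as well, because `s` and `t` hold the same address.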
1628 01:24:04,131 --> 01:24:08,391 So here's where an understanding of what s literally is explains the situation. 1629 01:24:08,391 --> 01:24:10,761 I'm only copying the pointers. 1630 01:24:10,761 --> 01:24:12,601 So what actually went on in memory? 1631 01:24:12,601 --> 01:24:14,241 Let's take a look here at this grid. 1632 01:24:14,241 --> 01:24:17,091 If I created s initially, maybe it ends up here. 1633 01:24:17,091 --> 01:24:20,601 And I created hi in lowercase, and it ended up down here. 1634 01:24:20,601 --> 01:24:26,751 Then the addresses were, again, like 0x123, 124, 125, 126, so 0x123 is what's in s. 1635 01:24:26,751 --> 01:24:29,451 If then I create a second variable called t, 1636 01:24:29,451 --> 01:24:33,681 and I call it a string, a.k.a. char star, maybe it again ends up here. 1637 01:24:33,681 --> 01:24:39,261 But when I copy s into t by doing t equals s semicolon, 1638 01:24:39,261 --> 01:24:44,866 that literally just copies s into t, which puts the value 0x123 there. 1639 01:24:44,866 --> 01:24:47,991 So if we now abstract away all these numbers and just think about a picture 1640 01:24:47,991 --> 01:24:52,371 with arrows, what we've drawn in the computer's memory is this. 1641 01:24:52,371 --> 01:24:56,871 Two different pointers but storing the same address, which means 1642 01:24:56,871 --> 01:24:59,761 the breadcrumbs lead to the same place. 1643 01:24:59,761 --> 01:25:02,841 And so if you follow the t breadcrumb and capitalize the first letter, 1644 01:25:02,841 --> 01:25:06,831 it is functionally the same as copying the-- 1645 01:25:06,831 --> 01:25:12,471 changing the first letter in s as well. 1646 01:25:12,471 --> 01:25:17,311 So what's the solution, then, to this kind of problem? 1647 01:25:17,311 --> 01:25:19,381 Even if you have no idea how to do it in code, 1648 01:25:19,381 --> 01:25:21,946 what's the gist of what I really intended, which is, 1649 01:25:21,946 --> 01:25:26,101 I want a genuine copy of s, called t.
1650 01:25:26,101 --> 01:25:30,213 I want a new h-i exclamation point backslash zero. 1651 01:25:30,213 --> 01:25:31,921 What do I need to do to make that happen? 1652 01:25:31,921 --> 01:25:32,888 Thoughts? 1653 01:25:32,888 --> 01:25:35,631 AUDIENCE: I think there's a function called str copy. 1654 01:25:35,631 --> 01:25:38,961 DAVID J. MALAN: So there is a function called str copy, strcpy, 1655 01:25:38,961 --> 01:25:41,511 which is a possible answer to this question. 1656 01:25:41,511 --> 01:25:45,681 The catch with str copy is that you have to tell it in advance not only 1657 01:25:45,681 --> 01:25:48,231 what the source string is-- the one you want to copy-- 1658 01:25:48,231 --> 01:25:50,961 but also the address of a chunk of memory 1659 01:25:50,961 --> 01:25:55,551 into which it can copy the string, and here's one thing we haven't seen yet, 1660 01:25:55,551 --> 01:25:57,951 and we need one more building block today, if you will. 1661 01:25:57,951 --> 01:26:02,361 We haven't yet seen a way to create new chunks of memory 1662 01:26:02,361 --> 01:26:05,281 and then let some other function copy into them. 1663 01:26:05,281 --> 01:26:08,661 And for this, we're going to introduce something called dynamic memory 1664 01:26:08,661 --> 01:26:09,571 allocation. 1665 01:26:09,571 --> 01:26:12,291 And this is perhaps the last and most powerful feature today, 1666 01:26:12,291 --> 01:26:16,251 whereby we're going to introduce two functions, malloc and free, where 1667 01:26:16,251 --> 01:26:19,491 malloc means memory allocate, which literally does just that. 1668 01:26:19,491 --> 01:26:22,641 It's a function that takes a number as input-- how many bytes of memory 1669 01:26:22,641 --> 01:26:26,034 do you want the operating system to find for you somewhere in that big grid?
1670 01:26:26,034 --> 01:26:27,951 It's going to find it and it's going to return 1671 01:26:27,951 --> 01:26:31,554 to you the address of the first byte of contiguous memory back to back to back, 1672 01:26:31,554 --> 01:26:34,221 and then you can do anything you want with that chunk of memory. 1673 01:26:34,221 --> 01:26:35,751 free is going to do the opposite. 1674 01:26:35,751 --> 01:26:38,571 When you're done using a chunk of memory that malloc has given you, 1675 01:26:38,571 --> 01:26:42,201 you can say free it, and that means you hand it back to the operating system 1676 01:26:42,201 --> 01:26:45,421 and then the operating system can use it for something else later. 1677 01:26:45,421 --> 01:26:48,861 So this is actually evidence of a common problem in programming. 1678 01:26:48,861 --> 01:26:53,311 If your Mac or your PC has ever been in the habit of starting to get really, 1679 01:26:53,311 --> 01:26:57,921 really slow, or it's slowing to a crawl-- heck, maybe it even freezes-- 1680 01:26:57,921 --> 01:27:00,921 one of the possible explanations could be 1681 01:27:00,921 --> 01:27:03,801 that the program you're running by Apple or Microsoft 1682 01:27:03,801 --> 01:27:07,041 or whoever, maybe they're using malloc or some equivalent, 1683 01:27:07,041 --> 01:27:08,346 asking the operating system-- 1684 01:27:08,346 --> 01:27:10,221 Mac OS or Windows-- give me more memory. 1685 01:27:10,221 --> 01:27:11,001 I need more memory. 1686 01:27:11,001 --> 01:27:12,381 The user is creating more images. 1687 01:27:12,381 --> 01:27:13,821 The user is typing a longer essay. 1688 01:27:13,821 --> 01:27:15,441 Give me more memory, more memory. 1689 01:27:15,441 --> 01:27:20,001 If the program has a bug and never actually frees any of that memory, 1690 01:27:20,001 --> 01:27:22,701 your computer might end up using all of the available memory 1691 01:27:22,701 --> 01:27:26,571 and honestly, humans are not very good at handling corner cases like that.
1692 01:27:26,571 --> 01:27:29,451 Very often programs, or even whole computers, just freeze at that point 1693 01:27:29,451 --> 01:27:33,591 or get really, really slow because they start trying to be creative 1694 01:27:33,591 --> 01:27:35,751 when there's not enough memory left. 1695 01:27:35,751 --> 01:27:38,361 So one of the reasons for a computer really slowing down 1696 01:27:38,361 --> 01:27:42,634 might be calling malloc a lot, or some equivalent, but never freeing the memory. 1697 01:27:42,634 --> 01:27:45,051 Which is to say, you should always use these two functions 1698 01:27:45,051 --> 01:27:48,631 in concert and free memory once you are done with it. 1699 01:27:48,631 --> 01:27:52,761 So let me go ahead and do this in code and solve this problem properly. 1700 01:27:52,761 --> 01:27:54,801 Let me go ahead and do this. 1701 01:27:54,801 --> 01:27:58,491 Before I copy s into t using something like str copy, 1702 01:27:58,491 --> 01:28:01,126 I first need to get a bunch of memory from the computer. 1703 01:28:01,126 --> 01:28:04,251 So to do that, and to make it super clear that we're dealing with pointers, 1704 01:28:04,251 --> 01:28:07,821 I'm going to change my strings to char stars for both s and t, 1705 01:28:07,821 --> 01:28:10,281 and what I technically am going to store in t 1706 01:28:10,281 --> 01:28:14,331 is the address of an available chunk of memory. 1707 01:28:14,331 --> 01:28:18,531 To do that, I can ask the computer to allocate memory for me, 1708 01:28:18,531 --> 01:28:19,941 but how many bytes? 1709 01:28:19,941 --> 01:28:23,181 If I want to create a copy of h-i exclamation point, 1710 01:28:23,181 --> 01:28:26,501 I need how many bytes? 1711 01:28:26,501 --> 01:28:27,001 Good! 1712 01:28:27,001 --> 01:28:27,631 Four! 1713 01:28:27,631 --> 01:28:31,891 Because I need the h, the i, the exclamation point, and additional space 1714 01:28:31,891 --> 01:28:33,001 for the backslash zero.
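A quick way to check that arithmetic, assuming a char array initialized from the literal "hi!": the array occupies four bytes, while strlen counts only the three visible characters, which is why the allocation needs strlen plus 1. `bytes_needed` is a hypothetical helper for illustration.

```c
#include <string.h>

// Hypothetical helper: bytes required to store a copy of s, including its '\0'.
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;   // visible characters, plus 1 for the null terminator
}
```

For "hi!", `sizeof` on the array and `bytes_needed` both come to 4, while `strlen` alone gives 3.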
1715 01:28:33,001 --> 01:28:35,161 It's up to me to understand that and ask for it. 1716 01:28:35,161 --> 01:28:36,691 It's not going to happen magically. 1717 01:28:36,691 --> 01:28:40,601 Nothing does in C. So I could just naively type four there, 1718 01:28:40,601 --> 01:28:43,501 and that would be correct if I type in h-i exclamation 1719 01:28:43,501 --> 01:28:47,431 point or any other three-character word or phrase, but to do this dynamically 1720 01:28:47,431 --> 01:28:50,761 I should probably do something like strlen of s 1721 01:28:50,761 --> 01:28:54,331 plus 1 for the additional null character. 1722 01:28:54,331 --> 01:28:56,821 Recall that string length does it in the English sense-- 1723 01:28:56,821 --> 01:29:00,991 it returns the length of the string you see, and the plus 1 takes into account 1724 01:29:00,991 --> 01:29:03,241 the fact that I'm going to need that backslash zero. 1725 01:29:03,241 --> 01:29:05,611 Now let me do this old-school style first. 1726 01:29:05,611 --> 01:29:10,351 Let me go ahead and manually copy the string s into t first. 1727 01:29:10,351 --> 01:29:18,211 So for int i equals 0, i is less than the string length of s, i plus plus. 1728 01:29:18,211 --> 01:29:23,161 Then inside my for loop, I'm going to do t bracket i equals s bracket 1729 01:29:23,161 --> 01:29:27,211 i, but actually I want the null character too, 1730 01:29:27,211 --> 01:29:30,001 so I want to do the length of the string plus 1 more, 1731 01:29:30,001 --> 01:29:32,671 and heck, I think I learned an optimization last time. 1732 01:29:32,671 --> 01:29:35,131 If I'm calling strlen again and again on every pass, I could really 1733 01:29:35,131 --> 01:29:40,861 do n equals strlen of s plus 1 and then do i is less than n, 1734 01:29:40,861 --> 01:29:43,361 just as a nice design optimization.
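A minimal sketch of that hand-written copy, assuming it is wrapped in a hypothetical helper `copy_with_loop` that returns the new memory (or NULL if malloc fails) and leaves freeing to the caller. It uses size_t where the lecture uses int, and it computes n once so strlen is not called on every iteration.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: copy s by hand into freshly malloc'd memory.
// Returns the copy, or NULL if malloc fails; the caller must free it.
char *copy_with_loop(const char *s)
{
    size_t n = strlen(s) + 1;          // compute once: characters plus the '\0'
    char *t = malloc(n);
    if (t == NULL)
    {
        return NULL;
    }
    for (size_t i = 0; i < n; i++)     // i < n, so the '\0' is copied as well
    {
        t[i] = s[i];
    }
    return t;
}
```

Because the loop runs while i is less than n, the final pass copies the '\0', so t ends up as a complete string in its own chunk of memory.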
1735 01:29:43,361 --> 01:29:46,531 I think this for loop will actually handle the process, then, 1736 01:29:46,531 --> 01:29:53,341 of copying every character from s into every available byte of memory in t. 1737 01:29:53,341 --> 01:29:56,671 Or I could get rid of all of that and take your suggestion, which 1738 01:29:56,671 --> 01:30:00,841 is to use str copy, which takes as its first argument the destination 1739 01:30:00,841 --> 01:30:03,301 and its second argument the source. 1740 01:30:03,301 --> 01:30:08,281 So it copies from right to left in this case, too, and that's going to do all of that 1741 01:30:08,281 --> 01:30:11,231 automatically for me as well. 1742 01:30:11,231 --> 01:30:13,421 Now I think I'm good. 1743 01:30:13,421 --> 01:30:15,401 I can now safely capitalize 1744 01:30:15,401 --> 01:30:19,441 the first character in t, which is now a different chunk of memory 1745 01:30:19,441 --> 01:30:23,441 than s, and then I can print them both out to see that one has not changed 1746 01:30:23,441 --> 01:30:24,451 but the other has. 1747 01:30:24,451 --> 01:30:27,331 So make copy-- all right, what did I do wrong? 1748 01:30:27,331 --> 01:30:30,421 Implicitly declaring library function malloc dot, dot, dot. 1749 01:30:30,421 --> 01:30:33,061 So we've seen this kind of error before. 1750 01:30:33,061 --> 01:30:36,151 What is-- even if you don't know quite how to solve it, 1751 01:30:36,151 --> 01:30:37,681 what's the essence of the solution? 1752 01:30:37,681 --> 01:30:40,711 What do I need to do to fix this kind of problem involving implicitly 1753 01:30:40,711 --> 01:30:43,271 declaring a library function? 1754 01:30:43,271 --> 01:30:44,081 What did I forget? 1755 01:30:44,081 --> 01:30:46,211 Yeah. 1756 01:30:46,211 --> 01:30:47,561 I need to include the library. 1757 01:30:47,561 --> 01:30:51,551 And I could look this up in the manual, but I know it off the top of my head; 1758 01:30:51,551 --> 01:30:52,361 I just forgot it.
1759 01:30:52,361 --> 01:30:54,461 There's another library we'll occasionally 1760 01:30:54,461 --> 01:30:56,561 need now called stdlib.h-- 1761 01:30:56,561 --> 01:31:00,671 the standard library-- that contains malloc and free prototypes 1762 01:31:00,671 --> 01:31:02,021 and some other stuff, too. 1763 01:31:02,021 --> 01:31:05,061 All right, let me just clear this away and do make copy one more time. 1764 01:31:05,061 --> 01:31:10,961 Now I'm good. ./copy, Enter. All right. s, I'm going to type in hi, lowercase. 1765 01:31:10,961 --> 01:31:14,771 t and s now come back as intended. 1766 01:31:14,771 --> 01:31:19,961 s is untouched, it would seem, but t is now capitalized. 1767 01:31:19,961 --> 01:31:23,351 Are any questions, then, on what we just did in code? 1768 01:31:23,351 --> 01:31:25,172 Yeah. 1769 01:31:25,172 --> 01:31:28,581 AUDIENCE: You said that malloc and free go together. 1770 01:31:28,581 --> 01:31:32,093 [INAUDIBLE] 1771 01:31:32,093 --> 01:31:33,051 DAVID J. MALAN: Indeed. 1772 01:31:33,051 --> 01:31:35,093 There are a few improvements I want to make, so let 1773 01:31:35,093 --> 01:31:36,651 me actually do those right now. 1774 01:31:36,651 --> 01:31:39,681 Technically, I should practice what I preached, and I should indeed, 1775 01:31:39,681 --> 01:31:42,098 when I'm done with t, free t. 1776 01:31:42,098 --> 01:31:44,181 Fortunately, I don't have to worry about how big t 1777 01:31:44,181 --> 01:31:47,691 was-- the computer remembers how many bytes it gave me and it will go free 1778 01:31:47,691 --> 01:31:49,371 all of them, not just the first. 1779 01:31:49,371 --> 01:31:51,081 I should do free t. 1780 01:31:51,081 --> 01:31:53,751 I don't need to do free s, and I shouldn't, 1781 01:31:53,751 --> 01:31:56,691 because that is handled automatically by the CS50 library.
1782 01:31:56,691 --> 01:31:59,091 s, recall, came from GetString, and we actually 1783 01:31:59,091 --> 01:32:01,469 have some fancy code in place that makes sure 1784 01:32:01,469 --> 01:32:03,261 that at the end of your program's execution 1785 01:32:03,261 --> 01:32:06,321 we free any memory that we allocated so we don't actually 1786 01:32:06,321 --> 01:32:08,256 waste memory like I described earlier. 1787 01:32:08,256 --> 01:32:10,131 But there are actually a couple of other things 1788 01:32:10,131 --> 01:32:12,631 I should put in here if I really want to be pedantic. 1789 01:32:12,631 --> 01:32:16,071 It turns out that sometimes malloc can fail, 1790 01:32:16,071 --> 01:32:18,809 and sometimes malloc doesn't have enough memory available 1791 01:32:18,809 --> 01:32:20,601 because maybe your computer's doing so much 1792 01:32:20,601 --> 01:32:22,701 stuff that there's just no more RAM available. 1793 01:32:22,701 --> 01:32:24,981 So technically, I should do something like this-- 1794 01:32:24,981 --> 01:32:29,541 if t equals equals NULL, with two L's today, 1795 01:32:29,541 --> 01:32:32,751 then I should just return 1 or something to say that there was a problem. 1796 01:32:32,751 --> 01:32:34,626 I should probably print an error message too, 1797 01:32:34,626 --> 01:32:36,301 but for now I'm going to keep it simple. 1798 01:32:36,301 --> 01:32:38,526 I should also probably check this. 1799 01:32:38,526 --> 01:32:40,851 This is a little risky of me. 1800 01:32:40,851 --> 01:32:45,511 If I'm doing t bracket zero, this is assuming that there is a letter there. 1801 01:32:45,511 --> 01:32:48,231 But what if the human just hit Enter at the prompt 1802 01:32:48,231 --> 01:32:51,391 and didn't even type h, let alone h-i exclamation point? 1803 01:32:51,391 --> 01:32:53,631 What if there is no t bracket zero?
1804 01:32:53,631 --> 01:32:59,181 So technically, what I should probably do here is, if the length of t 1805 01:32:59,181 --> 01:33:05,121 is greater than zero, then go ahead and safely capitalize 1806 01:33:05,121 --> 01:33:06,441 the first letter of it. 1807 01:33:06,441 --> 01:33:08,731 And then at the very end, if all goes well, 1808 01:33:08,731 --> 01:33:12,841 I can return zero, thereby signifying that indeed, this thing was successful. 1809 01:33:12,841 --> 01:33:16,711 So yes, these two functions, malloc and free, should be used in concert. 1810 01:33:16,711 --> 01:33:21,651 And so if you call malloc you should call free eventually. 1811 01:33:21,651 --> 01:33:27,256 But you did not call malloc for s, so you should not call free for s. 1812 01:33:27,256 --> 01:33:28,131 Yeah, other question. 1813 01:33:28,131 --> 01:33:29,298 AUDIENCE: Here's a question. 1814 01:33:29,298 --> 01:33:31,579 Why do we do malloc plus 1? 1815 01:33:31,579 --> 01:33:33,371 DAVID J. MALAN: Why did I do malloc plus 1? 1816 01:33:33,371 --> 01:33:36,281 So malloc-- sorry, malloc of string length of s 1817 01:33:36,281 --> 01:33:39,903 plus 1-- the string length is the literal length of the string as a human 1818 01:33:39,903 --> 01:33:41,111 would perceive it in English. 1819 01:33:41,111 --> 01:33:44,111 So h-i exclamation point-- strlen gives me 3, 1820 01:33:44,111 --> 01:33:47,801 but I know now as of last week and this week what a string technically is 1821 01:33:47,801 --> 01:33:49,751 and a string always has an extra byte. 1822 01:33:49,751 --> 01:33:52,301 The onus is on me to understand and apply 1823 01:33:52,301 --> 01:33:57,011 that lesson learned so that I actually give str copy enough room for that 1824 01:33:57,011 --> 01:33:58,631 trailing null character. 1825 01:33:58,631 --> 01:34:04,301 And here's just an annoying thing: we called the backslash zero N-U-L last 1826 01:34:04,301 --> 01:34:08,351 week, but it turns out that N-U-L-L is the same idea.
1827 01:34:08,351 --> 01:34:11,531 It's also zero, but it's zero in the context of pointers. 1828 01:34:11,531 --> 01:34:15,761 So long story short, you never really write N-U-L in code; I've just said it 1829 01:34:15,761 --> 01:34:17,051 and we saw it on the screen. 1830 01:34:17,051 --> 01:34:22,631 You will start writing N-U-L-L when you want to check whether or not a pointer 1831 01:34:22,631 --> 01:34:23,681 is valid. 1832 01:34:23,681 --> 01:34:25,091 And what I mean by that is this. 1833 01:34:25,091 --> 01:34:27,971 If malloc fails and there's just not enough memory left inside 1834 01:34:27,971 --> 01:34:31,271 of the computer for you, it's got to return a special value, 1835 01:34:31,271 --> 01:34:35,201 and that special value is N-U-L-L in all capital letters. 1836 01:34:35,201 --> 01:34:36,821 That signifies something went wrong. 1837 01:34:36,821 --> 01:34:41,771 Do not trust that I'm giving you a useful return value. 1838 01:34:41,771 --> 01:34:45,391 Other questions on these copies thus far? 1839 01:34:45,391 --> 01:34:47,530 Yeah, over there. 1840 01:34:47,530 --> 01:34:51,481 AUDIENCE: [INAUDIBLE] 1841 01:34:51,481 --> 01:34:52,731 DAVID J. MALAN: Good question. 1842 01:34:52,731 --> 01:34:54,621 Will str copy not work without malloc? 1843 01:34:54,621 --> 01:34:57,891 You kind of need both in this case because str copy, 1844 01:34:57,891 --> 01:35:01,281 by definition-- if I pull up its manual page-- needs a destination 1845 01:35:01,281 --> 01:35:03,261 in which to put the copied characters. 1846 01:35:03,261 --> 01:35:06,321 It's not sufficient just to say char star t semicolon. 1847 01:35:06,321 --> 01:35:07,761 That only gives you a pointer.
1848 01:35:07,761 --> 01:35:10,701 But I need another chunk of memory that's 1849 01:35:10,701 --> 01:35:14,811 just as big as h-i exclamation point backslash zero, 1850 01:35:14,811 --> 01:35:17,271 so malloc gives me a chunk of memory that size 1851 01:35:17,271 --> 01:35:21,561 and then str copy fills it with h-i exclamation point backslash zero. 1852 01:35:21,561 --> 01:35:24,021 So again, that's why we're going down to this lower level, 1853 01:35:24,021 --> 01:35:26,063 because once you understand what needs to be done 1854 01:35:26,063 --> 01:35:27,931 you now have the functions to do it. 1855 01:35:27,931 --> 01:35:29,971 So let's actually consider what we just solved. 1856 01:35:29,971 --> 01:35:33,831 So in this next version of the program where I actually introduced malloc, 1857 01:35:33,831 --> 01:35:37,341 t was initialized to the return value of malloc, 1858 01:35:37,341 --> 01:35:39,381 and maybe the memory that I got back was here-- 1859 01:35:39,381 --> 01:35:42,981 0x456, 457, 458, 459. 1860 01:35:42,981 --> 01:35:45,291 I've left it blank initially because nothing 1861 01:35:45,291 --> 01:35:47,001 is put there automatically by malloc. 1862 01:35:47,001 --> 01:35:51,111 I just get a chunk of memory that is now mine to use as I see fit. 1863 01:35:51,111 --> 01:35:56,031 I then assign t to that return value, which points t at the first address. 1864 01:35:56,031 --> 01:35:57,861 Notice there's no backslash zero. 1865 01:35:57,861 --> 01:36:00,741 This is not yet a string; it's just a chunk of memory-- 1866 01:36:00,741 --> 01:36:02,871 four bytes-- an array of four bytes. 1867 01:36:02,871 --> 01:36:06,441 What str copy eventually did for me was it copied the h over, 1868 01:36:06,441 --> 01:36:10,671 the i over, the exclamation point over, and the backslash zero. 1869 01:36:10,671 --> 01:36:14,541 And if I didn't want to use str copy or I forgot that it existed, my for loop 1870 01:36:14,541 --> 01:36:18,701 would have done exactly the same thing.
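Putting those pieces together, here is a sketch of the fixed copy.c logic as a hypothetical helper `capitalized_copy`, assuming the input arrives as an ordinary string rather than from the CS50 library's GetString. It allocates strlen plus 1 bytes, returns NULL if malloc fails, copies with strcpy (destination first, source second), capitalizes only if there is a first character, and leaves free to the caller.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper mirroring the fixed copy.c: returns a newly allocated,
// capitalized copy of s, or NULL if malloc fails. The caller must free it.
char *capitalized_copy(const char *s)
{
    char *t = malloc(strlen(s) + 1);   // room for every char plus the '\0'
    if (t == NULL)
    {
        return NULL;                   // malloc found no memory for us
    }
    strcpy(t, s);                      // destination first, source second
    if (strlen(t) > 0)                 // only touch t[0] if there is a letter there
    {
        t[0] = toupper((unsigned char) t[0]);
    }
    return t;
}
```

A caller such as main would print both strings, then call free(t) and return 0; it would not free s if s came from GetString, since the CS50 library takes care of that memory.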
1871 01:36:18,701 --> 01:36:23,818 Are any questions, then, on these examples here? 1872 01:36:23,818 --> 01:36:24,401 Any questions? 1873 01:36:24,401 --> 01:36:26,144 Yeah. 1874 01:36:26,144 --> 01:36:33,131 AUDIENCE: [INAUDIBLE] 1875 01:36:33,131 --> 01:36:34,381 DAVID J. MALAN: Good question. 1876 01:36:34,381 --> 01:36:38,731 After malloc, if I had then still done just t equals s, 1877 01:36:38,731 --> 01:36:41,851 it actually would have recreated the same original problem 1878 01:36:41,851 --> 01:36:45,571 by just copying 0x123 from s into t. 1879 01:36:45,571 --> 01:36:48,751 So then I would have been left with a picture that looked like this a few 1880 01:36:48,751 --> 01:36:52,711 steps ago, I would have-- and I can't quite do it live-- 1881 01:36:52,711 --> 01:36:55,021 this arrow, if I did what you just described, 1882 01:36:55,021 --> 01:36:58,998 would now be pointing over here, and so I wouldn't have fundamentally solved 1883 01:36:58,998 --> 01:37:01,081 the problem; I would have just additionally wasted 1884 01:37:01,081 --> 01:37:04,141 four bytes temporarily that I'm not actually using. 1885 01:37:04,141 --> 01:37:05,983 Yeah. 1886 01:37:05,983 --> 01:37:09,781 AUDIENCE: [INAUDIBLE] 1887 01:37:09,781 --> 01:37:10,861 DAVID J. MALAN: You can-- 1888 01:37:10,861 --> 01:37:12,819 do you always use malloc and str copy together? 1889 01:37:12,819 --> 01:37:13,594 Not necessarily. 1890 01:37:13,594 --> 01:37:15,511 These solve two different problems. 1891 01:37:15,511 --> 01:37:19,771 malloc's giving me enough memory to make a copy; str copy is doing the copy. 1892 01:37:19,771 --> 01:37:23,581 However, you could actually use an array, if you wanted, of characters, 1893 01:37:23,581 --> 01:37:26,911 and you could use str copy on that, and there are other use cases for str copy.
1894 01:37:26,911 --> 01:37:29,071 But thus far, it's a reasonable mental model 1895 01:37:29,071 --> 01:37:31,291 to have that if you want to copy strings, 1896 01:37:31,291 --> 01:37:34,921 you use malloc and then str copy, or your own homegrown loop. 1897 01:37:34,921 --> 01:37:36,844 Yeah. 1898 01:37:36,844 --> 01:37:47,171 AUDIENCE: [INAUDIBLE] 1899 01:37:47,171 --> 01:37:49,370 DAVID J. MALAN: Say that once more. 1900 01:37:49,370 --> 01:37:54,579 AUDIENCE: [INAUDIBLE] 1901 01:37:54,579 --> 01:37:55,371 DAVID J. MALAN: No. 1902 01:37:55,371 --> 01:37:57,031 It will-- good question. 1903 01:37:57,031 --> 01:38:00,171 If I had a-- 1904 01:38:00,171 --> 01:38:03,441 str copy, per its documentation, will copy the whole string 1905 01:38:03,441 --> 01:38:05,661 plus the null character at the end. 1906 01:38:05,661 --> 01:38:08,121 It just assumes there will be one there. 1907 01:38:08,121 --> 01:38:12,291 It's therefore up to you to pass str copy a long enough chunk of memory 1908 01:38:12,291 --> 01:38:13,281 to have room for that. 1909 01:38:13,281 --> 01:38:15,471 If I only ask malloc for three bytes, that 1910 01:38:15,471 --> 01:38:17,541 could have potentially created a memory problem 1911 01:38:17,541 --> 01:38:20,901 whereby str copy would just still blindly copy one, two, three, 1912 01:38:20,901 --> 01:38:24,441 four bytes, but technically it should have only touched three of those. 1913 01:38:24,441 --> 01:38:27,291 You do not yet have access to the fourth one, or the rights to it, 1914 01:38:27,291 --> 01:38:29,541 because you never asked malloc for it. 1915 01:38:29,541 --> 01:38:31,461 Yeah. 1916 01:38:31,461 --> 01:38:34,461 AUDIENCE: So the number inside malloc would be the number of bytes. 1917 01:38:34,461 --> 01:38:34,821 DAVID J. MALAN: Correct. 1918 01:38:34,821 --> 01:38:36,696 The number inside malloc-- it's one argument. 1919 01:38:36,696 --> 01:38:39,723 It's the number of bytes you want back. 
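strcpy never checks whether the destination is big enough; it keeps copying until it has copied the '\0'. A minimal sketch of guarding against that, assuming a hypothetical helper `copy_if_fits` that is told the destination's size, which also shows strcpy used on a plain char array, as mentioned in the answer above.

```c
#include <string.h>

// Hypothetical helper: strcpy only if dest_size bytes can hold src and its '\0'.
// Returns 1 on success, 0 if the destination is too small (nothing is written).
int copy_if_fits(char *dest, size_t dest_size, const char *src)
{
    if (strlen(src) + 1 > dest_size)
    {
        return 0;   // strcpy would write past the end of dest
    }
    strcpy(dest, src);
    return 1;
}
```

With a four-byte array, "hi!" fits exactly; with a three-byte array, the helper refuses rather than letting strcpy touch a fourth byte it has no rights to.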
1920 01:38:39,723 --> 01:38:43,041 AUDIENCE: Does that mean you have to remember [INAUDIBLE]? 1921 01:38:45,798 --> 01:38:48,131 DAVID J. MALAN: Yes, the onus is on you, the programmer, 1922 01:38:48,131 --> 01:38:50,298 to remember or, frankly, use a function to figure out 1923 01:38:50,298 --> 01:38:51,821 how many bytes you actually need. 1924 01:38:51,821 --> 01:38:54,671 That's why I did not ultimately type in four manually; 1925 01:38:54,671 --> 01:38:56,441 I used str length plus 1. 1926 01:38:56,441 --> 01:38:59,831 So the plus 1 is necessary if you understand how strings are represented, 1927 01:38:59,831 --> 01:39:02,471 but using strlen means that I can actually 1928 01:39:02,471 --> 01:39:05,651 play around with any inputs and it will dynamically 1929 01:39:05,651 --> 01:39:07,541 figure out the length. 1930 01:39:07,541 --> 01:39:09,821 So suffice it to say, there are so many ways 1931 01:39:09,821 --> 01:39:11,931 already where you can start to break programs. 1932 01:39:11,931 --> 01:39:15,386 Let's give you at least one tool for finding mistakes that you might make. 1933 01:39:15,386 --> 01:39:17,261 And indeed, in upcoming problem sets you will 1934 01:39:17,261 --> 01:39:19,361 use this to find bugs in your own code. 1935 01:39:19,361 --> 01:39:22,991 Not just using printf, not just using the built-in debugger, but another tool 1936 01:39:22,991 --> 01:39:24,201 here as well. 1937 01:39:24,201 --> 01:39:27,371 So let me go ahead and deliberately write a program called memory.c 1938 01:39:27,371 --> 01:39:29,511 that has some memory-related errors. 1939 01:39:29,511 --> 01:39:34,901 Let me include stdio.h at the top and let me include stdlib.h at the top 1940 01:39:34,901 --> 01:39:36,551 so I have access to malloc now.
1941 01:39:36,551 --> 01:39:41,171 Let me do int main(void) and then inside of main, let me do this-- 1942 01:39:41,171 --> 01:39:44,351 I want to allocate maybe how about three-- 1943 01:39:44,351 --> 01:39:45,711 space for three integers. 1944 01:39:45,711 --> 01:39:46,211 Why? 1945 01:39:46,211 --> 01:39:48,191 Just for the sake of discussion. 1946 01:39:48,191 --> 01:39:52,721 So I'm going to go ahead and do malloc of three, but I don't want three bytes. 1947 01:39:52,721 --> 01:39:56,008 I want three integers, and an integer is four bytes, 1948 01:39:56,008 --> 01:39:57,341 so technically I could do this-- 1949 01:39:57,341 --> 01:40:01,851 3 times 4, or I could do 12, but again, that's making certain assumptions, 1950 01:40:01,851 --> 01:40:04,341 and if I run this program on a slightly different computer, 1951 01:40:04,341 --> 01:40:05,861 int might be a different size. 1952 01:40:05,861 --> 01:40:10,321 So the better way to do this would be 3 times sizeof(int), whatever the size of an int is. 1953 01:40:10,321 --> 01:40:13,571 And sizeof is just an operator you can use any time if you just want to find out, 1954 01:40:13,571 --> 01:40:15,611 on this computer, how big is an int? 1955 01:40:15,611 --> 01:40:18,291 How big is a float, or something else? 1956 01:40:18,291 --> 01:40:20,411 So that's going to give me that many-- 1957 01:40:20,411 --> 01:40:22,811 that much memory for three ints. 1958 01:40:22,811 --> 01:40:24,821 What do I want to assign this to? 1959 01:40:24,821 --> 01:40:27,011 Well, malloc returns an address. 1960 01:40:27,011 --> 01:40:32,291 Pointers are addresses, so I'm going to create a pointer to an int called 1961 01:40:32,291 --> 01:40:34,521 x and assign it the value. 1962 01:40:34,521 --> 01:40:35,741 So what am I doing here? 1963 01:40:35,741 --> 01:40:38,321 This is a little less obvious, but again, go back to basics. 1964 01:40:38,321 --> 01:40:43,091 The right-hand side here gives me a chunk of memory for three integers.
1965 01:40:43,091 --> 01:40:46,661 malloc returns the address of the first byte of that chunk. 1966 01:40:46,661 --> 01:40:48,791 How do I store the address of anything? 1967 01:40:48,791 --> 01:40:49,691 I need a pointer. 1968 01:40:49,691 --> 01:40:53,561 The syntax for today is type of data, star, 1969 01:40:53,561 --> 01:40:58,631 where the type of data in question is three ints, so I do int star x. 1970 01:40:58,631 --> 01:41:02,531 Again, it's kind of purposeless, only for sort of instructional purposes 1971 01:41:02,531 --> 01:41:07,901 here, but this is equivalent now to having a chunk of memory of size 12 1972 01:41:07,901 --> 01:41:11,351 in total, presumably, so I can technically now do this. 1973 01:41:11,351 --> 01:41:15,491 I can go into maybe the first location and assign it the number 72 1974 01:41:15,491 --> 01:41:16,911 like the other day. 1975 01:41:16,911 --> 01:41:24,701 Second location, the number 73, and the third location, maybe the number 33. 1976 01:41:24,701 --> 01:41:27,551 Now I've deliberately made two mistakes here 1977 01:41:27,551 --> 01:41:30,701 because I'm trying to trip over my newfound understanding, 1978 01:41:30,701 --> 01:41:33,281 or my greenness with understanding pointers. 1979 01:41:33,281 --> 01:41:36,641 One, I didn't remember that I should be treating chunks of memory 1980 01:41:36,641 --> 01:41:37,751 as zero indexed. 1981 01:41:37,751 --> 01:41:41,141 malloc essentially returns an array, if you want to think of it as that. 1982 01:41:41,141 --> 01:41:43,541 An array of three ints, or more technically, 1983 01:41:43,541 --> 01:41:47,381 the address of a chunk of memory that could fit three ints. 1984 01:41:47,381 --> 01:41:50,681 So I can use my square bracket notation, or I could be really cool 1985 01:41:50,681 --> 01:41:53,631 and use pointer arithmetic, but this is a little more user friendly. 1986 01:41:53,631 --> 01:41:55,481 But I have made two mistakes. 
1987 01:41:55,481 --> 01:41:59,081 I did not start indexing at zero, so line seven 1988 01:41:59,081 --> 01:42:00,941 should have been x bracket zero. 1989 01:42:00,941 --> 01:42:03,813 Line eight should have been x bracket 1, and then line nine 1990 01:42:03,813 --> 01:42:05,021 should have been x bracket 2. 1991 01:42:05,021 --> 01:42:06,231 So first mistake. 1992 01:42:06,231 --> 01:42:09,161 The second mistake that I've made, as a side effect, 1993 01:42:09,161 --> 01:42:12,221 is I'm also touching memory that I shouldn't. 1994 01:42:12,221 --> 01:42:17,171 x bracket 3 would mean go to the fourth int in the chunk of memory 1995 01:42:17,171 --> 01:42:17,981 that came back. 1996 01:42:17,981 --> 01:42:20,501 I only asked for enough memory for three ints, 1997 01:42:20,501 --> 01:42:23,741 not four, so this is what's called a buffer overflow. 1998 01:42:23,741 --> 01:42:26,831 I am accidentally, but deliberately at the moment, 1999 01:42:26,831 --> 01:42:30,951 going beyond the boundaries of this array, this chunk of memory. 2000 01:42:30,951 --> 01:42:33,311 So bad things can happen, but not necessarily visibly 2001 01:42:33,311 --> 01:42:34,641 when you just run your program. 2002 01:42:34,641 --> 01:42:36,191 Let me go ahead and just try this. 2003 01:42:36,191 --> 01:42:42,011 Make memory, and you'll see here that it compiles OK. ./memory, 2004 01:42:42,011 --> 01:42:44,139 and it actually does not cause a segmentation fault, 2005 01:42:44,139 --> 01:42:46,181 which comes back to that point of nondeterminism. 2006 01:42:46,181 --> 01:42:48,551 Sometimes it does, sometimes it doesn't-- it depends on how bad 2007 01:42:48,551 --> 01:42:49,691 of a mistake you made. 2008 01:42:49,691 --> 01:42:52,858 But there's a program that can spot these kinds of mistakes, 2009 01:42:52,858 --> 01:42:55,691 and I'm going to go ahead and expand my terminal window for a moment 2010 01:42:55,691 --> 01:43:01,151 and I'm going to run not just ./memory, but a program called Valgrind: valgrind ./memory.
2011 01:43:01,151 --> 01:43:04,001 This is a command that comes with a lot of computer systems 2012 01:43:04,001 --> 01:43:07,071 that's designed to find memory-related bugs in code. 2013 01:43:07,071 --> 01:43:09,011 So it's a new tool in your toolkit today, 2014 01:43:09,011 --> 01:43:11,111 and you'll use it with the coming problem sets. 2015 01:43:11,111 --> 01:43:12,311 I'm going to run this now. 2016 01:43:12,311 --> 01:43:14,591 Its output, honestly, is hideous. 2017 01:43:14,591 --> 01:43:17,981 But there are a few things that will start to jump out 2018 01:43:17,981 --> 01:43:20,381 and will help you, in the problem 2019 01:43:20,381 --> 01:43:21,951 sets, to see these kinds of things. 2020 01:43:21,951 --> 01:43:23,531 Here's the first mistake. 2021 01:43:23,531 --> 01:43:26,471 Invalid write of size four. 2022 01:43:26,471 --> 01:43:30,461 That's on memory.c line nine, per my highlights. 2023 01:43:30,461 --> 01:43:32,351 So let me go look at line nine. 2024 01:43:32,351 --> 01:43:36,011 In what sense is this an invalid write of size four? 2025 01:43:36,011 --> 01:43:38,591 Well, I'm touching memory that I shouldn't, and I'm 2026 01:43:38,591 --> 01:43:40,061 touching it as though it's an int. 2027 01:43:40,061 --> 01:43:42,551 And an int is four bytes-- size four. 2028 01:43:42,551 --> 01:43:45,831 So again, this takes some practice to get used to, the nomenclature here, 2029 01:43:45,831 --> 01:43:48,771 but this is now a clue for me, the programmer, 2030 01:43:48,771 --> 01:43:52,231 that not only did I screw up, but I screwed up related to memory, 2031 01:43:52,231 --> 01:43:54,749 and so this is just a hint, if you will. 2032 01:43:54,749 --> 01:43:57,291 It's not necessarily going to tell you exactly how to fix it; 2033 01:43:57,291 --> 01:44:01,131 you have to wrestle with the semantics, but invalid 2034 01:44:01,131 --> 01:44:02,961 write of size four-- oh, OK.
2035 01:44:02,961 --> 01:44:07,321 So I should not have indexed past the boundary here. 2036 01:44:07,321 --> 01:44:10,021 All right, so I shouldn't have done that. 2037 01:44:10,021 --> 01:44:15,764 So let me go ahead then and change this to zero, one, and two, perhaps, here. 2038 01:44:15,764 --> 01:44:17,931 All right, so let me go ahead and recompile my code. 2039 01:44:17,931 --> 01:44:24,261 Make memory, ./memory, still doesn't seem to be broken, but it is technically 2040 01:44:24,261 --> 01:44:24,891 buggy. 2041 01:44:24,891 --> 01:44:31,101 Let me go ahead and run Valgrind again, so Valgrind of ./memory, Enter. 2042 01:44:31,101 --> 01:44:33,321 And now there's fewer scary-- 2043 01:44:33,321 --> 01:44:36,841 less scary output now, but there's still something in there. 2044 01:44:36,841 --> 01:44:40,368 Notice this-- 12 bytes in one blocks-- 2045 01:44:40,368 --> 01:44:42,201 no regard for grammar there-- are definitely 2046 01:44:42,201 --> 01:44:43,971 lost in loss record one of one. 2047 01:44:43,971 --> 01:44:47,611 Super cryptic, but this is hinting at a so-called memory leak. 2048 01:44:47,611 --> 01:44:51,441 The blocks of memory are lost in the sense that I malloc'd them-- 2049 01:44:51,441 --> 01:44:52,881 I asked for them but I never-- 2050 01:44:52,881 --> 01:44:55,071 take a guess-- freed them. 2051 01:44:55,071 --> 01:44:56,008 I have a memory leak. 2052 01:44:56,008 --> 01:44:58,341 And this is the arcane way of saying, you've screwed up. 2053 01:44:58,341 --> 01:44:59,551 You have a memory leak. 2054 01:44:59,551 --> 01:45:01,821 So this is an easy fix, fortunately. 2055 01:45:01,821 --> 01:45:06,211 Once I'm done with this memory, I just need to free it at the end. 2056 01:45:06,211 --> 01:45:08,631 So now let me go ahead and rerun make memory; 2057 01:45:08,631 --> 01:45:12,441 it still runs fine, so all the while I might have thought, incorrectly, 2058 01:45:12,441 --> 01:45:13,581 that my code is correct.
2059 01:45:13,581 --> 01:45:15,261 But let me run Valgrind one more time. 2060 01:45:15,261 --> 01:45:17,451 Valgrind of ./memory, Enter. 2061 01:45:17,451 --> 01:45:19,341 Now, this is pretty good. 2062 01:45:19,341 --> 01:45:21,531 All heap blocks were freed, whatever that means. 2063 01:45:21,531 --> 01:45:23,371 No leaks are possible. 2064 01:45:23,371 --> 01:45:26,481 And even though it's still a little cryptic, there's no other error here, 2065 01:45:26,481 --> 01:45:29,985 and in fact, it's pretty explicit-- error summary, zero errors from zero 2066 01:45:29,985 --> 01:45:31,641 contexts, dot, dot, dot. 2067 01:45:31,641 --> 01:45:34,831 So even though this is one of the most arcane tools we'll use, 2068 01:45:34,831 --> 01:45:37,341 it's also one of the most powerful because it can see things 2069 01:45:37,341 --> 01:45:40,671 that you, the human, might not, and maybe even that the debugger might not. 2070 01:45:40,671 --> 01:45:42,741 It does a much closer reading of your code 2071 01:45:42,741 --> 01:45:48,501 while it's running to figure out exactly what is going on. 2072 01:45:48,501 --> 01:45:50,781 Any questions, then, on this tool? 2073 01:45:50,781 --> 01:45:54,681 And we'll guide you after today with actually using this, too. 2074 01:45:54,681 --> 01:45:57,201 It just helps you find memory-related mistakes 2075 01:45:57,201 --> 01:46:00,021 that you might now be capable of making. 2076 01:46:00,021 --> 01:46:02,181 All right, let's do one other memory-related thing. 2077 01:46:02,181 --> 01:46:04,171 Let me shrink my terminal window here. 2078 01:46:04,171 --> 01:46:07,911 Let me create one other file here called garbage.c. 2079 01:46:07,911 --> 01:46:11,421 It turns out there's a term of art called garbage values in programming 2080 01:46:11,421 --> 01:46:12,931 that we can reveal as follows.
2081 01:46:12,931 --> 01:46:15,921 Let me include stdio.h, and let me include-- 2082 01:46:15,921 --> 01:46:19,461 how about stdlib.h, and then let me give myself int 2083 01:46:19,461 --> 01:46:22,561 main(void), and then in this relatively short program 2084 01:46:22,561 --> 01:46:25,461 let me give myself three ints using last week's 2085 01:46:25,461 --> 01:46:29,421 notation, just int scores bracket 3 for 3 quiz scores, or whatever. 2086 01:46:29,421 --> 01:46:33,441 Then let me go ahead and do for int i equals zero, i less than 3, 2087 01:46:33,441 --> 01:46:38,691 i plus plus, then let me go ahead and print out, %i backslash n, 2088 01:46:38,691 --> 01:46:40,911 scores bracket i semicolon. 2089 01:46:40,911 --> 01:46:43,491 That's it. 2090 01:46:43,491 --> 01:46:48,781 This code, I'm pretty sure, is going to compile and it's going to run, 2091 01:46:48,781 --> 01:46:51,171 but what is my logical bug? 2092 01:46:51,171 --> 01:46:55,701 I've forgotten a step, even though the code that's written is not so wrong. 2093 01:46:55,701 --> 01:46:58,431 Yeah? 2094 01:46:58,431 --> 01:47:00,921 Yeah, I didn't provide the scores, so I didn't actually 2095 01:47:00,921 --> 01:47:04,851 initialize the array called scores to have any scores whatsoever. 2096 01:47:04,851 --> 01:47:08,391 What's curious about this, though, is that the computer technically 2097 01:47:08,391 --> 01:47:09,081 doesn't mind. 2098 01:47:09,081 --> 01:47:13,041 Let me go ahead and playfully make garbage, Enter, 2099 01:47:13,041 --> 01:47:15,621 and it's an apt description because what I'm about to see 2100 01:47:15,621 --> 01:47:18,231 are so-called garbage values. 2101 01:47:18,231 --> 01:47:23,061 When you, the programmer, do not initialize your code's variables to have 2102 01:47:23,061 --> 01:47:25,878 values, sometimes, who knows what's going to be there.
2103 01:47:25,878 --> 01:47:27,711 The computer's been doing some other things, 2104 01:47:27,711 --> 01:47:31,161 there's a bit of work that happens even before your code runs in the computer, 2105 01:47:31,161 --> 01:47:34,401 so there might be remnants of past ints, chars, strings, 2106 01:47:34,401 --> 01:47:37,041 floats-- anything else in there and what you're seeing 2107 01:47:37,041 --> 01:47:42,661 is those garbage values, which is to say you should never forget, 2108 01:47:42,661 --> 01:47:45,601 as I just did, to initialize the value of some variable. 2109 01:47:45,601 --> 01:47:47,601 And this is actually pretty dangerous, and there 2110 01:47:47,601 --> 01:47:51,081 have been many examples of software being compromised 2111 01:47:51,081 --> 01:47:54,261 because of one of these issues where a variable wasn't initialized 2112 01:47:54,261 --> 01:47:58,611 and all of a sudden users, maybe people on the internet in the context of web 2113 01:47:58,611 --> 01:48:02,481 applications, could suddenly see the contents of someone else's memory, 2114 01:48:02,481 --> 01:48:03,591 or remnants. 2115 01:48:03,591 --> 01:48:06,051 Maybe someone's password that had been previously typed in 2116 01:48:06,051 --> 01:48:08,031 or some other value like a credit card number 2117 01:48:08,031 --> 01:48:09,591 that had been previously typed in. 2118 01:48:09,591 --> 01:48:11,571 There are different defense mechanisms in place 2119 01:48:11,571 --> 01:48:15,111 to generally make this not so likely, but it's certainly 2120 01:48:15,111 --> 01:48:18,171 very possible, at least in this kind of context, 2121 01:48:18,171 --> 01:48:22,101 to see values that you probably shouldn't because they 2122 01:48:22,101 --> 01:48:25,621 might be remnants from something else that used them. 
2123 01:48:25,621 --> 01:48:29,701 So this is to say again, you have this great power now to manipulate memory, 2124 01:48:29,701 --> 01:48:33,021 but also now you have this great hacking ability to poke around 2125 01:48:33,021 --> 01:48:36,441 the contents of memory, and this is exactly what hackers sometimes do when 2126 01:48:36,441 --> 01:48:40,431 trying to find ways to exploit systems. 2127 01:48:40,431 --> 01:48:41,661 Are there any questions here? 2128 01:48:44,571 --> 01:48:45,071 No? 2129 01:48:45,071 --> 01:48:47,111 All right, let's go ahead and take a quick five-minute break, 2130 01:48:47,111 --> 01:48:49,511 and when we come back, we'll build on these final topics. 2131 01:48:49,511 --> 01:48:50,381 See you in five. 2132 01:48:50,381 --> 01:48:51,671 We are back. 2133 01:48:51,671 --> 01:48:55,481 First, just a little programmer humor from xkcd, which hopefully now 2134 01:48:55,481 --> 01:48:57,851 will make a little bit of sense to you. 2135 01:48:57,851 --> 01:49:02,321 And what we'll also do next is take a look at a short two-minute video that 2136 01:49:02,321 --> 01:49:05,501 animates with claymation, if you will, from our friends at Stanford, 2137 01:49:05,501 --> 01:49:08,501 exactly what happens now if you have an understanding of what garbage 2138 01:49:08,501 --> 01:49:12,004 values are and how they get there, and what happens then if you misuse them. 2139 01:49:12,004 --> 01:49:14,171 It's one thing just to print them out, as I just did; 2140 01:49:14,171 --> 01:49:18,431 it's another if you actually mistake a garbage value for a valid pointer, 2141 01:49:18,431 --> 01:49:21,881 because garbage values are just zeros and ones somewhere-- numbers, that is. 2142 01:49:21,881 --> 01:49:24,761 But if you use that new dereference operator, the star, 2143 01:49:24,761 --> 01:49:29,111 and try to go to a garbage value thinking incorrectly that it's 2144 01:49:29,111 --> 01:49:31,511 a valid pointer, bad things can happen.
2145 01:49:31,511 --> 01:49:36,431 Computers can crash or more familiarly, segmentation faults can happen. 2146 01:49:36,431 --> 01:49:39,401 So allow me to introduce, if we could dim the lights for two minutes, 2147 01:49:39,401 --> 01:49:41,111 our friend Binky from Stanford. 2148 01:49:44,951 --> 01:49:46,541 SPEAKER 1: Hey Binky, wake up. 2149 01:49:46,541 --> 01:49:49,221 It's time for pointer fun. 2150 01:49:49,221 --> 01:49:50,331 BINKY: What's that? 2151 01:49:50,331 --> 01:49:51,921 Learn about pointers? 2152 01:49:51,921 --> 01:49:53,184 Oh, goody! 2153 01:49:53,184 --> 01:49:55,101 SPEAKER 1: Well, to get started, I guess we're 2154 01:49:55,101 --> 01:49:56,721 going to need a couple of pointers. 2155 01:49:56,721 --> 01:50:00,998 BINKY: OK, this code allocates two pointers which can point to integers. 2156 01:50:00,998 --> 01:50:01,581 SPEAKER 1: OK. 2157 01:50:01,581 --> 01:50:05,188 Well, I see the two pointers, but they don't seem to be pointing to anything. 2158 01:50:05,188 --> 01:50:06,021 BINKY: That's right. 2159 01:50:06,021 --> 01:50:08,151 Initially, pointers don't point to anything. 2160 01:50:08,151 --> 01:50:11,181 The things they point to are called pointees, and setting them up 2161 01:50:11,181 --> 01:50:12,174 is a separate step. 2162 01:50:12,174 --> 01:50:13,341 SPEAKER 1: Oh, right, right. 2163 01:50:13,341 --> 01:50:14,031 I knew that. 2164 01:50:14,031 --> 01:50:16,021 The pointees are separate. 2165 01:50:16,021 --> 01:50:18,351 So how do you allocate a pointee? 2166 01:50:18,351 --> 01:50:21,921 BINKY: OK, well this code allocates a new integer pointee, 2167 01:50:21,921 --> 01:50:24,994 and this part sets x to point to it. 2168 01:50:24,994 --> 01:50:26,411 SPEAKER 1: Hey, that looks better. 2169 01:50:26,411 --> 01:50:28,021 So make it do something. 2170 01:50:28,021 --> 01:50:31,411 BINKY: OK, I'll dereference the pointer x to store the number 2171 01:50:31,411 --> 01:50:33,541 42 into its pointee. 
2172 01:50:33,541 --> 01:50:37,201 For this trick, I'll need my magic wand of dereferencing. 2173 01:50:37,201 --> 01:50:40,591 SPEAKER 1: Your magic wand of dereferencing? 2174 01:50:40,591 --> 01:50:42,441 That's great. 2175 01:50:42,441 --> 01:50:44,151 BINKY: This is what the code looks like. 2176 01:50:44,151 --> 01:50:46,946 I'll just set up the number and-- 2177 01:50:46,946 --> 01:50:47,821 SPEAKER 1: Hey, look. 2178 01:50:47,821 --> 01:50:49,171 There it goes. 2179 01:50:49,171 --> 01:50:54,091 So doing a dereference on x follows the arrow to access its pointee, 2180 01:50:54,091 --> 01:50:56,131 in this case to store 42 in there. 2181 01:50:56,131 --> 01:51:00,751 Hey, try using it to store the number 13 through the other pointer, y. 2182 01:51:00,751 --> 01:51:01,891 BINKY: OK. 2183 01:51:01,891 --> 01:51:06,271 I'll just go over here to y and get the number 13 set up, 2184 01:51:06,271 --> 01:51:10,801 and then take the wand of dereferencing and just-- 2185 01:51:10,801 --> 01:51:11,881 whoa! 2186 01:51:11,881 --> 01:51:14,101 SPEAKER 1: Oh hey, that didn't work. 2187 01:51:14,101 --> 01:51:17,821 Say, Binky, I don't think dereferencing y is a good idea 2188 01:51:17,821 --> 01:51:21,016 because setting up the pointee is a separate step 2189 01:51:21,016 --> 01:51:23,551 and I don't think we ever did it. 2190 01:51:23,551 --> 01:51:24,601 BINKY: Good point. 2191 01:51:24,601 --> 01:51:27,031 SPEAKER 1: Yeah, we allocated the pointer y, 2192 01:51:27,031 --> 01:51:30,271 but we never set it to point to a pointee. 2193 01:51:30,271 --> 01:51:31,439 BINKY: Very observant. 2194 01:51:31,439 --> 01:51:33,481 SPEAKER 1: Hey, you're looking good there, Binky. 2195 01:51:33,481 --> 01:51:36,361 Can you fix it so that y points to the same pointee as x? 2196 01:51:36,361 --> 01:51:39,721 BINKY: Sure, I'll use my magic wand of pointer assignment. 2197 01:51:39,721 --> 01:51:41,971 SPEAKER 1: Is that going to be a problem, like before?
2198 01:51:41,971 --> 01:51:43,861 BINKY: No, this doesn't touch the pointees; 2199 01:51:43,861 --> 01:51:47,491 it just changes one pointer to point to the same thing as another. 2200 01:51:47,491 --> 01:51:48,511 SPEAKER 1: Oh, I see. 2201 01:51:48,511 --> 01:51:51,181 Now y points to the same place as x. 2202 01:51:51,181 --> 01:51:53,071 So wait, now y is fixed. 2203 01:51:53,071 --> 01:51:56,131 It has a pointee, so you can try the wand of dereferencing again 2204 01:51:56,131 --> 01:51:58,741 to send the 13 over. 2205 01:51:58,741 --> 01:52:01,073 BINKY: OK, here it goes. 2206 01:52:01,073 --> 01:52:02,281 SPEAKER 1: Hey, look at that. 2207 01:52:02,281 --> 01:52:04,111 Now dereferencing works on y. 2208 01:52:04,111 --> 01:52:08,161 And because the pointers are sharing that one pointee, they both see the 13. 2209 01:52:08,161 --> 01:52:09,301 BINKY: Yeah, sharing. 2210 01:52:09,301 --> 01:52:09,871 Whatever. 2211 01:52:09,871 --> 01:52:11,911 So are we going to switch places now? 2212 01:52:11,911 --> 01:52:13,831 SPEAKER 1: Oh look, we're out of time. 2213 01:52:13,831 --> 01:52:14,951 BINKY: But-- 2214 01:52:14,951 --> 01:52:17,171 DAVID J. MALAN: That's from our friend Nick Parlante at Stanford. 2215 01:52:17,171 --> 01:52:19,511 So let's consider what Nick did here as Binky. 2216 01:52:19,511 --> 01:52:21,581 So here is all the code together. 2217 01:52:21,581 --> 01:52:25,258 These first couple of lines were not bad, and notice that in Stanford's code 2218 01:52:25,258 --> 01:52:26,591 they moved the stars to the left. 2219 01:52:26,591 --> 01:52:27,341 That's fine. 2220 01:52:27,341 --> 01:52:30,251 Again, more conventional might be this syntax here. 2221 01:52:30,251 --> 01:52:31,461 These two lines are fine. 2222 01:52:31,461 --> 01:52:34,781 It's OK to create variables, even pointers, 2223 01:52:34,781 --> 01:52:38,411 and not assign them a value initially, so long as you eventually do. 2224 01:52:38,411 --> 01:52:40,931 So we eventually do here, with this line.
2225 01:52:40,931 --> 01:52:43,991 We assign to x the return value of malloc, which 2226 01:52:43,991 --> 01:52:45,821 is presumably the address of something. 2227 01:52:45,821 --> 01:52:49,071 To be fair, we should really be checking for NULL as well, 2228 01:52:49,071 --> 01:52:50,991 but that's not the biggest problem here. 2229 01:52:50,991 --> 01:52:53,481 The biggest problem is not even this next line, 2230 01:52:53,481 --> 01:52:59,231 which means go to the memory location in x and store the number 42 there. 2231 01:52:59,231 --> 01:53:01,451 That's fine, because again, malloc returns 2232 01:53:01,451 --> 01:53:03,701 the address of some chunk of memory. 2233 01:53:03,701 --> 01:53:05,801 This chunk of memory is big enough for an int. 2234 01:53:05,801 --> 01:53:08,711 x is therefore going to store the address of that chunk that's 2235 01:53:08,711 --> 01:53:09,671 big enough for an int. 2236 01:53:09,671 --> 01:53:13,541 Star x, recall, is the dereference operator: it means go to that address 2237 01:53:13,541 --> 01:53:15,341 and put 42 in it. 2238 01:53:15,341 --> 01:53:18,461 It's like going to the mailbox and putting the number 42 in it 2239 01:53:18,461 --> 01:53:21,371 instead of taking the number 50 out, like we did before. 2240 01:53:21,371 --> 01:53:23,051 But why is this line bad? 2241 01:53:23,051 --> 01:53:26,291 This is where Binky lost his head, so to speak. 2242 01:53:26,291 --> 01:53:27,641 Why is this bad? 2243 01:53:27,641 --> 01:53:28,681 Yeah. 2244 01:53:28,681 --> 01:53:30,681 AUDIENCE: We haven't yet allocated space for it. 2245 01:53:30,681 --> 01:53:31,231 DAVID J. MALAN: Exactly. 2246 01:53:31,231 --> 01:53:33,141 We haven't yet allocated space for y. 2247 01:53:33,141 --> 01:53:36,051 There's no mention of malloc, there's no assignment of y, 2248 01:53:36,051 --> 01:53:37,591 even to that same memory.
2249 01:53:37,591 --> 01:53:40,441 So this would be, go to the address in y, 2250 01:53:40,441 --> 01:53:43,831 but if there is no known address in y, its value is a so-called garbage value, 2251 01:53:43,831 --> 01:53:46,761 which means go to some random address that you have no control over, 2252 01:53:46,761 --> 01:53:47,571 and boom-- 2253 01:53:47,571 --> 01:53:52,221 that might cause what we've seen in the past, perhaps as a segmentation fault. 2254 01:53:52,221 --> 01:53:54,111 Now this, fortunately, is the kind of thing 2255 01:53:54,111 --> 01:53:58,041 that, if you don't quite have the eye for it yet, Valgrind, that new tool, 2256 01:53:58,041 --> 01:53:59,911 could help you find as well. 2257 01:53:59,911 --> 01:54:03,681 But it's just another example, again, of the sort of upside and downside 2258 01:54:03,681 --> 01:54:07,111 of having control now over memory at this level. 2259 01:54:07,111 --> 01:54:07,611 All right. 2260 01:54:07,611 --> 01:54:09,444 Well, let's go ahead and do one other thing. 2261 01:54:09,444 --> 01:54:12,586 Recall from last week that this notion of swapping 2262 01:54:12,586 --> 01:54:14,211 was actually a really common operation. 2263 01:54:14,211 --> 01:54:17,211 We had all of our volunteers come up, we had to swap a lot of things 2264 01:54:17,211 --> 01:54:19,581 during bubble sort and even selection sort, 2265 01:54:19,581 --> 01:54:21,681 and we just took for granted that the two 2266 01:54:21,681 --> 01:54:23,613 humans would swap themselves just fine. 2267 01:54:23,613 --> 01:54:25,821 But there needs to be code to do that if you actually 2268 01:54:25,821 --> 01:54:29,638 implement bubble sort, selection sort, or anything that involves swapping. 2269 01:54:29,638 --> 01:54:31,221 So let's consider some code like this. 2270 01:54:31,221 --> 01:54:33,291 We'll keep it simple, like last week, where 2271 01:54:33,291 --> 01:54:40,339 we wanted to swap some values like int A and int B, for instance, here.
2272 01:54:40,339 --> 01:54:43,131 Void because I'm not going to return a value, but I have a function 2273 01:54:43,131 --> 01:54:44,031 called swap. 2274 01:54:44,031 --> 01:54:49,341 So here, for instance, might be some code for this. 2275 01:54:49,341 --> 01:54:50,549 But why is it so complicated? 2276 01:54:50,549 --> 01:54:52,133 Here, let's actually take a step back. 2277 01:54:52,133 --> 01:54:53,301 Why don't we do this here. 2278 01:54:53,301 --> 01:54:54,921 I think we have time for one more volunteer. 2279 01:54:54,921 --> 01:54:56,379 Could we get someone to come on up? 2280 01:54:56,379 --> 01:54:58,671 You have to be comfy on camera and you're 2281 01:54:58,671 --> 01:55:01,701 being asked to help with your-- oh, I'll go with the friend, pointing. 2282 01:55:01,701 --> 01:55:05,641 So whoever has their friend doing this here-- 2283 01:55:05,641 --> 01:55:06,621 no? 2284 01:55:06,621 --> 01:55:08,511 Now they're pointing it over here. 2285 01:55:08,511 --> 01:55:10,251 Now, literally an arm is being twisted. 2286 01:55:10,251 --> 01:55:11,751 OK. 2287 01:55:11,751 --> 01:55:12,471 Come on down. 2288 01:55:12,471 --> 01:55:13,341 That backfired. 2289 01:55:18,311 --> 01:55:18,956 Come on over. 2290 01:55:24,481 --> 01:55:26,241 And what is your name? 2291 01:55:26,241 --> 01:55:27,153 AUDIENCE: Marina. 2292 01:55:27,153 --> 01:55:28,111 DAVID J. MALAN: Marina. 2293 01:55:28,111 --> 01:55:29,641 Nice to meet you. 2294 01:55:29,641 --> 01:55:31,718 Who were you trying to volunteer? 2295 01:55:31,718 --> 01:55:32,801 AUDIENCE: My friend Jesse. 2296 01:55:32,801 --> 01:55:33,971 DAVID J. MALAN: OK. 2297 01:55:33,971 --> 01:55:38,291 So here we have for Marina two glasses of liquid, orange and purple, 2298 01:55:38,291 --> 01:55:39,821 just so that they're super obvious. 
2299 01:55:39,821 --> 01:55:42,226 And suppose that the problem at hand, like last week, 2300 01:55:42,226 --> 01:55:45,101 is just to swap two values, as though these two glasses represented 2301 01:55:45,101 --> 01:55:47,111 two people and we want to swap them. 2302 01:55:47,111 --> 01:55:50,501 But let's consider these glasses to be like variables, or locations 2303 01:55:50,501 --> 01:55:52,211 in an array, and you know what? 2304 01:55:52,211 --> 01:55:54,681 I'd really like you to swap the values. 2305 01:55:54,681 --> 01:55:58,241 So orange has to go in there, and purple has to go in there. 2306 01:55:58,241 --> 01:55:59,194 How would you do it? 2307 01:55:59,194 --> 01:56:01,361 And we'll see if we can then translate that to code. 2308 01:56:01,361 --> 01:56:03,508 AUDIENCE: [INAUDIBLE] 2309 01:56:03,508 --> 01:56:04,591 DAVID J. MALAN: OK, what-- 2310 01:56:04,591 --> 01:56:06,444 say it a little louder. 2311 01:56:06,444 --> 01:56:07,111 All right, yeah. 2312 01:56:07,111 --> 01:56:09,571 So presumably, you're struggling mentally 2313 01:56:09,571 --> 01:56:12,781 with how you would do this without having an extra cup, so good foresight 2314 01:56:12,781 --> 01:56:13,321 here. 2315 01:56:13,321 --> 01:56:16,191 Let me go ahead, and we do have a temporary variable, if you will. 2316 01:56:16,191 --> 01:56:18,691 So if I hand you this, how would you now solve this problem? 2317 01:56:21,181 --> 01:56:22,931 AUDIENCE: I would go like that, but it's-- 2318 01:56:22,931 --> 01:56:23,581 DAVID J. MALAN: No, that's-- 2319 01:56:23,581 --> 01:56:24,371 Oh. 2320 01:56:24,371 --> 01:56:24,871 Well, OK. 2321 01:56:24,871 --> 01:56:27,981 Go do it-- go with your instincts. 2322 01:56:27,981 --> 01:56:29,541 OK. 2323 01:56:29,541 --> 01:56:30,681 Sure, go ahead. 2324 01:56:30,681 --> 01:56:32,811 Go with whatever your instincts are.
2325 01:56:39,201 --> 01:56:41,828 Yeah, so a little-- so strictly speaking, we probably 2326 01:56:41,828 --> 01:56:43,911 shouldn't have moved the glasses just because that 2327 01:56:43,911 --> 01:56:45,931 would be like moving the array locations, 2328 01:56:45,931 --> 01:56:48,611 so let's actually do it one more time but the glasses now 2329 01:56:48,611 --> 01:56:50,361 have to go back where they originally were. 2330 01:56:50,361 --> 01:56:55,051 So how would you swap these now, using this temporary variable? 2331 01:56:55,051 --> 01:56:56,476 OK, good. 2332 01:56:56,476 --> 01:56:59,101 Otherwise we'd be completely uprooting the array, for instance, 2333 01:56:59,101 --> 01:57:01,081 by just physically moving it around. 2334 01:57:01,081 --> 01:57:03,571 So you moved the orange into this temporary variable, 2335 01:57:03,571 --> 01:57:05,911 then you copied the purple into where the orange was, 2336 01:57:05,911 --> 01:57:08,281 and now, presumably, excellent. 2337 01:57:08,281 --> 01:57:11,101 The orange is going to end up where the purple once was 2338 01:57:11,101 --> 01:57:13,621 and this temporary variable, it took up some extra memory. 2339 01:57:13,621 --> 01:57:16,441 It was necessary at the time, but not necessary, ultimately. 2340 01:57:16,441 --> 01:57:22,131 But a round of applause if we could, and thank you for doing that so well. 2341 01:57:22,131 --> 01:57:26,311 So the fact that it instantly occurred to Marina 2342 01:57:26,311 --> 01:57:29,711 that you need some temporary variable is a perfect translation to code, 2343 01:57:29,711 --> 01:57:32,951 and in fact this code here, that we might glimpse now, 2344 01:57:32,951 --> 01:57:35,038 is reminiscent of exactly that algorithm, 2345 01:57:35,038 --> 01:57:37,871 where A and B, at the end of the day, are the same chunks of memory.
2346 01:57:37,871 --> 01:57:39,881 Just like the second time, the two glasses 2347 01:57:39,881 --> 01:57:42,281 have to kind of stay put, even though we're physically lifting them, 2348 01:57:42,281 --> 01:57:44,031 but they're going back to where they were, 2349 01:57:44,031 --> 01:57:46,031 is kind of like having two values, A and B, 2350 01:57:46,031 --> 01:57:49,091 and you just have a temporary variable into which you copy A, 2351 01:57:49,091 --> 01:57:52,331 then you overwrite A with B, then you go and overwrite 2352 01:57:52,331 --> 01:57:55,271 B with whatever the original value of A was, 2353 01:57:55,271 --> 01:57:59,921 because you temporarily stored it in this temporary variable, tmp. 2354 01:57:59,921 --> 01:58:04,161 Unfortunately, this code doesn't necessarily work as intended. 2355 01:58:04,161 --> 01:58:07,391 So let me go over to my VS Code here and open up 2356 01:58:07,391 --> 01:58:10,661 a program called swap.c, and in swap.c, let 2357 01:58:10,661 --> 01:58:15,641 me whip up something really quickly here with, how about include stdio.h, 2358 01:58:15,641 --> 01:58:17,561 int main(void). 2359 01:58:17,561 --> 01:58:22,751 Inside of main let me do something like x gets 1 and y gets 2. 2360 01:58:22,751 --> 01:58:27,881 Let me just print out as a visual confirmation that x is %i, 2361 01:58:27,881 --> 01:58:32,891 y is %i backslash n, plugging in x and y, respectively. 2362 01:58:32,891 --> 01:58:36,071 Then let me call a swap function that we'll invent in just a moment. 2363 01:58:36,071 --> 01:58:42,761 Swap x and y. And then let me print out again x is %i, y is %i backslash n, 2364 01:58:42,761 --> 01:58:46,331 just to print out again what they are, because presumably I should see 1, 2365 01:58:46,331 --> 01:58:49,494 2 first, then 2, 1 the second time. 2366 01:58:49,494 --> 01:58:51,161 Now how is swap going to be implemented? 2367 01:58:51,161 --> 01:58:54,591 Let me implement it exactly as on the screen a moment ago.
2368 01:58:54,591 --> 01:58:57,011 So void swap int x-- 2369 01:58:57,011 --> 01:58:59,501 or let's call it int A for consistency, int B. 2370 01:58:59,501 --> 01:59:01,661 But I could always call those anything I want. 2371 01:59:01,661 --> 01:59:05,891 int tmp gets A, A gets B, B gets tmp. 2372 01:59:05,891 --> 01:59:08,981 So exactly as I proposed a moment ago, and exactly 2373 01:59:08,981 --> 01:59:12,761 as Marina really implemented it using these glasses of water. 2374 01:59:12,761 --> 01:59:16,571 I need to now include my prototype, as always, so nothing new there. 2375 01:59:16,571 --> 01:59:20,261 And I'll just copy/paste that up here, and now let's go ahead and run this. 2376 01:59:20,261 --> 01:59:23,471 So make swap-- so far, so good-- ./swap-- 2377 01:59:23,471 --> 01:59:28,331 x is now 1, y is 2, x is 1, y is 2. 2378 01:59:28,331 --> 01:59:34,091 So there seems to be a bit of a bug here, but why might this be? 2379 01:59:34,091 --> 01:59:37,931 This code does not in fact work, even though it obviously works in reality. 2380 01:59:37,931 --> 01:59:39,725 Yeah? 2381 01:59:39,725 --> 01:59:46,239 AUDIENCE: Because A and B have different addresses than x and y [INAUDIBLE]. 2382 01:59:46,239 --> 01:59:48,031 DAVID J. MALAN: Good, and let me summarize. 2383 01:59:48,031 --> 01:59:51,361 A and B do indeed have different addresses from x and y, 2384 01:59:51,361 --> 01:59:54,961 and in fact what happens when you call a function like this on line 11, 2385 01:59:54,961 --> 01:59:59,221 calling swap, passing in x and y, you are calling a function 2386 01:59:59,221 --> 02:00:00,851 by value, so to speak. 2387 02:00:00,851 --> 02:00:02,611 And this is a term of art that just means 2388 02:00:02,611 --> 02:00:07,321 you are passing in copies of x and y, respectively, and calling them 2389 02:00:07,321 --> 02:00:11,551 A and B in the context of this function, but they're indeed copies. 2390 02:00:11,551 --> 02:00:15,451 Now technically, these names are local only.
2391 02:00:15,451 --> 02:00:18,211 I could have called this x, I could have called this y, 2392 02:00:18,211 --> 02:00:22,531 I could have changed this to x, this to y, this to x, and this to y. 2393 02:00:22,531 --> 02:00:24,031 The problem would still remain. 2394 02:00:24,031 --> 02:00:27,961 Just because you use the same names in one function as you do elsewhere, 2395 02:00:27,961 --> 02:00:29,551 that doesn't mean they're the same. 2396 02:00:29,551 --> 02:00:31,121 They just look the same to you. 2397 02:00:31,121 --> 02:00:35,821 But indeed, swap is going to get copies of this x and y, and in this context, 2398 02:00:35,821 --> 02:00:38,461 this scope, so to speak-- 2399 02:00:38,461 --> 02:00:40,801 x and y will be copies of the originals. 2400 02:00:40,801 --> 02:00:43,141 So for clarity, let me revert this back to A and B 2401 02:00:43,141 --> 02:00:46,951 just to make super clear that they're indeed different, albeit copies, 2402 02:00:46,951 --> 02:00:48,901 but there's indeed a problem there. 2403 02:00:48,901 --> 02:00:51,041 The logic of this function actually works fine. 2404 02:00:51,041 --> 02:00:52,361 In fact, notice this. 2405 02:00:52,361 --> 02:00:56,921 Let me go ahead and print out inside of this. printf A is %i, 2406 02:00:56,921 --> 02:01:00,991 B is %i backslash n, and then I'll print A and B. 2407 02:01:00,991 --> 02:01:04,201 And let me do that same thing at the beginning of this function before it 2408 02:01:04,201 --> 02:01:05,381 does any work. 2409 02:01:05,381 --> 02:01:06,751 Let me go ahead and rerun. 2410 02:01:06,751 --> 02:01:10,741 make swap, ./swap, and this is promising. 2411 02:01:10,741 --> 02:01:17,371 Initially, x is 1, y is 2, A is 1, B is 2, A is 2, B is 1, 2412 02:01:17,371 --> 02:01:19,598 but then nope-- x is 1, y is 2. 2413 02:01:19,598 --> 02:01:21,931 So if anything, I've confirmed that the logic is right-- 2414 02:01:21,931 --> 02:01:25,051 Marina's logic is right, but there's something about C.
2415 02:01:25,051 --> 02:01:28,921 There's something about using one function versus another that's actually 2416 02:01:28,921 --> 02:01:30,671 creating a problem here. 2417 02:01:30,671 --> 02:01:35,021 The fact that I'm passing in copies of these values is creating this problem. 2418 02:01:35,021 --> 02:01:36,391 So what in fact is going on? 2419 02:01:36,391 --> 02:01:39,211 Well again, inside of your computer's memory there are these little chips, 2420 02:01:39,211 --> 02:01:41,086 and we've been talking about them abstractly: 2421 02:01:41,086 --> 02:01:43,141 it's just this grid of memory locations. 2422 02:01:43,141 --> 02:01:46,343 It turns out that your computer uses this memory 2423 02:01:46,343 --> 02:01:47,551 in a pretty conventional way. 2424 02:01:47,551 --> 02:01:51,631 It's not just random, where it just puts stuff wherever is available, 2425 02:01:51,631 --> 02:01:55,591 it actually uses different parts of the memory for different purposes. 2426 02:01:55,591 --> 02:01:58,981 And you have control over a lot of it, but the computer uses some of it 2427 02:01:58,981 --> 02:01:59,823 for itself. 2428 02:01:59,823 --> 02:02:01,531 And let's go ahead and zoom out from this 2429 02:02:01,531 --> 02:02:05,581 and consider that within your computer's memory, what a computer will typically 2430 02:02:05,581 --> 02:02:09,001 do is actually store, initially, all of the zeros and ones 2431 02:02:09,001 --> 02:02:13,001 that you compiled in the top of your computer's memory, so to speak. 2432 02:02:13,001 --> 02:02:16,231 So when you compile a program and then you run it with ./whatever, 2433 02:02:16,231 --> 02:02:19,651 or on a Mac or PC you double-click on it, the computer first-- 2434 02:02:19,651 --> 02:02:24,781 the operating system first-- loads all of your program's zeros and ones, a.k.a. 2435 02:02:24,781 --> 02:02:29,371 machine code, into just one big chunk of memory at the top, so to speak.
2436 02:02:29,371 --> 02:02:33,301 Below that it stores global variables-- any variables 2437 02:02:33,301 --> 02:02:37,183 you have created in your program that are outside of main and outside 2438 02:02:37,183 --> 02:02:37,891 of any functions. 2439 02:02:37,891 --> 02:02:39,691 Generally, the top of your file. 2440 02:02:39,691 --> 02:02:41,634 Globals tend to go at the top there. 2441 02:02:41,634 --> 02:02:44,551 Then there's this chunk of memory that's generally known as the heap-- 2442 02:02:44,551 --> 02:02:46,951 and we saw that word briefly in Valgrind's output, 2443 02:02:46,951 --> 02:02:50,581 and then there's this other chunk of memory called the stack. 2444 02:02:50,581 --> 02:02:55,711 And it turns out that up until this week you were using the stack heavily. 2445 02:02:55,711 --> 02:03:00,961 Any time you use local variables in a function they end up on the stack. 2446 02:03:00,961 --> 02:03:04,681 Any time you use malloc, that memory ends up on the heap. 2447 02:03:04,681 --> 02:03:06,751 Now as the arrow suggests, this actually looks 2448 02:03:06,751 --> 02:03:09,834 like a problem waiting to happen because if you use more and more and more 2449 02:03:09,834 --> 02:03:11,671 heap, and more and more and more stack, it's 2450 02:03:11,671 --> 02:03:14,401 like two things barreling down the tracks at one another-- this does not 2451 02:03:14,401 --> 02:03:14,891 end well. 2452 02:03:14,891 --> 02:03:16,141 And that's actually a problem. 2453 02:03:16,141 --> 02:03:19,481 If you've ever heard the phrase stack overflow, or used the website, 2454 02:03:19,481 --> 02:03:21,271 this is the origin of its name. 2455 02:03:21,271 --> 02:03:23,521 When you start to use more and more and more 2456 02:03:23,521 --> 02:03:25,801 memory by calling lots and lots of functions 2457 02:03:25,801 --> 02:03:28,261 or using lots and lots of local variables, 2458 02:03:28,261 --> 02:03:30,511 you use a lot of this stack memory.
2459 02:03:30,511 --> 02:03:33,961 Or if you use malloc a lot and keep calling malloc, malloc, malloc, 2460 02:03:33,961 --> 02:03:37,681 and never, or rarely, call free, you just use more and more memory 2461 02:03:37,681 --> 02:03:41,521 and eventually these two things might overflow into each other, at which point 2462 02:03:41,521 --> 02:03:42,571 you're just out of luck. 2463 02:03:42,571 --> 02:03:45,191 The program will crash or something bad will happen. 2464 02:03:45,191 --> 02:03:47,971 So the onus is on you just not to do that. 2465 02:03:47,971 --> 02:03:50,221 But this is the design, generally, of what's 2466 02:03:50,221 --> 02:03:52,111 going on inside of your computer's memory. 2467 02:03:52,111 --> 02:03:55,711 Now within that memory, though, there are certain conventions, 2468 02:03:55,711 --> 02:03:57,571 focusing here on the stack. 2469 02:03:57,571 --> 02:04:00,031 And in fact, let me go over here with a marker 2470 02:04:00,031 --> 02:04:03,521 and say that this represents the bottom of my memory, ultimately. 2471 02:04:03,521 --> 02:04:07,801 And so here we have a whole bunch of wooden blocks and each of these squares 2472 02:04:07,801 --> 02:04:10,091 represents a byte of memory and this, for instance, 2473 02:04:10,091 --> 02:04:12,781 might represent four bytes altogether-- good enough for an int, 2474 02:04:12,781 --> 02:04:14,111 or something like that. 2475 02:04:14,111 --> 02:04:18,451 So in my original code that I wrote earlier, that is, in fact, buggy, 2476 02:04:18,451 --> 02:04:20,851 what is in fact going on inside the swap function? 2477 02:04:20,851 --> 02:04:24,901 We can visualize it like this-- when you run ./swap or any program for that 2478 02:04:24,901 --> 02:04:28,501 matter, main is the first function to get called in a C program, 2479 02:04:28,501 --> 02:04:32,011 and so I'm just going to label this bottom row of memory as main.
2480 02:04:32,011 --> 02:04:36,381 And what were the two variables I had in main called in this code? 2481 02:04:36,381 --> 02:04:37,631 Yeah. 2482 02:04:37,631 --> 02:04:38,201 x and y. 2483 02:04:38,201 --> 02:04:40,401 And each of those was an int, so that's four bytes, 2484 02:04:40,401 --> 02:04:43,121 so it's deliberate that I reserved four-- 2485 02:04:43,121 --> 02:04:45,951 a chunk of wood here that's four bytes. 2486 02:04:45,951 --> 02:04:49,901 So let me just call this x, and I'm just going to write the number 1 in this box 2487 02:04:49,901 --> 02:04:50,411 here. 2488 02:04:50,411 --> 02:04:54,431 And then I had my other variable y, and I'm going to put the number 2 there. 2489 02:04:54,431 --> 02:04:58,641 What happens when main calls swap like it does in this code here? 2490 02:04:58,641 --> 02:05:04,931 Well, swap has two variables of its own, A and B, and A initially is 1 2491 02:05:04,931 --> 02:05:09,341 and B is initially 2, but it has a third variable, tmp, 2492 02:05:09,341 --> 02:05:12,371 which is a local variable in addition to the arguments A and B 2493 02:05:12,371 --> 02:05:16,931 that are passed in, so I'm going to call this tmp, tmp over here. 2494 02:05:16,931 --> 02:05:18,156 And what is the value of tmp? 2495 02:05:18,156 --> 02:05:19,781 Well, we have to look back at the code. 2496 02:05:19,781 --> 02:05:24,431 tmp initially gets the value of A. All right, the value of A was 1, 2497 02:05:24,431 --> 02:05:26,141 so tmp initially gets 1. 2498 02:05:26,141 --> 02:05:28,601 That's step one in my three-line program. 2499 02:05:28,601 --> 02:05:32,621 OK, A gets B. So that is assigned from the right to the left, from the B 2500 02:05:32,621 --> 02:05:36,251 into the A. So B is 2, A is this, so let me go ahead 2501 02:05:36,251 --> 02:05:38,361 and erase this and just overwrite that.
2502 02:05:38,361 --> 02:05:41,891 So at this moment in the story you have two copies of 2, 2503 02:05:41,891 --> 02:05:44,711 but that's OK, because the third line of code 2504 02:05:44,711 --> 02:05:47,741 says tmp gets copied into B. So what's tmp-- 2505 02:05:47,741 --> 02:05:53,171 1, gets copied into B, so let me overwrite this 2 with a 1, 2506 02:05:53,171 --> 02:05:54,821 and now what happens? 2507 02:05:54,821 --> 02:05:57,941 Now unfortunately, the code ends. 2508 02:05:57,941 --> 02:06:01,511 swap doesn't actually do anything with the result, and the problem in C 2509 02:06:01,511 --> 02:06:03,521 is that even if I could have had a return value-- 2510 02:06:03,521 --> 02:06:05,741 I could go in there and change void to int-- 2511 02:06:05,741 --> 02:06:07,511 which one am I going to return? 2512 02:06:07,511 --> 02:06:09,221 The A or the B? 2513 02:06:09,221 --> 02:06:11,631 The whole goal is to swap two values, and it 2514 02:06:11,631 --> 02:06:13,631 seems kind of lame if you can't write a function 2515 02:06:13,631 --> 02:06:16,661 to do something as common, per last week's sorting algorithms, 2516 02:06:16,661 --> 02:06:18,191 as swapping two values. 2517 02:06:18,191 --> 02:06:19,541 But what really happens? 2518 02:06:19,541 --> 02:06:22,751 Well, when this program starts running, 2519 02:06:22,751 --> 02:06:25,991 main is using this chunk of memory at the bottom in the so-called stack, 2520 02:06:25,991 --> 02:06:28,661 and the stack is just like a cafeteria stack of trays-- 2521 02:06:28,661 --> 02:06:30,201 it grows up, like this. 2522 02:06:30,201 --> 02:06:32,291 Here's main's memory on the stack. 2523 02:06:32,291 --> 02:06:34,571 Here's the swap function's memory on the stack. 2524 02:06:34,571 --> 02:06:37,241 It's using three ints instead of two-- 2525 02:06:37,241 --> 02:06:38,951 instead of only two. 2526 02:06:38,951 --> 02:06:42,461 What happens when the function returns, whether it's void or not?
2527 02:06:42,461 --> 02:06:45,701 The sort of recollection that this is swap's memory goes away 2528 02:06:45,701 --> 02:06:47,291 and garbage values are left. 2529 02:06:47,291 --> 02:06:51,531 So, adorably, we get rid of these values here, 2530 02:06:51,531 --> 02:06:55,991 and there's still data there-- technically, the numbers 1, 1, and 2 2531 02:06:55,991 --> 02:06:59,591 are still there in the computer's memory but they no longer belong to us 2532 02:06:59,591 --> 02:07:01,341 because the function has now returned. 2533 02:07:01,341 --> 02:07:04,421 So they're still in there and this is kind of an example visually 2534 02:07:04,421 --> 02:07:07,781 of why there's other stuff in memory even though you didn't put it there, 2535 02:07:07,781 --> 02:07:08,621 necessarily. 2536 02:07:08,621 --> 02:07:11,071 Sometimes you did put it there, but now once 2537 02:07:11,071 --> 02:07:14,711 swap returns you should only be touching memory inside of main. 2538 02:07:14,711 --> 02:07:19,001 But we've never actually copied either value back into main. 2539 02:07:19,001 --> 02:07:22,661 We haven't returned anything and we haven't solved this fundamentally. 2540 02:07:22,661 --> 02:07:24,291 So how could we do this? 2541 02:07:24,291 --> 02:07:28,301 Well, what if we instead passed into swap not copies of x and y, 2542 02:07:28,301 --> 02:07:32,681 calling them A and B? What if we passed in breadcrumbs to x and y, 2543 02:07:32,681 --> 02:07:35,861 sort of a treasure map that will lead swap to the actual x 2544 02:07:35,861 --> 02:07:37,241 and to the actual y? 2545 02:07:37,241 --> 02:07:41,051 Today we have that capability using pointers. 2546 02:07:41,051 --> 02:07:44,921 So suppose that we use this code instead. 2547 02:07:44,921 --> 02:07:47,831 There's a lot of stars going on here, which is a bit annoying, 2548 02:07:47,831 --> 02:07:50,501 but let's consider what it is we're trying to achieve.
2549 02:07:50,501 --> 02:07:55,391 What if we pass in not x and y, but the address of x and the address of y, 2550 02:07:55,391 --> 02:07:57,501 respectively-- breadcrumbs, if you will-- 2551 02:07:57,501 --> 02:08:00,521 that will lead swap to the original values. 2552 02:08:00,521 --> 02:08:04,331 Then what we do is we still give ourselves a tmp variable, 2553 02:08:04,331 --> 02:08:05,351 like an empty glass. 2554 02:08:05,351 --> 02:08:07,691 It's still a glass, so we still call it an int, 2555 02:08:07,691 --> 02:08:10,071 but what do we want to put into that temporary variable? 2556 02:08:10,071 --> 02:08:12,654 We don't want to put A into it, because that's an address now. 2557 02:08:12,654 --> 02:08:15,371 We want to go to that address per the star 2558 02:08:15,371 --> 02:08:17,141 and put whatever's at that address into tmp. 2559 02:08:17,141 --> 02:08:18,381 What do we then want to do? 2560 02:08:18,381 --> 02:08:22,121 Well, we want to then copy into whatever's at location A, 2561 02:08:22,121 --> 02:08:24,911 we want to copy over to location A's contents 2562 02:08:24,911 --> 02:08:29,111 whatever is at location B's contents and then lastly, we 2563 02:08:29,111 --> 02:08:32,261 want to copy tmp into whatever's at location B. 2564 02:08:32,261 --> 02:08:36,149 So again, we're very deliberately introducing all of these stars 2565 02:08:36,149 --> 02:08:38,441 because we don't want to change any of these addresses, 2566 02:08:38,441 --> 02:08:41,861 we want to go to these addresses per the dereference operator 2567 02:08:41,861 --> 02:08:46,221 and put values there, or get values from there. 2568 02:08:46,221 --> 02:08:47,691 So what does this actually mean?
2569 02:08:47,691 --> 02:08:52,001 Well, if I kind of rewind in this story and I go back here, I still have tmp, 2570 02:08:52,001 --> 02:08:57,671 although I'm going to delete its value to begin with, I still have B 2571 02:08:57,671 --> 02:09:01,121 and I still have A, but what's going to be different 2572 02:09:01,121 --> 02:09:05,051 this time is how I use A and B. So let me finish erasing those. 2573 02:09:05,051 --> 02:09:07,181 That's A on the left, this is B on the right. 2574 02:09:07,181 --> 02:09:09,701 At this point in the story, we're rerunning swap 2575 02:09:09,701 --> 02:09:13,151 with this new and improved version, and let's see what happens. 2576 02:09:13,151 --> 02:09:16,871 Well, x is presumably at some address. 2577 02:09:16,871 --> 02:09:20,351 Maybe it's like 0x123, as always. 2578 02:09:20,351 --> 02:09:23,471 What then does A get when I'm using this code? 2579 02:09:23,471 --> 02:09:27,131 The value of A is 0x123. 2580 02:09:27,131 --> 02:09:28,391 What is the value of B? 2581 02:09:28,391 --> 02:09:31,661 Maybe y is at 0x456. 2582 02:09:31,661 --> 02:09:32,651 What goes in B? 2583 02:09:32,651 --> 02:09:38,281 Well, I'm going to put 0x456, and then what am I going to do? 2584 02:09:38,281 --> 02:09:40,471 Based on these three lines of code, I'm going 2585 02:09:40,471 --> 02:09:44,671 to store in tmp whatever is at the address in A. What is the address in A? 2586 02:09:44,671 --> 02:09:47,701 That's this thing here, so I'm going to put 1 in tmp. 2587 02:09:47,701 --> 02:09:50,251 Line two-- I'm going to go to B-- 2588 02:09:50,251 --> 02:09:53,131 all right, B is 456, so I'm going to go to B and I'm 2589 02:09:53,131 --> 02:09:57,931 going to store that 2 at location A, and location A 2590 02:09:57,931 --> 02:10:01,211 is 123, so that's this, so what am I going to do? 2591 02:10:01,211 --> 02:10:03,901 I'm going to change this 1 to a 2.
2592 02:10:03,901 --> 02:10:06,631 Last line of code-- get the value of tmp, which is 1, 2593 02:10:06,631 --> 02:10:11,731 and then put it at whatever location B is, so B, 456, go there 2594 02:10:11,731 --> 02:10:16,291 and change it to be the value of tmp, tmp, which puts 1 here. 2595 02:10:16,291 --> 02:10:17,521 That's it for the code. 2596 02:10:17,521 --> 02:10:19,081 There's still no return value. 2597 02:10:19,081 --> 02:10:22,381 swap returns, which means these three local variables 2598 02:10:22,381 --> 02:10:24,091 are garbage values now. 2599 02:10:24,091 --> 02:10:26,471 They can be reused by subsequent function calls 2600 02:10:26,471 --> 02:10:31,091 but now, I've actually swapped the values of x and y. 2601 02:10:31,091 --> 02:10:35,041 Which is to say what came as naturally in the real world here for Marina 2602 02:10:35,041 --> 02:10:38,521 is not quite as simply done in C because again, 2603 02:10:38,521 --> 02:10:40,861 functions are isolated from each other. 2604 02:10:40,861 --> 02:10:44,141 You can pass in values but you get copies of those values. 2605 02:10:44,141 --> 02:10:48,691 If you want one function to affect the value of a variable somewhere else, 2606 02:10:48,691 --> 02:10:52,021 you have to, 1, understand what's going on and, 2, 2607 02:10:52,021 --> 02:10:54,971 pass things in by pointer here. 2608 02:10:54,971 --> 02:10:58,561 So if I go back to my code here, I need to make a few changes now. 2609 02:10:58,561 --> 02:11:00,661 Let me get rid of these extra printfs. 2610 02:11:00,661 --> 02:11:03,391 Let me go in and add all these stars. 2611 02:11:03,391 --> 02:11:07,411 So I'm dereferencing these actual addresses here and here, 2612 02:11:07,411 --> 02:11:09,821 and I've got to make one more change. 2613 02:11:09,821 --> 02:11:16,381 How do I now call swap if swap is expecting an int star and an int star? 2614 02:11:16,381 --> 02:11:19,441 That is, the address of an int and the address of another int.
2615 02:11:19,441 --> 02:11:21,931 What do I change on line 11 here? 2616 02:11:21,931 --> 02:11:24,231 Yeah. 2617 02:11:24,231 --> 02:11:25,983 Sorry, a little louder. 2618 02:11:25,983 --> 02:11:30,231 AUDIENCE: [INAUDIBLE] 2619 02:11:30,231 --> 02:11:33,051 DAVID J. MALAN: Sorry, the address-of operator. 2620 02:11:33,051 --> 02:11:37,731 So up here on line 11, we do ampersand x and ampersand y. 2621 02:11:37,731 --> 02:11:41,001 So yes, we're technically passing in a copy of a value, 2622 02:11:41,001 --> 02:11:43,881 but this time the copy we're passing in is technically an address, 2623 02:11:43,881 --> 02:11:47,271 and as soon as we have an address, just like when I held up the fuzzy finger-- 2624 02:11:47,271 --> 02:11:50,571 the foamy finger-- I can point at that address, I can go to that address 2625 02:11:50,571 --> 02:11:54,561 and actually get a value from the mailbox or put a value into the mailbox 2626 02:11:54,561 --> 02:11:56,821 if I want. 2627 02:11:56,821 --> 02:12:01,551 So let's cross our fingers now and do make swap, Enter. 2628 02:12:01,551 --> 02:12:02,721 Oh my God, so many mistakes. 2629 02:12:02,721 --> 02:12:04,881 Oh, I didn't remember to change my prototype, 2630 02:12:04,881 --> 02:12:08,421 so let me go way up here and add two more stars because I 2631 02:12:08,421 --> 02:12:09,801 made that change already. 2632 02:12:09,801 --> 02:12:14,961 make swap, ./swap, and voila-- now I have actually swapped. 2633 02:12:14,961 --> 02:12:15,741 Thank you. 2634 02:12:19,291 --> 02:12:19,831 Thank you. 2635 02:12:19,831 --> 02:12:21,661 The two values. 2636 02:12:21,661 --> 02:12:24,491 All right, so what more can we do here? 2637 02:12:24,491 --> 02:12:29,461 Well, let me consider that all this time we've 2638 02:12:29,461 --> 02:12:33,691 been deliberately using get_string and get_int and get_float 2639 02:12:33,691 --> 02:12:35,111 and so forth, but for a reason.
2640 02:12:35,111 --> 02:12:38,069 These aren't just training wheels for the sake of making things easier, 2641 02:12:38,069 --> 02:12:41,071 they're actually in place to make your code safer. 2642 02:12:41,071 --> 02:12:45,511 And to illustrate this, let me go ahead and open up one other file here. 2643 02:12:45,511 --> 02:12:49,861 How about a file called scanf.c? 2644 02:12:49,861 --> 02:12:52,891 It turns out that the old-school way-- the way in C, 2645 02:12:52,891 --> 02:12:57,151 really, of getting user input, is via functions like scanf, 2646 02:12:57,151 --> 02:13:00,751 and let me go ahead and include stdio.h, int main(void), 2647 02:13:00,751 --> 02:13:04,441 and without using the CS50 library at all for strings or for any of those 2648 02:13:04,441 --> 02:13:05,611 get functions. 2649 02:13:05,611 --> 02:13:08,161 Let me give myself an int called x. 2650 02:13:08,161 --> 02:13:12,076 Let me just print out what the value of x is, even though it's going to be a-- 2651 02:13:12,076 --> 02:13:15,361 or rather, ask the user for the value by asking them for x. 2652 02:13:15,361 --> 02:13:18,781 And I'm going to use a function called scanf that's going to scan 2653 02:13:18,781 --> 02:13:25,351 in an integer using %i, and I'm going to store whatever the human types 2654 02:13:25,351 --> 02:13:27,306 in at this location, ampersand x. 2655 02:13:27,306 --> 02:13:30,181 And then I'm going to go ahead and, just so we can see what happened, 2656 02:13:30,181 --> 02:13:34,231 I'm going to print out with %i whatever the human typed in as follows. 2657 02:13:34,231 --> 02:13:37,321 All right, so line eight is week 1 style code. 2658 02:13:37,321 --> 02:13:40,991 Lines five and six are week 1 style code. 2659 02:13:40,991 --> 02:13:46,411 So the curiosity today is this new line. scanf is another function in stdio.h, 2660 02:13:46,411 --> 02:13:47,971 and notice what I'm doing.
2661 02:13:47,971 --> 02:13:50,671 I'm using the same syntax that I use for printf, 2662 02:13:50,671 --> 02:13:54,091 which is kind of a little clue-- a format code to tell scanf what it is I 2663 02:13:54,091 --> 02:13:57,031 want to scan in, that is, read from the human's keyboard-- 2664 02:13:57,031 --> 02:14:00,571 and I'm telling it where to put whatever the human typed in. 2665 02:14:00,571 --> 02:14:04,321 I can't just say x, because we run into the same darn problem as with swap. 2666 02:14:04,321 --> 02:14:06,811 I have to give a little breadcrumb to the variable 2667 02:14:06,811 --> 02:14:10,111 where I want scanf to put the human's integer. 2668 02:14:10,111 --> 02:14:13,541 And so this just tells the computer to get an int. 2669 02:14:13,541 --> 02:14:15,781 This is what you would have had to type, essentially, 2670 02:14:15,781 --> 02:14:18,691 in week 1 just to get an int from the user, 2671 02:14:18,691 --> 02:14:21,541 and there's a whole bunch of things that can go wrong still, 2672 02:14:21,541 --> 02:14:24,931 but that's the cryptic syntax we would have had to show you in week 1. 2673 02:14:24,931 --> 02:14:26,881 Let me go ahead and make scanf here-- 2674 02:14:26,881 --> 02:14:29,941 oops-- user error. 2675 02:14:29,941 --> 02:14:31,891 Put the semicolon in the wrong place. 2676 02:14:31,891 --> 02:14:33,781 make scanf, Enter. 2677 02:14:33,781 --> 02:14:35,281 Oh my God. 2678 02:14:35,281 --> 02:14:36,676 Non-void function doesn't return a value. 2679 02:14:40,371 --> 02:14:42,591 Oh, thank you. 2680 02:14:42,591 --> 02:14:43,221 Strike two. 2681 02:14:43,221 --> 02:14:43,851 OK. 2682 02:14:43,851 --> 02:14:45,141 make scanf. 2683 02:14:45,141 --> 02:14:45,831 There we go. 2684 02:14:45,831 --> 02:14:46,971 OK, so scanf-- 2685 02:14:46,971 --> 02:14:49,951 I'm going to type in a number like 50 and it just prints it back out. 2686 02:14:49,951 --> 02:14:54,181 So that is the traditional way of implementing something like get_int.
2687 02:14:54,181 --> 02:14:57,651 The problem, though, is when you start to get into strings, things 2688 02:14:57,651 --> 02:14:59,121 get dangerous quickly. 2689 02:14:59,121 --> 02:15:01,289 Let me delete all of this and give myself 2690 02:15:01,289 --> 02:15:03,831 a string s, although wait a minute-- we don't call it string 2691 02:15:03,831 --> 02:15:06,891 anymore-- char star to store a string. 2692 02:15:06,891 --> 02:15:10,731 Then let me go ahead and just prompt the user for a string, using just printf. 2693 02:15:10,731 --> 02:15:15,531 Then let me go ahead and use scanf, ask them for a string this time with %s, 2694 02:15:15,531 --> 02:15:18,211 and store it at that address. 2695 02:15:18,211 --> 02:15:20,751 Then let me go ahead and print out whatever the human typed 2696 02:15:20,751 --> 02:15:23,641 in just by using the same notation. 2697 02:15:23,641 --> 02:15:28,791 So here, line five is the same thing as string s, but we've taken back 2698 02:15:28,791 --> 02:15:31,191 that layer today so it's char star s. 2699 02:15:31,191 --> 02:15:35,991 This is just week one, this is just week one, and line seven is new. 2700 02:15:35,991 --> 02:15:41,811 scanf will also read from the human's keyboard a string and store it at s. 2701 02:15:41,811 --> 02:15:43,641 But that's OK, because s is an address. 2702 02:15:43,641 --> 02:15:46,551 It's correct not to do the ampersand. 2703 02:15:46,551 --> 02:15:47,451 It's not necessary. 2704 02:15:47,451 --> 02:15:52,071 A string is and has always been a char star, a.k.a. string. 2705 02:15:52,071 --> 02:15:54,091 The problem, though, arises as follows-- 2706 02:15:54,091 --> 02:15:56,411 if I do make scanf-- 2707 02:15:56,411 --> 02:15:57,911 oh my God, what did I do wrong-- 2708 02:15:57,911 --> 02:16:00,431 I can't-- OK, we have certain defenses in place with make. 2709 02:16:00,431 --> 02:16:06,881 Let me do clang of scanf.c, and output a program called scanf.
2710 02:16:06,881 --> 02:16:09,838 All right, so I'm overriding some of our pedagogical defenses 2711 02:16:09,838 --> 02:16:11,171 that we have in place with make. 2712 02:16:11,171 --> 02:16:15,761 Let me now run scanf of this version, Enter, and let me type in something 2713 02:16:15,761 --> 02:16:20,341 like, how about hi again. 2714 02:16:20,341 --> 02:16:23,161 So it didn't even store something and it weirdly printed out null. 2715 02:16:23,161 --> 02:16:26,821 This time it's in lowercase, but that is somewhat related. 2716 02:16:26,821 --> 02:16:31,561 What did I fundamentally do wrong though, here? 2717 02:16:31,561 --> 02:16:33,691 Why is this getting more and more dangerous? 2718 02:16:33,691 --> 02:16:35,471 And let me illustrate the point even more. 2719 02:16:35,471 --> 02:16:38,741 What if I type in not just something like hello, which also doesn't work. 2720 02:16:38,741 --> 02:16:44,581 What if I do like, hellooooo and make a really long string, Enter-- 2721 02:16:44,581 --> 02:16:45,871 that still works. 2722 02:16:45,871 --> 02:16:48,191 Can I do this again? 2723 02:16:48,191 --> 02:16:50,091 Let's try again. 2724 02:16:50,091 --> 02:16:53,271 Right, a really long, unexpectedly long string. 2725 02:16:53,271 --> 02:16:55,131 This is the nondeterminism kicking in. 2726 02:16:55,131 --> 02:16:55,851 Enter. 2727 02:16:55,851 --> 02:16:56,421 All right, damn it. 2728 02:16:56,421 --> 02:16:58,254 I was trying to trigger a segmentation fault 2729 02:16:58,254 --> 02:17:01,491 but it wouldn't, but the point still remains. 2730 02:17:01,491 --> 02:17:06,181 It's still not working, but what's the essence of why this isn't working, 2731 02:17:06,181 --> 02:17:07,851 and it's not storing my actual input? 2732 02:17:07,851 --> 02:17:08,731 Yeah. 2733 02:17:08,731 --> 02:17:10,666 AUDIENCE: Do you have to make a space? 2734 02:17:10,666 --> 02:17:12,541 DAVID J. MALAN: We have to make space for it. 
2735 02:17:12,541 --> 02:17:15,781 So what we're missing here is malloc, or something like that. 2736 02:17:15,781 --> 02:17:18,741 So I could do that, I could do something like this. 2737 02:17:18,741 --> 02:17:21,441 Well, let the human type in up to a three-letter word 2738 02:17:21,441 --> 02:17:25,581 so I could do malloc of 3 plus 1 for the null character. 2739 02:17:25,581 --> 02:17:29,961 So let me give them four characters, and let me go ahead and do make scanf-- 2740 02:17:29,961 --> 02:17:30,921 whoops. 2741 02:17:30,921 --> 02:17:33,081 Nope, sorry. clang, I have to-- 2742 02:17:33,081 --> 02:17:33,721 nope. 2743 02:17:33,721 --> 02:17:34,221 Dammit. 2744 02:17:34,221 --> 02:17:40,811 Oh, include stdlib.h-- there we go. 2745 02:17:40,811 --> 02:17:43,836 That gives me malloc. Now I'm going to recompile this with clang, 2746 02:17:43,836 --> 02:17:46,961 now I'm going to rerun it, and now I'm going to type in my first thing, hi. 2747 02:17:46,961 --> 02:17:48,341 That now works. 2748 02:17:48,341 --> 02:17:52,061 And let me get a little aggressive now and type in hello, which is too long. 2749 02:17:52,061 --> 02:17:54,101 Still works, but I'm getting lucky. 2750 02:17:54,101 --> 02:17:57,671 Let me try a hellooooooo. 2751 02:17:57,671 --> 02:17:59,995 Damn it, that still works, too. 2752 02:17:59,995 --> 02:18:01,091 Sort of. 2753 02:18:01,091 --> 02:18:03,290 But it actually-- not quite. 2754 02:18:03,290 --> 02:18:05,411 There's some weirdness going on there already. 2755 02:18:05,411 --> 02:18:07,011 It turns out I can also do this. 2756 02:18:07,011 --> 02:18:10,390 I could actually just say char s bracket four and give myself 2757 02:18:10,390 --> 02:18:11,681 an array of four characters. 2758 02:18:11,681 --> 02:18:13,101 Let me try this one more time. 2759 02:18:13,101 --> 02:18:16,661 So let me rerun clang, then ./scanf.
2760 02:18:16,661 --> 02:18:21,460 Hellooooooo, clearly exceeding the four characters-- 2761 02:18:21,460 --> 02:18:22,091 there we go. 2762 02:18:22,091 --> 02:18:23,080 Thank you, all right. 2763 02:18:26,821 --> 02:18:29,342 So the point here, though, is if we hadn't given you GetInt, 2764 02:18:29,342 --> 02:18:31,800 you would have had to use the scanf thing-- not a huge deal 2765 02:18:31,800 --> 02:18:33,071 because it seemed to work. 2766 02:18:33,071 --> 02:18:36,321 But if we hadn't given you GetString, you would have had to do stuff like this, 2767 02:18:36,321 --> 02:18:39,481 knowing about malloc already or knowing about strings being arrays, 2768 02:18:39,481 --> 02:18:41,550 and even now there's a danger. 2769 02:18:41,550 --> 02:18:45,751 If the human types in five letters, six letters, 100 letters-- this code, 2770 02:18:45,751 --> 02:18:49,501 like with the Hello input, will probably just crash, which is bad. 2771 02:18:49,501 --> 02:18:51,481 So GetString also has this functionality built 2772 02:18:51,481 --> 02:18:53,790 in where we have a fancy loop inside such 2773 02:18:53,790 --> 02:18:58,321 that we allocate using malloc as many bytes as you physically type in, 2774 02:18:58,321 --> 02:19:00,271 and we use malloc on essentially every keystroke. 2775 02:19:00,271 --> 02:19:05,101 The moment you type in h-e-l-l-o, we're laying the tracks as we go and we keep 2776 02:19:05,101 --> 02:19:09,571 allocating more and more memory so that we theoretically will never crash with 2777 02:19:09,571 --> 02:19:12,300 GetString, even though it's this easy-- 2778 02:19:12,300 --> 02:19:15,451 this easy to crash your code using scanf if you again 2779 02:19:15,451 --> 02:19:18,121 did it without the help of a library. 2780 02:19:18,121 --> 02:19:20,178 So where are we all going with this?
2781 02:19:20,178 --> 02:19:22,261 Well, let me show you a few final examples that'll 2782 02:19:22,261 --> 02:19:24,601 pave the way for what will be problem set four. 2783 02:19:24,601 --> 02:19:27,761 Let me go ahead and open up from today's code-- 2784 02:19:27,761 --> 02:19:29,880 which is available on the course's website-- 2785 02:19:29,880 --> 02:19:36,841 for instance, a program like this, called phonebook.c, 2786 02:19:36,841 --> 02:19:39,540 and I'm just going to give you a quick tour of it, 2787 02:19:39,540 --> 02:19:42,502 that you'll see more details on in the context of p-set four itself. 2788 02:19:42,502 --> 02:19:45,210 We're going to introduce a few new functions you're going to see. 2789 02:19:45,210 --> 02:19:48,451 You're going to see a function called fopen, which stands for file open, 2790 02:19:48,451 --> 02:19:51,842 and it takes two arguments-- the name of a file to open like a CSV 2791 02:19:51,842 --> 02:19:55,050 that you might manipulate in Excel or Google Spreadsheets or the like-- comma- 2792 02:19:55,050 --> 02:19:59,851 separated values, and then something like a for append, r for read, 2793 02:19:59,851 --> 02:20:02,790 w for write, depending on whether you want to add to the file, 2794 02:20:02,790 --> 02:20:05,321 just open it up, or change it. 2795 02:20:05,321 --> 02:20:07,831 We're going to introduce you to a file pointer. 2796 02:20:07,831 --> 02:20:09,671 You'll see that capital FILE-- 2797 02:20:09,671 --> 02:20:12,271 which is a little bit unconventional-- capital FILE star is 2798 02:20:12,271 --> 02:20:15,121 a pointer to an actual file on the computer's hard drive 2799 02:20:15,121 --> 02:20:17,640 so that you can actually access something like a CSV file, 2800 02:20:17,640 --> 02:20:18,991 or heck, even images. 2801 02:20:18,991 --> 02:20:21,300 And we're going to see down below that you're also 2802 02:20:21,300 --> 02:20:25,050 going to have the ability to write files as well, or print to files.
2803 02:20:25,050 --> 02:20:28,981 You'll see functions like fprintf, for file printf. 2804 02:20:28,981 --> 02:20:34,111 Or fwrite-- file write. Now that you're beginning to understand pointers, 2805 02:20:34,111 --> 02:20:37,951 you'll have the ability to actually not only read files-- 2806 02:20:37,951 --> 02:20:41,470 text files, images, other things-- but also write them out. 2807 02:20:41,470 --> 02:20:46,921 In fact, for instance, just as a teaser here, JPEGs will be one of the things 2808 02:20:46,921 --> 02:20:49,321 we focus on this week where we give you a forensic image 2809 02:20:49,321 --> 02:20:51,991 and your goal is to recover as many photographs 2810 02:20:51,991 --> 02:20:55,651 from this forensic image of a digital camera as you possibly can. 2811 02:20:55,651 --> 02:20:59,071 And the way you're going to do that is by knowing in advance 2812 02:20:59,071 --> 02:21:03,571 that every JPEG in the world starts with these three bytes, written 2813 02:21:03,571 --> 02:21:05,800 in hexadecimal, but these three numbers. 2814 02:21:05,800 --> 02:21:08,521 And so in fact, just as a teaser, let me open up 2815 02:21:08,521 --> 02:21:11,701 an example you'll see on the course's website for today. 2816 02:21:11,701 --> 02:21:14,436 If I scroll through here, you'll see a program 2817 02:21:14,436 --> 02:21:16,061 that does a little something like this. 2818 02:21:16,061 --> 02:21:18,211 And again, more on this-- 2819 02:21:18,211 --> 02:21:20,401 if we could hit the button-- 2820 02:21:20,401 --> 02:21:21,041 there we go. 2821 02:21:21,041 --> 02:21:26,221 So here we have the notion of a byte we're going to create for ourselves. 2822 02:21:26,221 --> 02:21:29,101 We'll see a data type called byte, which is a common convention. 2823 02:21:29,101 --> 02:21:30,341 This gives me three bytes.
2824 02:21:30,341 --> 02:21:32,674 And you're going to learn about a function called fread, 2825 02:21:32,674 --> 02:21:36,571 which reads from a file some number of bytes-- for instance, three bytes. 2826 02:21:36,571 --> 02:21:38,341 We might then use code like this. 2827 02:21:38,341 --> 02:21:42,001 If bytes bracket zero equals equals 0xFF and bytes 2828 02:21:42,001 --> 02:21:47,761 bracket 1 equals 0xD8 and bytes bracket 2 equals 0xFF, all three of those 2829 02:21:47,761 --> 02:21:52,481 bytes I just claimed represent a JPEG, you'll see an output like this. 2830 02:21:52,481 --> 02:21:55,811 Let me go ahead and run this program as follows. 2831 02:21:55,811 --> 02:21:59,921 Let me copy jpeg.c into my directory from today's distribution. 2832 02:21:59,921 --> 02:22:08,071 Let me do make jpeg, and let me run jpeg on a file which is available online 2833 02:22:08,071 --> 02:22:11,841 called lecture.jpeg, and I claim yes, it's possibly a JPEG. 2834 02:22:11,841 --> 02:22:12,841 Well, what is that file? 2835 02:22:12,841 --> 02:22:16,481 Let me open it up for us, called lecture.jpeg, and here, for instance, 2836 02:22:16,481 --> 02:22:20,581 is that same photo with which we began class, namely implemented as a JPEG. 2837 02:22:20,581 --> 02:22:22,711 But what we're also going to do this week 2838 02:22:22,711 --> 02:22:27,631 is start to implement our own sort of filters a la Instagram, whereby 2839 02:22:27,631 --> 02:22:30,901 we might take images and actually run them through a program that 2840 02:22:30,901 --> 02:22:32,919 creates different versions thereof. 2841 02:22:32,919 --> 02:22:34,711 For instance, using a different file format 2842 02:22:34,711 --> 02:22:38,501 called BMP, which essentially lays out all of its pixels from left to right, 2843 02:22:38,501 --> 02:22:39,901 top to bottom, in a grid. 
2844 02:22:39,901 --> 02:22:41,461 You're going to see a struct-- 2845 02:22:41,461 --> 02:22:43,501 a data structure in C that's way more complicated 2846 02:22:43,501 --> 02:22:45,631 than the candidate structure from the past, 2847 02:22:45,631 --> 02:22:47,866 or the person structure from the past, that 2848 02:22:47,866 --> 02:22:50,491 looks like this, which just has a whole bunch more values in it, 2849 02:22:50,491 --> 02:22:52,408 but we'll walk you through these in the p-set. 2850 02:22:52,408 --> 02:22:54,421 And we might take a photograph like this and ask 2851 02:22:54,421 --> 02:22:56,881 you to run a few different filters on it a la Instagram, 2852 02:22:56,881 --> 02:23:00,511 like a black and white filter, or grayscale, a sepia filter 2853 02:23:00,511 --> 02:23:04,531 to give it some old school feel, or a reflection like this to invert it, 2854 02:23:04,531 --> 02:23:07,121 or blur it, even in this way. 2855 02:23:07,121 --> 02:23:10,111 And just to end on a note here, I have a version 2856 02:23:10,111 --> 02:23:13,621 of this code ready to go that doesn't implement all of those filters, 2857 02:23:13,621 --> 02:23:16,351 it just implements one filter initially. 2858 02:23:16,351 --> 02:23:19,051 Let me go ahead and just ready this on my computer here. 2859 02:23:19,051 --> 02:23:21,106 I'm going to go into my own version of filter 2860 02:23:21,106 --> 02:23:22,981 and you'll see a few files that we'll give you 2861 02:23:22,981 --> 02:23:26,621 a tour of this coming week. bitmap.h, for instance, 2862 02:23:26,621 --> 02:23:31,511 is a version of this structure that I claimed existed a moment ago. 2863 02:23:31,511 --> 02:23:39,361 And let me show you this file here, helpers.c, in which there is a function 2864 02:23:39,361 --> 02:23:43,051 called filter that I've already implemented in advance today.
2865 02:23:43,051 --> 02:23:46,111 But in the ones we give you for the p-set, it won't already be implemented. 2866 02:23:46,111 --> 02:23:48,486 This function called filter takes the height of an image, 2867 02:23:48,486 --> 02:23:51,581 the width of an image, and a two-dimensional array. 2868 02:23:51,581 --> 02:23:54,571 So rows and columns of pixels, and then I 2869 02:23:54,571 --> 02:23:58,411 have a loop like this that iterates over all of the pixels in an image from top 2870 02:23:58,411 --> 02:24:00,041 to bottom, left to right. 2871 02:24:00,041 --> 02:24:02,011 And then notice what I'm going to do here. 2872 02:24:02,011 --> 02:24:05,191 I'm going to change the blue value to be zero in this case, 2873 02:24:05,191 --> 02:24:07,601 and the green value to be zero in this case. 2874 02:24:07,601 --> 02:24:08,341 But why? 2875 02:24:08,341 --> 02:24:12,091 Well, the image I have here in mind is this one, 2876 02:24:12,091 --> 02:24:14,881 whereby we have this hidden image that simply 2877 02:24:14,881 --> 02:24:18,151 has old school style-- a secret message embedded in it. 2878 02:24:18,151 --> 02:24:21,361 And if you don't happen to have in your dorm one of these secret decoder 2879 02:24:21,361 --> 02:24:23,581 glasses that essentially make everything red-- 2880 02:24:23,581 --> 02:24:26,456 getting rid of the green in the world and the blue in the world-- 2881 02:24:26,456 --> 02:24:28,831 you can't actually-- I'm actually probably the only one who 2882 02:24:28,831 --> 02:24:31,111 can read this right now-- see what message 2883 02:24:31,111 --> 02:24:33,391 is hidden behind all of this red noise.
2884 02:24:33,391 --> 02:24:39,121 But if, using my code written here in helpers.c, I get rid of all the blue 2885 02:24:39,121 --> 02:24:41,821 in the picture and I get rid of all the green in the picture, 2886 02:24:41,821 --> 02:24:44,431 essentially implementing the idea of this filter-- 2887 02:24:44,431 --> 02:24:47,251 this red filter where you only see red-- 2888 02:24:47,251 --> 02:24:50,501 well, let's go ahead and compile this program. 2889 02:24:50,501 --> 02:24:55,471 Make filter, run ./filter on this hidden message.bmp. 2890 02:24:55,471 --> 02:24:58,531 I'm going to save it in a new file called message.bmp, 2891 02:24:58,531 --> 02:25:01,471 and with one final flourish we're going to open up 2892 02:25:01,471 --> 02:25:05,371 message.bmp, which is the result of having put on these glasses, 2893 02:25:05,371 --> 02:25:08,521 and hopefully now you too will see what I see. 2894 02:25:17,531 --> 02:25:18,931 All right, that's it for CS50! 2895 02:25:18,931 --> 02:25:19,931 We'll see you next time. 2896 02:25:21,731 --> 02:25:25,681 [MUSIC PLAYING]