All language subtitles for lecture2

af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic Download
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (Soranî)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 1 00:00:00,000 --> 00:00:04,910 [MUSIC PLAYING] 2 00:00:04,910 --> 00:00:17,155 3 00:00:17,155 --> 00:00:18,780 CARTER ZENKE: Well, hello, one and all. 4 00:00:18,780 --> 00:00:22,890 And welcome back to CS50's Introduction to Databases with SQL. 5 00:00:22,890 --> 00:00:24,360 My name is Carter Zenke. 6 00:00:24,360 --> 00:00:26,820 And last we left off, we learned about relating-- 7 00:00:26,820 --> 00:00:31,360 that is, how to have multiple tables in our database for people, places, 8 00:00:31,360 --> 00:00:34,500 and things, and how to have them relate to one another in the way 9 00:00:34,500 --> 00:00:36,580 you might do in the real world. 10 00:00:36,580 --> 00:00:38,550 Now, today, we'll take a step forward. 11 00:00:38,550 --> 00:00:41,370 And we'll talk about how to put you in the driver's seat, 12 00:00:41,370 --> 00:00:45,870 designing your very own database schemas to organize your data. 13 00:00:45,870 --> 00:00:49,890 Now, we'll pick up where we left off, which is with this database of books. 14 00:00:49,890 --> 00:00:52,050 So we had this database. 15 00:00:52,050 --> 00:00:56,760 It was full of books that have been longlisted for the International Booker 16 00:00:56,760 --> 00:00:57,450 Prize. 17 00:00:57,450 --> 00:01:01,740 To be longlisted means to be nominated for some prize, let's say. 18 00:01:01,740 --> 00:01:05,190 So we had the past five years of books in this database. 19 00:01:05,190 --> 00:01:10,170 And we worked on improving this database over time from week 0 to week 1. 20 00:01:10,170 --> 00:01:13,080 Now, this week, we'll actually take a look underneath the hood 21 00:01:13,080 --> 00:01:17,980 and see what commands we had used to create these varied databases. 22 00:01:17,980 --> 00:01:21,660 So let's reveal now what we had done all along. 23 00:01:21,660 --> 00:01:24,010 I'll go back to my terminal here. 24 00:01:24,010 --> 00:01:26,040 And if you remember, I could use a command 25 00:01:26,040 --> 00:01:29,730 to open up a database file, which was this command here, 26 00:01:29,730 --> 00:01:34,450 SQL sqlite3, and then the name of the file I want to open. 27 00:01:34,450 --> 00:01:35,910 So let's try this. 28 00:01:35,910 --> 00:01:39,420 I'll go back to week 0, to my week 0 folder like this. 29 00:01:39,420 --> 00:01:43,035 And I'll open up my long list database, week0/longlist.db. 30 00:01:43,035 --> 00:01:46,170 31 00:01:46,170 --> 00:01:47,430 I'll hit Enter. 32 00:01:47,430 --> 00:01:49,720 Now, I'm in my sqlite prompt. 33 00:01:49,720 --> 00:01:52,230 So I could work on typing some SQL commands, 34 00:01:52,230 --> 00:01:58,030 SQL statements or queries inside of this terminal prompt here. 35 00:01:58,030 --> 00:02:01,660 So if I want to get a feel for what's inside this database, 36 00:02:01,660 --> 00:02:04,230 we saw I could use a command called select. 37 00:02:04,230 --> 00:02:07,860 So I'll select now some rows from this table. 38 00:02:07,860 --> 00:02:14,370 I'll say SELECT, let's say, the title and also the author columns 39 00:02:14,370 --> 00:02:19,740 from my longlist table, semicolon, and Enter. 40 00:02:19,740 --> 00:02:23,760 Now, I'll see all the titles and all the authors that 41 00:02:23,760 --> 00:02:27,030 were inside of this long list table. 42 00:02:27,030 --> 00:02:29,250 But I only want to peek here. 43 00:02:29,250 --> 00:02:32,760 I only want to see roughly what kind of data is inside. 44 00:02:32,760 --> 00:02:36,130 So I could probably improve this command here. 45 00:02:36,130 --> 00:02:43,530 I could instead say SELECT the author and title columns from longlist-- 46 00:02:43,530 --> 00:02:49,470 from longlist, but limit now to the first five rows we saw before. 47 00:02:49,470 --> 00:02:52,440 Semicolon, and I'll hit Enter on this query. 48 00:02:52,440 --> 00:02:56,590 And now, I'll see only the top five rows. 49 00:02:56,590 --> 00:03:02,860 So I'm able now to see what kind of data is inside my database. 50 00:03:02,860 --> 00:03:08,790 But what I can't yet see is what command was used to create this table 51 00:03:08,790 --> 00:03:11,950 and what kind of data could be stored inside of it. 52 00:03:11,950 --> 00:03:16,140 So let's reveal now what was going on underneath the hood all this time. 53 00:03:16,140 --> 00:03:20,040 I'll say, this new command, a sqlite command-- not a SQL keyword, 54 00:03:20,040 --> 00:03:24,580 but sqlite command called .schema, .schema. 55 00:03:24,580 --> 00:03:28,350 Now, if I hit Enter, I'll see the following-- 56 00:03:28,350 --> 00:03:30,840 the command, the statement, the query that 57 00:03:30,840 --> 00:03:34,260 was used to create this longlist table. 58 00:03:34,260 --> 00:03:37,980 And notice how I have many columns inside this table. 59 00:03:37,980 --> 00:03:43,480 I have an ISBN column, a title column, an author column, and so on. 60 00:03:43,480 --> 00:03:47,340 And each column seems to have some kind of data 61 00:03:47,340 --> 00:03:52,080 that could be stored inside of it like text, or integers, or real, 62 00:03:52,080 --> 00:03:56,270 or floating point values, or decimals, if you're familiar. 63 00:03:56,270 --> 00:04:01,510 So this is how we created the very first version of longlist.db. 64 00:04:01,510 --> 00:04:04,165 But let's also see how we created the second. 65 00:04:04,165 --> 00:04:09,190 So I'll type .quit to leave this version of longlist.db. 66 00:04:09,190 --> 00:04:12,650 And now, let me open up the next version we had created, 67 00:04:12,650 --> 00:04:13,910 which is more relational. 68 00:04:13,910 --> 00:04:17,350 It had tables inside of it that could relate to one another. 69 00:04:17,350 --> 00:04:21,100 So I'm going to type sqlite3, then long-- 70 00:04:21,100 --> 00:04:26,320 sqlite3, and go to week 1, and then type longlist.db. 71 00:04:26,320 --> 00:04:27,700 I'll hit Enter. 72 00:04:27,700 --> 00:04:31,720 Now, I'm on my next version of longlist.db. 73 00:04:31,720 --> 00:04:32,960 Well, what could I do? 74 00:04:32,960 --> 00:04:36,050 I could type SELECT and look at some of the tables in here 75 00:04:36,050 --> 00:04:38,200 to see what kind of data is inside. 76 00:04:38,200 --> 00:04:44,350 I could perhaps say SELECT maybe the names from the authors table 77 00:04:44,350 --> 00:04:48,220 here, from the authors table, and hit semicolon. 78 00:04:48,220 --> 00:04:54,280 Now, I'll see all the names of authors that are inside of the authors table. 79 00:04:54,280 --> 00:04:55,930 I could do the same thing for books. 80 00:04:55,930 --> 00:04:58,150 I could maybe look at the titles of books. 81 00:04:58,150 --> 00:05:05,350 I could say SELECT title from the books table, semicolon, and Enter. 82 00:05:05,350 --> 00:05:10,490 Now, I see all of the titles that are inside my books table. 83 00:05:10,490 --> 00:05:14,230 But what I haven't seen yet is the schema of this database. 84 00:05:14,230 --> 00:05:17,080 It is the way it is organized and the commands 85 00:05:17,080 --> 00:05:19,700 that were used to create these tables. 86 00:05:19,700 --> 00:05:21,730 So let me work on that now. 87 00:05:21,730 --> 00:05:24,460 I'll clear my terminal using Control-L. 88 00:05:24,460 --> 00:05:28,090 And now, let me type .schema again. 89 00:05:28,090 --> 00:05:34,420 I'll type .schema to see what commands were used to create this database. 90 00:05:34,420 --> 00:05:37,630 Hit Enter, and I can see-- 91 00:05:37,630 --> 00:05:39,670 I mean, there are quite a lot of commands here. 92 00:05:39,670 --> 00:05:42,610 Now, if this feels overwhelming, I mean, I'd be right there with you. 93 00:05:42,610 --> 00:05:45,610 This is a lot of commands to parse through and read. 94 00:05:45,610 --> 00:05:49,850 So there's probably a better way to do this. 95 00:05:49,850 --> 00:05:55,660 And one way to try is to type .schema, and then give it some table name. 96 00:05:55,660 --> 00:05:58,930 Let's say, want to understand the schema for just the books table, 97 00:05:58,930 --> 00:06:00,770 like just that for now. 98 00:06:00,770 --> 00:06:04,060 So I'll say .schema, and then the table name-- 99 00:06:04,060 --> 00:06:05,720 books, in this case. 100 00:06:05,720 --> 00:06:07,840 Then I can hit Enter. 101 00:06:07,840 --> 00:06:11,500 Now, I'll see the schema, the organization, 102 00:06:11,500 --> 00:06:15,860 the command that we used to create the books table, in this case. 103 00:06:15,860 --> 00:06:18,250 And notice again, we have several columns-- 104 00:06:18,250 --> 00:06:22,420 ID, ISBN, title, publisher ID, and so on. 105 00:06:22,420 --> 00:06:26,380 Each one has their own kind of data they could support or take 106 00:06:26,380 --> 00:06:30,590 into this column like integers, text, and so on. 107 00:06:30,590 --> 00:06:33,700 So again, what we'll do today is have you all 108 00:06:33,700 --> 00:06:36,640 learn how to write your very own create table 109 00:06:36,640 --> 00:06:41,050 commands to build your very own databases that represent what you 110 00:06:41,050 --> 00:06:44,360 want to represent in the real world. 111 00:06:44,360 --> 00:06:47,860 So let me exit this prompt here. 112 00:06:47,860 --> 00:06:51,370 And let me propose that we'll have a bit of a design challenge 113 00:06:51,370 --> 00:06:55,600 today to actually try to represent a real world entity with some database 114 00:06:55,600 --> 00:06:56,240 here. 115 00:06:56,240 --> 00:06:59,050 And if you're not already familiar, Boston 116 00:06:59,050 --> 00:07:02,110 is perhaps famous for being among the first cities 117 00:07:02,110 --> 00:07:05,560 to have a subway system in the United States. 118 00:07:05,560 --> 00:07:09,730 So here is a picture from the late 1800s of a subway being 119 00:07:09,730 --> 00:07:11,950 built in Boston's city streets. 120 00:07:11,950 --> 00:07:14,170 Underneath the streets here, there would be 121 00:07:14,170 --> 00:07:18,100 trolley cars that would go and transport people across Boston. 122 00:07:18,100 --> 00:07:20,410 Here's another picture of a trolley actually working 123 00:07:20,410 --> 00:07:21,790 underneath the streets. 124 00:07:21,790 --> 00:07:23,920 So people would go down underneath. 125 00:07:23,920 --> 00:07:25,360 They would hop on a trolley. 126 00:07:25,360 --> 00:07:29,950 They'd be able to different parts of Boston, perhaps from Harvard to MIT 127 00:07:29,950 --> 00:07:33,550 or downtown up to, let's say, Braintree or down to Braintree, 128 00:07:33,550 --> 00:07:36,070 which is more south of Boston, for example. 129 00:07:36,070 --> 00:07:39,280 One of the famous stops is the Park Street stop, 130 00:07:39,280 --> 00:07:41,530 which is right down in the middle of Boston, 131 00:07:41,530 --> 00:07:45,010 one of the central hubs of this subway system. 132 00:07:45,010 --> 00:07:50,480 And now, these photos are all from, let's say, the early 1900s, late 1800s, 133 00:07:50,480 --> 00:07:51,230 and so on. 134 00:07:51,230 --> 00:07:54,070 But the subway has gotten a lot more modern since then. 135 00:07:54,070 --> 00:07:59,000 And actually, now, we have several lines that span the entire city and beyond. 136 00:07:59,000 --> 00:08:02,740 So here, we have the Red Line, of which Harvard and MIT are a part. 137 00:08:02,740 --> 00:08:05,800 We have the green line, which brings you kind of west to east-- 138 00:08:05,800 --> 00:08:09,680 the blue line, the orange line, and so on. 139 00:08:09,680 --> 00:08:14,790 So many more lines and stations have been added to this system. 140 00:08:14,790 --> 00:08:18,460 It's a big design challenge to represent all of these stations, 141 00:08:18,460 --> 00:08:24,560 all of these lines, and all of these people who might ride this subway, too. 142 00:08:24,560 --> 00:08:31,220 So the question then becomes, how can we create a schema for this data? 143 00:08:31,220 --> 00:08:36,350 And again, by schema, we mean what kinds of tables should we have, 144 00:08:36,350 --> 00:08:40,610 what kinds of columns might those tables have, and what kind of data 145 00:08:40,610 --> 00:08:44,159 should we put in each of those columns, for instance. 146 00:08:44,159 --> 00:08:46,320 So let me propose this. 147 00:08:46,320 --> 00:08:49,790 Let's say we start just first with names and stations. 148 00:08:49,790 --> 00:08:53,750 So Charlie here, our very first rider on this system, 149 00:08:53,750 --> 00:08:56,810 is going to be at the Kendall at MIT station. 150 00:08:56,810 --> 00:08:58,910 So this is what this table represents now. 151 00:08:58,910 --> 00:09:01,370 But what more could we have? 152 00:09:01,370 --> 00:09:05,750 Well, we might also want to have maybe what Charlie is doing at that station. 153 00:09:05,750 --> 00:09:09,050 Maybe he's entering the station, for instance. 154 00:09:09,050 --> 00:09:11,390 And if you're familiar with the subway system, 155 00:09:11,390 --> 00:09:17,550 you often have to pay to get onto a train or get onto the station itself. 156 00:09:17,550 --> 00:09:23,570 So let's say Charlie pays some fare to enter into the Kendall/MIT station. 157 00:09:23,570 --> 00:09:26,900 Well, back in the mid 1900s, the fare was only about a dime. 158 00:09:26,900 --> 00:09:27,800 It was $0.10. 159 00:09:27,800 --> 00:09:33,820 So we'll say Charlie paid $0.10 to enter the Kendall/MIT station. 160 00:09:33,820 --> 00:09:36,140 And now, this seems pretty good. 161 00:09:36,140 --> 00:09:41,300 But if I am the Transit Authority, the person who runs the subway system, 162 00:09:41,300 --> 00:09:45,890 I probably want to know, does Charlie have enough money to get on the train? 163 00:09:45,890 --> 00:09:48,730 And if so I want to make sure that, OK, well, Charlie actually 164 00:09:48,730 --> 00:09:50,590 could get on this train. 165 00:09:50,590 --> 00:09:53,420 So let's say not only does Charlie pay a fare. 166 00:09:53,420 --> 00:09:56,570 He has some remaining balance afterwards. 167 00:09:56,570 --> 00:10:00,160 So Charlie here has gotten onto the Kendall/MIT stop. 168 00:10:00,160 --> 00:10:05,330 He's paid the fare of $0.10 and has $0.05 left. 169 00:10:05,330 --> 00:10:08,750 OK, so here's a bit of a table. 170 00:10:08,750 --> 00:10:11,030 We probably add more information to it. 171 00:10:11,030 --> 00:10:15,230 Let's say Charlie then leaves at the Jamaica Plain stop. 172 00:10:15,230 --> 00:10:18,650 And the fare to leave is about a nickel, $0.05. 173 00:10:18,650 --> 00:10:21,410 And now, Charlie has no cents left over. 174 00:10:21,410 --> 00:10:27,680 So again, Charlie paid $0.10 to get on, had $0.05 left, paid $0.05 to get off, 175 00:10:27,680 --> 00:10:31,930 and now has no remaining balance here anymore. 176 00:10:31,930 --> 00:10:32,440 OK. 177 00:10:32,440 --> 00:10:34,040 So that's Charlie's story. 178 00:10:34,040 --> 00:10:35,860 Let's look at Alice, though. 179 00:10:35,860 --> 00:10:38,530 Let's say Alice gets on at the Harvard stop. 180 00:10:38,530 --> 00:10:42,310 They too pay $0.10 to get on at the Harvard stop. 181 00:10:42,310 --> 00:10:45,370 And they have a remaining balance of $0.20. 182 00:10:45,370 --> 00:10:49,510 Alice will go, let's say, to Park Street, get off at Park Street, 183 00:10:49,510 --> 00:10:51,160 pay the nickel to leave. 184 00:10:51,160 --> 00:10:55,430 And now, they'll have a balance of $0.15 at the end. 185 00:10:55,430 --> 00:10:56,620 Let's go to Bob. 186 00:10:56,620 --> 00:10:59,260 Bob enters the Alewife station. 187 00:10:59,260 --> 00:11:00,940 They pay $0.10. 188 00:11:00,940 --> 00:11:02,800 They have remaining balance of $0.30. 189 00:11:02,800 --> 00:11:08,337 And let's say they leave at Park Street and have a fare of $0.10 to leave 190 00:11:08,337 --> 00:11:09,670 because it's a further distance. 191 00:11:09,670 --> 00:11:14,490 Now, they'll have a running balance of $0.20 overall. 192 00:11:14,490 --> 00:11:18,840 So this table is OK, I might admit. 193 00:11:18,840 --> 00:11:23,100 I mean, last time, we learned about having what we called primary keys 194 00:11:23,100 --> 00:11:24,340 and foreign keys. 195 00:11:24,340 --> 00:11:26,220 So it seems like that's missing here. 196 00:11:26,220 --> 00:11:28,020 Let's go ahead and add that here. 197 00:11:28,020 --> 00:11:32,940 I'll give each row a unique ID, so I can know who entered, who exited, 198 00:11:32,940 --> 00:11:35,580 and so on, give that a unique ID here. 199 00:11:35,580 --> 00:11:41,950 But I might even say that this table could really be improved substantially. 200 00:11:41,950 --> 00:11:49,410 And I want to ask you what redundancies or inefficiencies do you see here? 201 00:11:49,410 --> 00:11:52,680 If we're trying to represent riders and stations, 202 00:11:52,680 --> 00:11:56,990 what can we improve about this design? 203 00:11:56,990 --> 00:12:01,740 AUDIENCE: So probably the redundancy will be the names and the stations, 204 00:12:01,740 --> 00:12:02,240 too. 205 00:12:02,240 --> 00:12:06,760 For example, if Charlie will go to the train daily, 206 00:12:06,760 --> 00:12:10,875 then he will become most of the names in the data. 207 00:12:10,875 --> 00:12:12,250 CARTER ZENKE: Yeah, a good point. 208 00:12:12,250 --> 00:12:13,450 If I'm hearing what you're saying, [? Loren, ?] 209 00:12:13,450 --> 00:12:16,420 let me show you some examples that I highlighted here. 210 00:12:16,420 --> 00:12:20,290 One example could be, to your point, about these names. 211 00:12:20,290 --> 00:12:24,370 These names seem to be telling us the name of a person. 212 00:12:24,370 --> 00:12:27,640 But here, we have only three names-- 213 00:12:27,640 --> 00:12:29,680 Charlie, Alice, and Bob. 214 00:12:29,680 --> 00:12:32,170 Well, my name is Carter. 215 00:12:32,170 --> 00:12:35,110 And what if somebody else named Carter also 216 00:12:35,110 --> 00:12:37,810 tried to get on and leave at some stop? 217 00:12:37,810 --> 00:12:39,820 Well, I wouldn't be able to know which Carter 218 00:12:39,820 --> 00:12:44,210 was which or which Charlie was which, which Alice was which, and so on. 219 00:12:44,210 --> 00:12:47,920 So we probably need a way to represent people and their names a little better 220 00:12:47,920 --> 00:12:50,220 here, too. 221 00:12:50,220 --> 00:12:55,870 What other ideas do we have for how to improve the design of this table? 222 00:12:55,870 --> 00:13:00,330 AUDIENCE: Yes, I think we can have a singular ID for a singular person. 223 00:13:00,330 --> 00:13:02,998 That way, we'll be better able to track their activities. 224 00:13:02,998 --> 00:13:03,790 CARTER ZENKE: Nice. 225 00:13:03,790 --> 00:13:06,555 So we probably have an ID for each person, 226 00:13:06,555 --> 00:13:10,830 a bit what we learned about last week, putting people in their own table 227 00:13:10,830 --> 00:13:14,890 and giving them their own unique ID, a primary key. 228 00:13:14,890 --> 00:13:16,380 Let's show that here. 229 00:13:16,380 --> 00:13:18,150 I'll go to some slides. 230 00:13:18,150 --> 00:13:21,345 And I'll pick out one that shows us just riders. 231 00:13:21,345 --> 00:13:23,220 So to your point, [? Soqanya, ?] we could try 232 00:13:23,220 --> 00:13:25,830 to have maybe a table for just riders. 233 00:13:25,830 --> 00:13:29,130 And maybe to simplify, this table has only two columns. 234 00:13:29,130 --> 00:13:32,850 It has a column for ID and a column for name. 235 00:13:32,850 --> 00:13:38,250 So here, we have Charlie, Alice, and Bob all in their own table. 236 00:13:38,250 --> 00:13:42,270 Well, let me propose to you, we could do the same thing for stations. 237 00:13:42,270 --> 00:13:44,760 Let's say we have a table of stations now. 238 00:13:44,760 --> 00:13:48,030 And we give each one their very own ID as well, 239 00:13:48,030 --> 00:13:50,400 our own primary key for this table. 240 00:13:50,400 --> 00:13:52,540 We have Harvard, Kendall, and Park Street. 241 00:13:52,540 --> 00:13:57,130 We can differentiate between them using their IDs here. 242 00:13:57,130 --> 00:14:00,000 So a few improvements could be made. 243 00:14:00,000 --> 00:14:04,560 And as we're making these improvements, splitting one table into many, 244 00:14:04,560 --> 00:14:08,010 and debating what kind of data to store in each table, 245 00:14:08,010 --> 00:14:12,180 the process we're going through is one called normalizing. 246 00:14:12,180 --> 00:14:14,640 We're normalizing our data. 247 00:14:14,640 --> 00:14:18,210 To normalize means to reduce redundancies, effectively, 248 00:14:18,210 --> 00:14:19,530 to take a table-- 249 00:14:19,530 --> 00:14:22,230 take one table, for instance, split up into multiple, 250 00:14:22,230 --> 00:14:26,550 and have each entity be part of its very own table. 251 00:14:26,550 --> 00:14:31,590 Some academics in the world have named different normal forms, quote unquote. 252 00:14:31,590 --> 00:14:36,120 There's like first normal form, second normal form, third normal form, 253 00:14:36,120 --> 00:14:39,420 this progression of making your day more and more efficient. 254 00:14:39,420 --> 00:14:41,160 You can use those as heuristics. 255 00:14:41,160 --> 00:14:43,770 But end of the day, a few principles might apply. 256 00:14:43,770 --> 00:14:48,750 First, take your entities, like in this case stations and riders, 257 00:14:48,750 --> 00:14:51,840 and put them each in their own table. 258 00:14:51,840 --> 00:14:56,760 And if you add more data, make sure that if I were to add a column, let's say, 259 00:14:56,760 --> 00:15:00,780 to riders, it is only a data point about riders-- 260 00:15:00,780 --> 00:15:05,190 not about stations, not about fares, only about riders. 261 00:15:05,190 --> 00:15:08,220 And that process can help us make a data set that 262 00:15:08,220 --> 00:15:14,050 is more dynamic, more easy to reproduce, and more easy to write queries on. 263 00:15:14,050 --> 00:15:18,090 So that is the process here of normalizing. 264 00:15:18,090 --> 00:15:18,960 OK. 265 00:15:18,960 --> 00:15:25,830 So if we have now riders and stations, we want to represent them in our table. 266 00:15:25,830 --> 00:15:28,050 Well, we could use what we learned about relating 267 00:15:28,050 --> 00:15:30,960 last week to ask, how could we actually represent 268 00:15:30,960 --> 00:15:33,790 these riders and these stations? 269 00:15:33,790 --> 00:15:36,640 So let's say here I can have riders and stations. 270 00:15:36,640 --> 00:15:40,330 I want to make sure that I have the right relationship between them. 271 00:15:40,330 --> 00:15:42,360 Well, if you're familiar with subways, we 272 00:15:42,360 --> 00:15:45,870 might say that a rider goes to one station. 273 00:15:45,870 --> 00:15:50,070 And this big T here is the symbol for a station here in Boston, 274 00:15:50,070 --> 00:15:52,890 for the T's that we call it, for the subway. 275 00:15:52,890 --> 00:15:56,010 So a rider might go to one station. 276 00:15:56,010 --> 00:15:58,530 But of course, that might not be the full picture. 277 00:15:58,530 --> 00:16:01,660 A rider also gets off at some station on. 278 00:16:01,660 --> 00:16:06,990 So a rider could be associated with not just one station, but multiple. 279 00:16:06,990 --> 00:16:10,380 And if you're familiar, at least with any subway system or the Boston one, 280 00:16:10,380 --> 00:16:12,660 too, it can often get pretty busy. 281 00:16:12,660 --> 00:16:16,890 And so riders might not just go to, of course, one station or two. 282 00:16:16,890 --> 00:16:22,750 Stations could also have multiple riders that are on a particular station here. 283 00:16:22,750 --> 00:16:28,290 So to recap, one rider might be associated with more than one station. 284 00:16:28,290 --> 00:16:33,030 They might get on at this first one and get off at this later one. 285 00:16:33,030 --> 00:16:37,350 But each station could presumably have more than one rider. 286 00:16:37,350 --> 00:16:41,130 Each station here could have rider A or rider B, the rider 287 00:16:41,130 --> 00:16:47,050 up here or the rider down below, and even many more than that as well. 288 00:16:47,050 --> 00:16:50,550 So to put it in the language of our ER diagrams, 289 00:16:50,550 --> 00:16:53,160 our entity relation diagrams from last week, 290 00:16:53,160 --> 00:16:57,840 we could look at it bit like this, where we have riders and stations. 291 00:16:57,840 --> 00:17:02,550 Riders visit stations, and they're associated like this. 292 00:17:02,550 --> 00:17:07,540 A rider must have at least one station associated with them. 293 00:17:07,540 --> 00:17:09,270 That's what this horizontal line means. 294 00:17:09,270 --> 00:17:13,140 If they aren't at a station, they aren't really a rider, right? 295 00:17:13,140 --> 00:17:16,960 A rider though could have many stations associated with them. 296 00:17:16,960 --> 00:17:19,470 That's what this three prongs down here means. 297 00:17:19,470 --> 00:17:22,440 They could have one, two, three, four, they 298 00:17:22,440 --> 00:17:26,290 could have many stations they get on and get off of. 299 00:17:26,290 --> 00:17:31,030 Now a station could have anywhere between zero riders, 300 00:17:31,030 --> 00:17:35,290 if it's maybe out of commission or isn't very popular, upwards to many. 301 00:17:35,290 --> 00:17:39,730 It could have two, three, four, five, even hundreds of riders associated 302 00:17:39,730 --> 00:17:42,070 with this particular station. 303 00:17:42,070 --> 00:17:47,350 So here is our entity relation diagram for these particular riders 304 00:17:47,350 --> 00:17:50,030 and these stations here. 305 00:17:50,030 --> 00:17:55,660 So let me ask, what questions do we have on these relationships between riders 306 00:17:55,660 --> 00:17:59,700 and stations and how to design this table so far? 307 00:17:59,700 --> 00:18:01,450 AUDIENCE: Then I want to ask that you have 308 00:18:01,450 --> 00:18:06,370 used the same ID for stations and riders so that maybe 309 00:18:06,370 --> 00:18:08,895 give us a problem in coding? 310 00:18:08,895 --> 00:18:10,520 CARTER ZENKE: Yeah, a good observation. 311 00:18:10,520 --> 00:18:14,950 So you might have noticed that in the riders table and in the stations table, 312 00:18:14,950 --> 00:18:17,140 I gave the same kind of ID. 313 00:18:17,140 --> 00:18:19,602 Like I had one, two, three for each of them. 314 00:18:19,602 --> 00:18:21,310 And let me just show you that again here. 315 00:18:21,310 --> 00:18:24,950 I'll come back to some slides, and I'll show you again, 316 00:18:24,950 --> 00:18:30,730 the riders table where we had Charlie, Alice, and Bob, ID 1, 2, 3. 317 00:18:30,730 --> 00:18:31,870 Same for the stations. 318 00:18:31,870 --> 00:18:36,730 We had stations Harvard, Kendall, Park Street, ID 1, 2, 3. 319 00:18:36,730 --> 00:18:39,950 And to your question, isn't that a problem? 320 00:18:39,950 --> 00:18:43,540 Well, I would argue in this case, it's not, 321 00:18:43,540 --> 00:18:48,460 so long as we keep clear that these IDs are for stations 322 00:18:48,460 --> 00:18:50,800 and these IDs are for riders. 323 00:18:50,800 --> 00:18:54,550 And we'll see how to do that using our SQL keywords later on. 324 00:18:54,550 --> 00:18:59,500 But again, so long as we have an ID just for our riders and an ID 325 00:18:59,500 --> 00:19:02,770 just for our stations, we can keep these separate 326 00:19:02,770 --> 00:19:05,560 even if they might have the same values. 327 00:19:05,560 --> 00:19:07,990 But a great question here. 328 00:19:07,990 --> 00:19:10,440 Let's take just one more. 329 00:19:10,440 --> 00:19:12,870 AUDIENCE: Regarding the entity relationship diagram, 330 00:19:12,870 --> 00:19:17,340 how is it possible for a station to have a possibility of zero riders, 331 00:19:17,340 --> 00:19:21,303 but riders must compulsorily have at least one station. 332 00:19:21,303 --> 00:19:22,720 CARTER ZENKE: Yeah, good question. 333 00:19:22,720 --> 00:19:25,470 So this might be up to you and how you formulate it, 334 00:19:25,470 --> 00:19:28,380 but for me, let me show that diagram again. 335 00:19:28,380 --> 00:19:32,280 I'll go back to over here. 336 00:19:32,280 --> 00:19:36,540 In my mind, to be a rider, you have to visit a station. 337 00:19:36,540 --> 00:19:39,780 If you aren't visiting a station, you aren't really a rider, right? 338 00:19:39,780 --> 00:19:43,740 Now presumably, there are stations that were built but aren't really being 339 00:19:43,740 --> 00:19:46,560 used right now or aren't really in service yet. 340 00:19:46,560 --> 00:19:49,230 That could be a station that has no visitors. 341 00:19:49,230 --> 00:19:50,910 So you could argue-- 342 00:19:50,910 --> 00:19:54,090 let's make sure every station has at least one rider 343 00:19:54,090 --> 00:19:56,910 and every rider may or may not have to visit a station. 344 00:19:56,910 --> 00:19:59,640 For that I would say, we could probably reasonably 345 00:19:59,640 --> 00:20:03,300 disagree there and talk about how we could represent the diagram here, too. 346 00:20:03,300 --> 00:20:09,940 But a great observation and a good other critique of this system here. 347 00:20:09,940 --> 00:20:11,150 All right. 348 00:20:11,150 --> 00:20:16,000 So let's now turn to representing this in our database. 349 00:20:16,000 --> 00:20:19,180 I'll go back to my computer and we'll learn 350 00:20:19,180 --> 00:20:23,440 about this new SQL keyword, SQL statement, 351 00:20:23,440 --> 00:20:26,200 this one called Create Table. 352 00:20:26,200 --> 00:20:31,400 Create table allows us to, as the name suggests, create a brand new table. 353 00:20:31,400 --> 00:20:34,450 So let's do just that in our new database 354 00:20:34,450 --> 00:20:37,480 to represent riders and stations. 355 00:20:37,480 --> 00:20:42,820 I'll go into my terminal, and I want to make a brand new database. 356 00:20:42,820 --> 00:20:49,120 I'll call this one mbta.db, because mbta stands for the Massachusetts Bay 357 00:20:49,120 --> 00:20:52,630 Transportation Authority, the people who run the subway, essentially. 358 00:20:52,630 --> 00:20:57,610 So I'll do sqlite3, mbta.db, hit Enter, and I'll 359 00:20:57,610 --> 00:21:04,220 type Y to say yes, I want to create this brand new database. 360 00:21:04,220 --> 00:21:09,010 Now if I type dot schema, I see nothing. 361 00:21:09,010 --> 00:21:10,150 But that's expected. 362 00:21:10,150 --> 00:21:11,530 I don't have any tables yet. 363 00:21:11,530 --> 00:21:13,870 I have nothing inside this database. 364 00:21:13,870 --> 00:21:17,350 It's up to me to create these tables myself. 365 00:21:17,350 --> 00:21:22,240 So what I'll do is clear my terminal, and let's start first with riders. 366 00:21:22,240 --> 00:21:23,890 I might create a table for riders. 367 00:21:23,890 --> 00:21:26,440 I'll say, create table. 368 00:21:26,440 --> 00:21:29,500 And now I have to give that table some name. 369 00:21:29,500 --> 00:21:31,960 I might call it riders here. 370 00:21:31,960 --> 00:21:36,310 And then in parentheses, like this, I can specify what 371 00:21:36,310 --> 00:21:39,500 columns should be part of this table. 372 00:21:39,500 --> 00:21:40,540 So let's start first. 373 00:21:40,540 --> 00:21:44,320 I'll hit Enter here and continue this query. 374 00:21:44,320 --> 00:21:47,770 Now, all by convention, I'll just indent four spaces-- 375 00:21:47,770 --> 00:21:52,760 one, two, three, four, and I'll give an ID to each of these riders 376 00:21:52,760 --> 00:21:54,430 as we saw before. 377 00:21:54,430 --> 00:21:58,180 I'll say ID here is one of my columns. 378 00:21:58,180 --> 00:22:04,300 Now to create a new column, I'll follow this up with a comma and hit Enter. 379 00:22:04,300 --> 00:22:08,350 I'll again, by convention, for style, just indent four spaces. 380 00:22:08,350 --> 00:22:09,730 And what's my next column? 381 00:22:09,730 --> 00:22:12,160 Perhaps a name for these riders. 382 00:22:12,160 --> 00:22:17,530 I'll give this column the name, name, and I'll leave it at that. 383 00:22:17,530 --> 00:22:22,660 Once I'm done adding columns, I no longer need to have a comma. 384 00:22:22,660 --> 00:22:26,320 I could simply close out this query, this statement. 385 00:22:26,320 --> 00:22:29,740 I could hit Enter here, say close in parentheses 386 00:22:29,740 --> 00:22:33,400 to close the top parentheses here, semicolon, hit Enter. 387 00:22:33,400 --> 00:22:36,730 And now nothing seems to happen. 388 00:22:36,730 --> 00:22:38,330 But that's actually a good sign. 389 00:22:38,330 --> 00:22:41,290 So let me type dot schema, hit Enter, and I'll 390 00:22:41,290 --> 00:22:44,650 see the result of that statement before. 391 00:22:44,650 --> 00:22:48,820 Create table, if it doesn't already exist, riders, and riders 392 00:22:48,820 --> 00:22:53,590 has inside of it two columns, ID and name. 393 00:22:53,590 --> 00:22:54,740 OK, let's keep going. 394 00:22:54,740 --> 00:22:56,800 Let's make one for stations, too. 395 00:22:56,800 --> 00:22:59,380 I'll clear my terminal and I'll say, create me 396 00:22:59,380 --> 00:23:02,740 a table called stations and include-- 397 00:23:02,740 --> 00:23:04,330 actually, not station. 398 00:23:04,330 --> 00:23:09,140 If you ever want to, let's say fix this kind of table here, 399 00:23:09,140 --> 00:23:14,500 let me try closing the parentheses, hit semicolon, Enter. 400 00:23:14,500 --> 00:23:15,850 I'll get a syntax error. 401 00:23:15,850 --> 00:23:17,050 I can restart. 402 00:23:17,050 --> 00:23:22,180 I'll do Control L. Now I'll do create table, stations, plural. 403 00:23:22,180 --> 00:23:24,250 Open parentheses, Enter. 404 00:23:24,250 --> 00:23:25,870 I'll indent by four spaces. 405 00:23:25,870 --> 00:23:27,430 One, two, three, four. 406 00:23:27,430 --> 00:23:31,870 And now, I'll similarly include an ID for each of these stations. 407 00:23:31,870 --> 00:23:36,220 I'll say ID comma, and then what all should I have? 408 00:23:36,220 --> 00:23:40,840 Well, stations tend to have a name, like the Kendall MIT station, 409 00:23:40,840 --> 00:23:43,480 the Harvard station, the Park Street Station. 410 00:23:43,480 --> 00:23:48,760 So I'll say, I'll give each of these their very own name, comma. 411 00:23:48,760 --> 00:23:50,770 What else do stations have? 412 00:23:50,770 --> 00:23:54,220 Well, they might also have a line that they're on. 413 00:23:54,220 --> 00:23:58,700 Let's say it's the red line, or the blue line, or the green line and so on. 414 00:23:58,700 --> 00:24:03,720 I'll have a space for them to write in their line that they're a part of. 415 00:24:03,720 --> 00:24:06,160 OK, and maybe I'll leave it at that to keep it simple. 416 00:24:06,160 --> 00:24:09,490 I'll say, stations have an ID, a name and a line. 417 00:24:09,490 --> 00:24:10,750 Now I'll close this out. 418 00:24:10,750 --> 00:24:15,430 I'll say end parentheses, semicolon, hit Enter, and nothing seems to happen. 419 00:24:15,430 --> 00:24:20,770 But if I type dot schema, I'll now see my riders and my stations 420 00:24:20,770 --> 00:24:24,870 tables inside of my database. 421 00:24:24,870 --> 00:24:26,760 Now one more step here. 422 00:24:26,760 --> 00:24:31,920 We have riders, we have stations, but we have to relate to them. 423 00:24:31,920 --> 00:24:34,800 We have to relate them using this many to many relationships, 424 00:24:34,800 --> 00:24:37,420 as we saw last week. 425 00:24:37,420 --> 00:24:40,950 So let me try making a table to implement 426 00:24:40,950 --> 00:24:43,110 this many to many relationship. 427 00:24:43,110 --> 00:24:47,400 And if you remember, we might call this kind of table a junction table, 428 00:24:47,400 --> 00:24:50,040 an associative entity, a join table. 429 00:24:50,040 --> 00:24:53,730 It has a lot of names, but it looks a bit like this. 430 00:24:53,730 --> 00:24:58,710 I'll create this new table to represent, let's say, visits. 431 00:24:58,710 --> 00:25:01,110 A rider visits a station. 432 00:25:01,110 --> 00:25:03,780 So I'll call this table visits. 433 00:25:03,780 --> 00:25:09,840 And inside, I'll make sure it has two columns, one for a rider 434 00:25:09,840 --> 00:25:17,130 ID to represent a rider, and one, let's say, for a station ID, 435 00:25:17,130 --> 00:25:19,020 to represent a station. 436 00:25:19,020 --> 00:25:24,630 Now when I see a rider ID next to a station ID in this table, 437 00:25:24,630 --> 00:25:27,870 I'll know the rider with that certain ID visited 438 00:25:27,870 --> 00:25:30,180 the station with that certain ID. 439 00:25:30,180 --> 00:25:31,500 So I'll close this out. 440 00:25:31,500 --> 00:25:36,330 I'll say end, parentheses here, semicolon, Enter, and finally, clear 441 00:25:36,330 --> 00:25:38,940 my terminal, type dot schema. 442 00:25:38,940 --> 00:25:45,000 And I can see, I have riders, stations, and visits between riders and stations 443 00:25:45,000 --> 00:25:48,630 in this associative entity, this junction table or a joined table. 444 00:25:48,630 --> 00:25:51,990 Up to you what you might want to call it in this case. 445 00:25:51,990 --> 00:25:55,840 Now what questions do we have? 446 00:25:55,840 --> 00:25:58,480 AUDIENCE: Why we have not use the primary key 447 00:25:58,480 --> 00:26:00,563 and secondary key in this table? 448 00:26:00,563 --> 00:26:01,730 CARTER ZENKE: Good question. 449 00:26:01,730 --> 00:26:03,605 So we're going to get there in just a minute. 450 00:26:03,605 --> 00:26:07,120 But if I look back at my terminal here, my schema, I'll see, 451 00:26:07,120 --> 00:26:09,340 I really just have column names. 452 00:26:09,340 --> 00:26:13,840 And we saw before and we typed dot schema on our longlist.db, 453 00:26:13,840 --> 00:26:16,150 we had more than just column names. 454 00:26:16,150 --> 00:26:19,400 We had column names, we had perhaps data types, 455 00:26:19,400 --> 00:26:21,807 we had primary keys and foreign keys. 456 00:26:21,807 --> 00:26:23,390 So we'll get to that in just a minute. 457 00:26:23,390 --> 00:26:28,300 But suffice to say for now, we're going to keep improving this over time. 458 00:26:28,300 --> 00:26:30,580 Let's take one more. 459 00:26:30,580 --> 00:26:33,300 AUDIENCE: Is it required to put spaces-- 460 00:26:33,300 --> 00:26:37,422 the four spaces indents, or that's just for the visual look? 461 00:26:37,422 --> 00:26:38,880 CARTER ZENKE: Yeah, great question. 462 00:26:38,880 --> 00:26:43,260 Is it required to have these four spaces before each column name. 463 00:26:43,260 --> 00:26:46,720 And in fact, no, it's not, but it makes the code more readable. 464 00:26:46,720 --> 00:26:49,590 So I could put this all in one line-- 465 00:26:49,590 --> 00:26:51,270 I shouldn't, but I could. 466 00:26:51,270 --> 00:26:55,620 And if I have instead this new line followed by four spaces, 467 00:26:55,620 --> 00:27:00,390 I can make this more readable for myself and for my colleagues, too. 468 00:27:00,390 --> 00:27:02,060 Good question. 469 00:27:02,060 --> 00:27:07,430 OK, so to our earlier point, there are things 470 00:27:07,430 --> 00:27:09,680 that are missing from this schema. 471 00:27:09,680 --> 00:27:13,800 Like, we have column names, but as we saw before, 472 00:27:13,800 --> 00:27:16,550 we should ideally specify what kind of data 473 00:27:16,550 --> 00:27:20,870 should be able to go into each of these columns. 474 00:27:20,870 --> 00:27:24,990 And for that, we'll need some new ideas to talk about here. 475 00:27:24,990 --> 00:27:30,950 So let's focus now on this idea of data types and storage classes. 476 00:27:30,950 --> 00:27:36,680 Data types and storage classes are two very similar but distinct IDs, 477 00:27:36,680 --> 00:27:40,230 and they're distinct in a way we'll talk about in just a minute. 478 00:27:40,230 --> 00:27:45,770 Now SQLite has five storage classes, five kind of storage-- 479 00:27:45,770 --> 00:27:49,520 kind of type, so to speak, of values that can hold. 480 00:27:49,520 --> 00:27:51,920 So let's talk about the first one, null, for instance. 481 00:27:51,920 --> 00:27:54,230 Null in this case means nothing. 482 00:27:54,230 --> 00:27:57,170 There's nothing that actually is inside of this value. 483 00:27:57,170 --> 00:27:59,990 It's kind of a central value to mean nothing is here. 484 00:27:59,990 --> 00:28:04,700 Integer means a whole number, like 1, 2, 3, 4, or 5. 485 00:28:04,700 --> 00:28:10,880 Real talks about decimals like floating points like 1.2 or 2.4 and so on. 486 00:28:10,880 --> 00:28:12,740 Text is used for characters. 487 00:28:12,740 --> 00:28:19,740 And blob, kind of a funny one, blob stands for Binary Large Object, 488 00:28:19,740 --> 00:28:24,570 and it represents the data exactly as I give it to this value. 489 00:28:24,570 --> 00:28:30,800 If I tell it to store 101010, it will store exactly 101010 in binary. 490 00:28:30,800 --> 00:28:34,220 So useful for storing, in this case like images and video files 491 00:28:34,220 --> 00:28:37,160 and audio files, things that have some structure 492 00:28:37,160 --> 00:28:40,030 we don't want to mess around with. 493 00:28:40,030 --> 00:28:44,290 Now let's focus on this idea of a storage class. 494 00:28:44,290 --> 00:28:50,830 These, I'll say, are storage classes and not data types in SQLite. 495 00:28:50,830 --> 00:28:58,390 Now a storage class like integer can comprise, can hold several data types. 496 00:28:58,390 --> 00:29:01,480 Notice how there are seven different data 497 00:29:01,480 --> 00:29:05,380 types that could be stored under this integer storage class. 498 00:29:05,380 --> 00:29:10,960 We have a 6-byte integer, 2-byte integer, a 8 and 0 and so on. 499 00:29:10,960 --> 00:29:13,750 It could be any of these particular types, 500 00:29:13,750 --> 00:29:19,330 but each of these types under the umbrella of this integer storage class. 501 00:29:19,330 --> 00:29:23,740 And SQLite itself will take care of making sure 502 00:29:23,740 --> 00:29:25,750 it uses the appropriate data type. 503 00:29:25,750 --> 00:29:30,910 Like if I give a very large number, like let's say 4 billion or 6 billion 504 00:29:30,910 --> 00:29:35,200 or even bigger than that, it will probably use a longer-- 505 00:29:35,200 --> 00:29:39,010 that is, a bigger byte integer to store that kind of value. 506 00:29:39,010 --> 00:29:42,610 If I give it a smaller one, like 1, 2, 3, or 4, 507 00:29:42,610 --> 00:29:47,140 it will probably use a 1-byte or a 2-byte integer for that. 508 00:29:47,140 --> 00:29:50,560 But SQLite's idea is that I, as a programmer, 509 00:29:50,560 --> 00:29:54,910 shouldn't have to care if I use an 8-byte or a 1-byte or a 2-byte integer, 510 00:29:54,910 --> 00:29:58,180 I just care that I'm using integers, whole numbers, 511 00:29:58,180 --> 00:30:03,670 and they give me a storage class to use any of these up to their choice 512 00:30:03,670 --> 00:30:05,960 here as well. 513 00:30:05,960 --> 00:30:10,930 Now let's look at a few examples of values in SQLite that we could store. 514 00:30:10,930 --> 00:30:14,590 Well, we have perhaps the red line as some text. 515 00:30:14,590 --> 00:30:16,690 And because this is characters, it's quoted, 516 00:30:16,690 --> 00:30:22,330 we could use the text storage class to represent this particular value here. 517 00:30:22,330 --> 00:30:25,630 We could have maybe an image, and to the earlier point, 518 00:30:25,630 --> 00:30:29,350 we could say, well, this image might be best represented 519 00:30:29,350 --> 00:30:33,490 using a blob, a binary large object, to keep all of these pixels 520 00:30:33,490 --> 00:30:36,790 exactly as they are in this image. 521 00:30:36,790 --> 00:30:40,330 But we do get some choice, some interesting design 522 00:30:40,330 --> 00:30:43,720 challenges when we look at the idea of fares. 523 00:30:43,720 --> 00:30:50,230 So let's say to our point earlier, fares are $0.10 back in the 1950s or so. 524 00:30:50,230 --> 00:30:55,960 Well, $0.10 we could store as an integer, which seems just fine. 525 00:30:55,960 --> 00:30:58,480 But this could get confused. 526 00:30:58,480 --> 00:31:00,790 I'm talking about dollars here or cents? 527 00:31:00,790 --> 00:31:03,205 Maybe it would be better, let's say, if I did this. 528 00:31:03,205 --> 00:31:07,610 A dollar sign, 0.10. 529 00:31:07,610 --> 00:31:09,680 And what might that be stored as? 530 00:31:09,680 --> 00:31:11,630 Well, probably text, right? 531 00:31:11,630 --> 00:31:14,840 I could say this dollar sign isn't really a number, 532 00:31:14,840 --> 00:31:18,140 but now I have to include it, so I'll say this will be quoted, essentially, 533 00:31:18,140 --> 00:31:22,358 as dollar sign 0.10. 534 00:31:22,358 --> 00:31:23,900 Now there's some downsides here, too. 535 00:31:23,900 --> 00:31:25,550 Like let's say I have-- 536 00:31:25,550 --> 00:31:26,840 I want to add these up. 537 00:31:26,840 --> 00:31:28,580 Well, I can't add up text. 538 00:31:28,580 --> 00:31:33,140 Like, what does it mean to say dollar sign 0.10 plus dollar sign 0.20. 539 00:31:33,140 --> 00:31:35,630 I can't do math with text. 540 00:31:35,630 --> 00:31:41,090 So maybe it'll be better after all if I used a real or a decimal like this, 541 00:31:41,090 --> 00:31:43,970 0.10. 542 00:31:43,970 --> 00:31:46,880 But I mean even here, you'll run into some problems. 543 00:31:46,880 --> 00:31:50,780 If you are familiar with how the number is represented in binary, 544 00:31:50,780 --> 00:31:54,380 you might know that decimal values or floating point values 545 00:31:54,380 --> 00:31:57,020 can't be perfectly, precisely represented. 546 00:31:57,020 --> 00:32:04,850 And if I talk about 0.10, the computer might store 0.10000056789. 547 00:32:04,850 --> 00:32:09,450 It could get very wonky out to the many, many decimal digits down below here. 548 00:32:09,450 --> 00:32:13,460 So trade offs and challenges overall. 549 00:32:13,460 --> 00:32:15,080 Let's look at these three, though. 550 00:32:15,080 --> 00:32:17,480 I have the first one to store as an integer. 551 00:32:17,480 --> 00:32:23,000 I'm trying to store fares here, second one as text, and the third one 552 00:32:23,000 --> 00:32:27,320 as a floating point or a real in this case. 553 00:32:27,320 --> 00:32:30,350 Let me ask for some opinions here, which one 554 00:32:30,350 --> 00:32:36,297 would you use and why, for trying represent fares in this case? 555 00:32:36,297 --> 00:32:37,130 AUDIENCE: Thank you. 556 00:32:37,130 --> 00:32:41,700 I prefer using integers because of course, 557 00:32:41,700 --> 00:32:44,690 I need to get the calculation very accurately. 558 00:32:44,690 --> 00:32:45,830 That's my point of view. 559 00:32:45,830 --> 00:32:50,030 Well, sometimes I can use float but, you know, 560 00:32:50,030 --> 00:32:52,310 like you said before, it can get very wonky 561 00:32:52,310 --> 00:32:59,815 if I really need that kind of precision, I don't really recommend using floats. 562 00:32:59,815 --> 00:33:01,190 CARTER ZENKE: Yeah, a good point. 563 00:33:01,190 --> 00:33:04,820 So if I go back to some slides here, you might argue for the integer 564 00:33:04,820 --> 00:33:07,940 because you know you can precisely represent integers. 565 00:33:07,940 --> 00:33:12,290 And let's say I want to add up fares over a lot, a lot of riders. 566 00:33:12,290 --> 00:33:15,020 This might be useful for me because I know that each number will 567 00:33:15,020 --> 00:33:16,820 be perfectly, precisely represented. 568 00:33:16,820 --> 00:33:20,720 I can do lots of big kind of math with this number here. 569 00:33:20,720 --> 00:33:24,260 To your point, this decimal might kind of, as you said, 570 00:33:24,260 --> 00:33:26,840 get wonky later on towards the later decimal points. 571 00:33:26,840 --> 00:33:30,980 I might get some unexpected results if we add up these overall. 572 00:33:30,980 --> 00:33:36,380 Let me ask, though, are there any proponents of this floating point 573 00:33:36,380 --> 00:33:40,120 value or a real value? 574 00:33:40,120 --> 00:33:42,930 AUDIENCE: So I think the-- 575 00:33:42,930 --> 00:33:48,810 I think the float is the number of, for example, 576 00:33:48,810 --> 00:33:56,190 for each pair, like the answers like truncation 577 00:33:56,190 --> 00:33:58,875 probably suggest by the comments there. 578 00:33:58,875 --> 00:34:00,250 CARTER ZENKE: Yeah, a good point. 579 00:34:00,250 --> 00:34:02,083 So if you talk about using this float value, 580 00:34:02,083 --> 00:34:06,491 I mean, one thing we could say for it is that this decimal could be-- 581 00:34:06,491 --> 00:34:09,449 it's more accurate to say, like, where you're working with dollars now, 582 00:34:09,449 --> 00:34:13,259 and we could have maybe $0.10, which is only 0.1 of a dollar. 583 00:34:13,259 --> 00:34:15,030 I totally hear that point as well. 584 00:34:15,030 --> 00:34:17,280 And the point we're making here is that they're really 585 00:34:17,280 --> 00:34:20,580 just trade offs among these data storage classes 586 00:34:20,580 --> 00:34:23,699 to use, whether you're using integers or real values, 587 00:34:23,699 --> 00:34:27,730 it just depends on your use case and what you're designing for. 588 00:34:27,730 --> 00:34:30,900 So be sure to read up on trade offs among these data types, 589 00:34:30,900 --> 00:34:35,420 and determine for yourself which one should you best use. 590 00:34:35,420 --> 00:34:36,409 OK. 591 00:34:36,409 --> 00:34:41,090 So we have now these storage classes to store values, 592 00:34:41,090 --> 00:34:47,210 and it turns out that columns can be made to store certain storage classes 593 00:34:47,210 --> 00:34:49,460 or prioritize certain classes. 594 00:34:49,460 --> 00:34:53,750 And the key distinction here is that columns in SQLite 595 00:34:53,750 --> 00:34:57,440 don't always store one particular type. 596 00:34:57,440 --> 00:35:01,250 Instead, they have type affinities, meaning 597 00:35:01,250 --> 00:35:07,280 that they'll try to convert some value you insert into a given cell or given 598 00:35:07,280 --> 00:35:11,660 row to the type they have the affinity for. 599 00:35:11,660 --> 00:35:16,490 Now there are, let's say five type affinities in SQLite-- 600 00:35:16,490 --> 00:35:17,270 text. 601 00:35:17,270 --> 00:35:19,610 Columns can be of the type affinity text, 602 00:35:19,610 --> 00:35:22,520 meaning they store just characters at the end of the day. 603 00:35:22,520 --> 00:35:27,950 There's also numeric, which stores either integer values or real values, 604 00:35:27,950 --> 00:35:31,010 depending on which one seems best to convert to. 605 00:35:31,010 --> 00:35:33,980 You have integer type affinity, which means it can store whole numbers. 606 00:35:33,980 --> 00:35:36,990 Real, to store floating points or decimal values. 607 00:35:36,990 --> 00:35:39,120 And we have blob here again, our old friend 608 00:35:39,120 --> 00:35:42,030 to store binary exactly as we get it. 609 00:35:42,030 --> 00:35:44,750 So whereas before we were talking about storage classes-- 610 00:35:44,750 --> 00:35:47,090 those are associated with individual values-- 611 00:35:47,090 --> 00:35:51,210 type affinities are now associated with individual columns. 612 00:35:51,210 --> 00:35:55,460 So let's see some example of how this might work in SQLite. 613 00:35:55,460 --> 00:35:59,000 Let's say I have this table of fares, and we've 614 00:35:59,000 --> 00:36:02,720 decided to store fares as integers. 615 00:36:02,720 --> 00:36:09,470 Well, if I said that this column called amount has the affinity for the text 616 00:36:09,470 --> 00:36:14,300 storage class, what would happen is if I insert this integer, 10, 617 00:36:14,300 --> 00:36:16,490 it would look a bit like this later on. 618 00:36:16,490 --> 00:36:20,250 It would be converted to text, it would be quoted in a sense, 619 00:36:20,250 --> 00:36:24,240 to represent-- it's now been converted to some set of characters. 620 00:36:24,240 --> 00:36:27,050 Let's say I insert this value 25. 621 00:36:27,050 --> 00:36:31,250 Well, 25 has a storage class right now of an integer. 622 00:36:31,250 --> 00:36:33,560 It is a whole number. 623 00:36:33,560 --> 00:36:38,480 But if I insert this into a column that has the text affinity, 624 00:36:38,480 --> 00:36:44,250 it will be converted into, in this case, text at the end of the day. 625 00:36:44,250 --> 00:36:45,920 Let's do the opposite. 626 00:36:45,920 --> 00:36:51,270 Let's say I have my fare as a string, some text in this case. 627 00:36:51,270 --> 00:36:54,690 I want to insert it into this column called amount, 628 00:36:54,690 --> 00:36:58,820 but now amount has the integer type affinity. 629 00:36:58,820 --> 00:37:03,830 Well, if I insert 10, quote unquote, into the column amount, 630 00:37:03,830 --> 00:37:10,850 I'll get back not 10, the text, but 10 the integer, because again, amount-- 631 00:37:10,850 --> 00:37:16,310 this column here-- has an affinity for the integer storage class. 632 00:37:16,310 --> 00:37:17,210 Let's try this. 633 00:37:17,210 --> 00:37:21,650 25, some text again I'll insert it into this table. 634 00:37:21,650 --> 00:37:26,690 Now I'll have 25 as an integer. 635 00:37:26,690 --> 00:37:31,300 So this is how SQLite allows us to give certain columns 636 00:37:31,300 --> 00:37:33,610 an affinity for certain types, that they'll 637 00:37:33,610 --> 00:37:38,860 try to store values of that type, so long as we insert values it could be 638 00:37:38,860 --> 00:37:41,900 feasibly converted to that type here. 639 00:37:41,900 --> 00:37:45,820 So let's go back to our schema and try to improve it now, 640 00:37:45,820 --> 00:37:50,860 to use not just column names, but also type affinities to store 641 00:37:50,860 --> 00:37:53,140 certain data of a certain type. 642 00:37:53,140 --> 00:37:58,000 Go back to my computer here, and let's improve this once more. 643 00:37:58,000 --> 00:38:00,910 So I'll go over to my table. 644 00:38:00,910 --> 00:38:05,890 And now, I probably want to improve the design here. 645 00:38:05,890 --> 00:38:08,680 And often, if I want to improve this, I might just 646 00:38:08,680 --> 00:38:10,910 need to erase what I've already done. 647 00:38:10,910 --> 00:38:16,210 So let me introduce this new keyword, this new statement called Drop Table. 648 00:38:16,210 --> 00:38:19,460 To drop a table means to delete it, to remove it, effectively. 649 00:38:19,460 --> 00:38:23,080 So let me try doing this for riders, stations, and visits. 650 00:38:23,080 --> 00:38:28,840 I'll type drop table writers, semicolon, Enter. 651 00:38:28,840 --> 00:38:34,240 Nothing seems to happen, but if I type dot schema, well, riders is gone. 652 00:38:34,240 --> 00:38:40,960 I'll try drop table stations, then semicolon, hit Enter, and type, 653 00:38:40,960 --> 00:38:42,760 let's say dot schema again. 654 00:38:42,760 --> 00:38:44,320 No more stations. 655 00:38:44,320 --> 00:38:49,030 I'll try drop table visits, semicolon, Enter, 656 00:38:49,030 --> 00:38:52,510 and then dot schema, and our table-- 657 00:38:52,510 --> 00:38:54,160 our database is really gone. 658 00:38:54,160 --> 00:38:57,280 There are no more tables inside of it. 659 00:38:57,280 --> 00:39:02,370 So let me propose that instead of working inside of the SQLite 660 00:39:02,370 --> 00:39:05,730 prompt, like typing out again, and again, and again, create table riders, 661 00:39:05,730 --> 00:39:07,650 create table stations, create table visits, 662 00:39:07,650 --> 00:39:10,650 let me be more efficient about this and create myself 663 00:39:10,650 --> 00:39:14,790 a schema file that I could reuse throughout this lesson 664 00:39:14,790 --> 00:39:18,870 and also later on while I'm working on this database on my own. 665 00:39:18,870 --> 00:39:23,850 To do that, let me quit my SQLite prompt here, and let me 666 00:39:23,850 --> 00:39:27,300 type something like code schema dot sql. 667 00:39:27,300 --> 00:39:31,620 I'm just creating this file called schema.sql. 668 00:39:31,620 --> 00:39:36,060 Now a dot SQL file allows me to type in SQL keywords 669 00:39:36,060 --> 00:39:38,790 and have them be syntax highlighted, so I know 670 00:39:38,790 --> 00:39:42,040 what's going on inside of this file. 671 00:39:42,040 --> 00:39:44,130 So let's just try this once more. 672 00:39:44,130 --> 00:39:49,170 I'll type Create table riders, and inside I'll 673 00:39:49,170 --> 00:39:53,820 say it has the ID column of what type affinity? 674 00:39:53,820 --> 00:39:58,270 Well, IDs are whole numbers, so perhaps integer in this case. 675 00:39:58,270 --> 00:40:02,160 I could say, ID has the integer type affinity. 676 00:40:02,160 --> 00:40:05,010 Now let me say the riders also have a name, 677 00:40:05,010 --> 00:40:07,710 and how could names be best represented? 678 00:40:07,710 --> 00:40:09,210 Maybe text, right? 679 00:40:09,210 --> 00:40:10,120 Characters here. 680 00:40:10,120 --> 00:40:12,750 So I'll say name and text. 681 00:40:12,750 --> 00:40:14,850 Now I'll include a semicolon to say, this 682 00:40:14,850 --> 00:40:17,820 is the end of my create table statement. 683 00:40:17,820 --> 00:40:21,630 And before, remember how I had to kind of air out 684 00:40:21,630 --> 00:40:25,560 or I had to like, backspace and so on to improve the design? 685 00:40:25,560 --> 00:40:28,890 Here I can literally just point and click and edit this file 686 00:40:28,890 --> 00:40:30,630 to improve my schema. 687 00:40:30,630 --> 00:40:35,130 And I'll later on apply this in my database using 688 00:40:35,130 --> 00:40:37,418 a command that we'll see a bit later. 689 00:40:37,418 --> 00:40:38,460 So let's keep going here. 690 00:40:38,460 --> 00:40:44,040 I'll say create table stations, and inside of the stations table, 691 00:40:44,040 --> 00:40:48,330 I'll make sure it has an ID column of type integer, 692 00:40:48,330 --> 00:40:51,960 a name column that probably stores some text, 693 00:40:51,960 --> 00:40:57,270 and a line column that also stores some text, where name 694 00:40:57,270 --> 00:41:00,130 is the name of the station like Kendell MIT, Harvard, 695 00:41:00,130 --> 00:41:05,640 and line is blue line or green line, what line it's part of in our subway. 696 00:41:05,640 --> 00:41:07,290 Let me try now visits. 697 00:41:07,290 --> 00:41:08,850 I'll say, create table visits. 698 00:41:08,850 --> 00:41:13,620 And then I'll do rider ID, which has what type affinity? 699 00:41:13,620 --> 00:41:15,030 Probably integer. 700 00:41:15,030 --> 00:41:19,680 And then I'll do station ID, which has this same type of affinity. 701 00:41:19,680 --> 00:41:23,070 It's relying on integers, whole numbers here for IDs. 702 00:41:23,070 --> 00:41:25,450 So my colon to finish this statement. 703 00:41:25,450 --> 00:41:29,040 Now this is my schema as a whole. 704 00:41:29,040 --> 00:41:31,830 I have riders, stations, and visits. 705 00:41:31,830 --> 00:41:36,760 But now I want to apply this schema to my database. 706 00:41:36,760 --> 00:41:37,780 So what could I do? 707 00:41:37,780 --> 00:41:39,330 I could reopen it, let's say. 708 00:41:39,330 --> 00:41:43,510 I'll do sqlite3 mbta.db in my terminal. 709 00:41:43,510 --> 00:41:47,370 And now I want to read in this schema.sql file, 710 00:41:47,370 --> 00:41:51,340 just run the commands that are inside of this file. 711 00:41:51,340 --> 00:41:52,660 So what could I do? 712 00:41:52,660 --> 00:41:59,040 I could say dot read schema.sql, where dot read is a command that 713 00:41:59,040 --> 00:42:01,860 says take whatever file you give me, like schema.sql 714 00:42:01,860 --> 00:42:08,070 and read it into this database, running any SQL keywords you come across there. 715 00:42:08,070 --> 00:42:09,450 I'll hit Enter. 716 00:42:09,450 --> 00:42:11,640 And now nothing seems to happen. 717 00:42:11,640 --> 00:42:17,658 But if I type dot schema, I now see my schema improved. 718 00:42:17,658 --> 00:42:19,950 And this is helpful for me, because what I could do now 719 00:42:19,950 --> 00:42:23,580 is I could just edit my schema.sql file and rerun it and rerun 720 00:42:23,580 --> 00:42:29,770 it to make sure I now have these tables being improved over time. 721 00:42:29,770 --> 00:42:30,730 OK. 722 00:42:30,730 --> 00:42:37,720 So this, then, is our new representation of our database. 723 00:42:37,720 --> 00:42:42,280 We have riders, of course, their own entity, and stations. 724 00:42:42,280 --> 00:42:47,170 They have an ID and a name, and stations have an ID, a name, and a line. 725 00:42:47,170 --> 00:42:52,810 We've also now included these type affinities-- integer, text, integer, 726 00:42:52,810 --> 00:42:56,320 text, to tell SQL what kinds of storage classes 727 00:42:56,320 --> 00:43:00,490 could be put inside of each of these columns. 728 00:43:00,490 --> 00:43:03,580 Now before we keep improving this, let me ask, 729 00:43:03,580 --> 00:43:10,780 what questions do we have on these storage classes and type affinities? 730 00:43:10,780 --> 00:43:12,400 AUDIENCE: That you would use-- 731 00:43:12,400 --> 00:43:17,870 when you were creating the table, the table was not in line. 732 00:43:17,870 --> 00:43:21,340 So when we search for the authors of books, 733 00:43:21,340 --> 00:43:25,982 so it comes with a perfect table, so how can we make a perfect table in the SQL? 734 00:43:25,982 --> 00:43:27,940 CARTER ZENKE: Yeah, do you mind clarifying what 735 00:43:27,940 --> 00:43:30,160 you mean by like, the perfect table? 736 00:43:30,160 --> 00:43:33,760 AUDIENCE: So I mean that it was arranged in something 737 00:43:33,760 --> 00:43:36,443 like in boxes, block boxes. 738 00:43:36,443 --> 00:43:37,610 CARTER ZENKE: Good question. 739 00:43:37,610 --> 00:43:41,770 So before we able to see the results of our queries 740 00:43:41,770 --> 00:43:46,660 inside some boxes in our terminal, and that is actually a mode of SQLite, 741 00:43:46,660 --> 00:43:51,370 I think type like dot mode table to see your results in that version. 742 00:43:51,370 --> 00:43:54,937 Here though, we have no data inside of our tables, 743 00:43:54,937 --> 00:43:56,770 so I can't really select anything from them. 744 00:43:56,770 --> 00:44:01,300 Like, if I go to my terminal here and I try 745 00:44:01,300 --> 00:44:04,690 to select some data from this riders table, 746 00:44:04,690 --> 00:44:10,510 let me say select star from rider's, semicolon, I won't get anything back. 747 00:44:10,510 --> 00:44:15,040 Next week, though, we'll see how to insert, and add, and update, and delete 748 00:44:15,040 --> 00:44:17,990 data inside of these tables, at which point, 749 00:44:17,990 --> 00:44:21,460 you could write, select star from riders and see some data 750 00:44:21,460 --> 00:44:23,560 you've inserted yourself. 751 00:44:23,560 --> 00:44:25,120 Great question. 752 00:44:25,120 --> 00:44:27,580 Let's take one more here. 753 00:44:27,580 --> 00:44:28,360 AUDIENCE: Yes. 754 00:44:28,360 --> 00:44:33,130 I would like to know if you have a column of the type boolean. 755 00:44:33,130 --> 00:44:36,730 CARTER ZENKE: Yeah, do we have a Boolean type affinity, let's say. 756 00:44:36,730 --> 00:44:39,850 So here, we don't, at least in SQLite. 757 00:44:39,850 --> 00:44:45,610 Some other DBMS', Database Management Systems might have bool or Boolean, 758 00:44:45,610 --> 00:44:47,110 true or false, right? 759 00:44:47,110 --> 00:44:48,700 Let me show you this instead. 760 00:44:48,700 --> 00:44:52,840 If I go to my terminal, I can see-- if I type dot schema, 761 00:44:52,840 --> 00:44:56,320 I haven't used Boolean, there's no need for me in this case, 762 00:44:56,320 --> 00:44:58,150 but I have used integer. 763 00:44:58,150 --> 00:45:02,350 And integer for SQLite can kind of serve the same purpose. 764 00:45:02,350 --> 00:45:06,550 I could have the integer 0 or the integer 1 to be true 765 00:45:06,550 --> 00:45:09,880 or to be false or true, respectively, 0 being false and true being 1, 766 00:45:09,880 --> 00:45:11,140 I believe in this case. 767 00:45:11,140 --> 00:45:13,970 But good question. 768 00:45:13,970 --> 00:45:14,750 OK. 769 00:45:14,750 --> 00:45:19,190 So to the earlier question, like, we've improved our tables. 770 00:45:19,190 --> 00:45:21,950 We now have type affinities for our columns. 771 00:45:21,950 --> 00:45:26,490 But we don't yet have this ID we talked about last week, 772 00:45:26,490 --> 00:45:29,450 which was primary keys and foreign keys. 773 00:45:29,450 --> 00:45:34,280 This idea of trying to uniquely represent each item in our own table 774 00:45:34,280 --> 00:45:36,770 using primary keys and trying to reference 775 00:45:36,770 --> 00:45:40,800 that primary key from some other table using foreign keys. 776 00:45:40,800 --> 00:45:43,820 So let's try to work on that now. 777 00:45:43,820 --> 00:45:48,980 And for this, we'll need this new ID called a table constraint. 778 00:45:48,980 --> 00:45:54,500 In SQLite, you can apply what's called a constraint to your entire table. 779 00:45:54,500 --> 00:45:59,300 A constraint means that some values have to be a certain way. 780 00:45:59,300 --> 00:46:04,340 Like let's say for a primary key, primary keys must be unique. 781 00:46:04,340 --> 00:46:07,730 They can't repeat, and they must be, at least in our case, 782 00:46:07,730 --> 00:46:12,990 is going to be integers, to be able to quickly add on to them over time. 783 00:46:12,990 --> 00:46:16,280 Similarly, for foreign keys, well, a constraint 784 00:46:16,280 --> 00:46:19,220 is that if you have a foreign key, you better 785 00:46:19,220 --> 00:46:22,850 find that value in some other table, otherwise 786 00:46:22,850 --> 00:46:26,250 you violated this constraint to having a foreign key. 787 00:46:26,250 --> 00:46:29,940 So we have two kinds of table constraints, among others, 788 00:46:29,940 --> 00:46:33,500 but two of these are primary key and foreign key. 789 00:46:33,500 --> 00:46:36,800 And we can apply these to our table by applying them 790 00:46:36,800 --> 00:46:42,030 underneath the columns we tend to say will be inside of our table. 791 00:46:42,030 --> 00:46:43,850 Let's try these two here now. 792 00:46:43,850 --> 00:46:47,840 So we come back to my terminal here so we implement our very own primary key 793 00:46:47,840 --> 00:46:49,730 and foreign key constraints. 794 00:46:49,730 --> 00:46:52,920 We'll go back to my SQLite terminal and clear my screen, 795 00:46:52,920 --> 00:46:55,670 and let's then pull up our schema.sql file 796 00:46:55,670 --> 00:46:58,130 so we can keep modifying our schema. 797 00:46:58,130 --> 00:47:01,610 I can now see I have the riders table, stations, and visits, 798 00:47:01,610 --> 00:47:06,080 and I have some columns that could be primary keys or foreign keys, 799 00:47:06,080 --> 00:47:08,850 but I need to declare them as such. 800 00:47:08,850 --> 00:47:12,120 So here in the riders table, what was our primary key? 801 00:47:12,120 --> 00:47:13,520 Well, it was ID. 802 00:47:13,520 --> 00:47:16,520 Every rider should have their own, unique ID that should not 803 00:47:16,520 --> 00:47:19,460 be duplicated across any two riders. 804 00:47:19,460 --> 00:47:24,410 So to ensure that constraint is applied, I could follow this up with a comma 805 00:47:24,410 --> 00:47:28,770 and then say, primary key ID, just like this. 806 00:47:28,770 --> 00:47:32,690 Now ID is the primary key of this rider's table. 807 00:47:32,690 --> 00:47:36,450 I can go down to stations and ask, what was my primary key? 808 00:47:36,450 --> 00:47:40,520 Well, similarly, it was ID, this ID column on line 8. 809 00:47:40,520 --> 00:47:45,590 So I'll type a comma followed up with primary key ID, 810 00:47:45,590 --> 00:47:49,170 and now that ID column is a primary key. 811 00:47:49,170 --> 00:47:52,580 It has that constraint applied in stations. 812 00:47:52,580 --> 00:47:56,900 But now if I get down to visits, we'll have a few more options. 813 00:47:56,900 --> 00:48:01,700 Visits here actually doesn't have its own ID column that I created. 814 00:48:01,700 --> 00:48:05,750 I instead have a rider ID and a station ID column. 815 00:48:05,750 --> 00:48:07,110 So a few options here. 816 00:48:07,110 --> 00:48:10,730 One option is actually to make a joint primary key. 817 00:48:10,730 --> 00:48:17,630 I could have a primary key composed of two columns, both rider ID and station 818 00:48:17,630 --> 00:48:18,170 ID. 819 00:48:18,170 --> 00:48:20,780 If I applied that constraint, that would mean 820 00:48:20,780 --> 00:48:24,110 that if I were to ever insert a row that had the same rider 821 00:48:24,110 --> 00:48:26,990 ID and the same station ID as another row, 822 00:48:26,990 --> 00:48:29,810 I would trigger a constraint violation. 823 00:48:29,810 --> 00:48:33,080 Every row that has a rider ID and a station ID 824 00:48:33,080 --> 00:48:36,860 has to be unique in their combination of those two values. 825 00:48:36,860 --> 00:48:39,770 To write that kind of scenario, I could follow this up 826 00:48:39,770 --> 00:48:43,190 and I could say similarly, the primary key of this table 827 00:48:43,190 --> 00:48:48,410 is not just rider ID like this, but it's also station ID. 828 00:48:48,410 --> 00:48:52,080 Now this is a joint primary key constraint. 829 00:48:52,080 --> 00:48:54,110 But if we think about this logically, I mean, 830 00:48:54,110 --> 00:48:57,620 it kind of stands to reason that somebody would visit a station more 831 00:48:57,620 --> 00:49:01,850 than once, and I don't want to make sure that every combination of rider 832 00:49:01,850 --> 00:49:03,420 and station ID should be unique. 833 00:49:03,420 --> 00:49:06,600 I want people to able to visit a station more than once. 834 00:49:06,600 --> 00:49:09,900 So maybe not the best design for this table, but I could certainly use it. 835 00:49:09,900 --> 00:49:12,530 In other cases or other contexts. 836 00:49:12,530 --> 00:49:15,410 One other option would be to do it before 837 00:49:15,410 --> 00:49:19,670 and to have maybe the ID column here of type integer, 838 00:49:19,670 --> 00:49:23,670 and then down below, make that our primary key, a bit like this. 839 00:49:23,670 --> 00:49:27,140 And now visits has its own ID column. 840 00:49:27,140 --> 00:49:33,395 But actually, SQLite by default will give me its very own primary key, 841 00:49:33,395 --> 00:49:35,810 one called row ID. 842 00:49:35,810 --> 00:49:39,170 It's implicit, I can't see it, but I actually could query it. 843 00:49:39,170 --> 00:49:41,810 I could query it for row ID, all one word, 844 00:49:41,810 --> 00:49:46,070 and get back a unique primary key for this table SQLite has automatically 845 00:49:46,070 --> 00:49:47,950 created for me. 846 00:49:47,950 --> 00:49:50,880 Now we have seen the primary key options, 847 00:49:50,880 --> 00:49:52,420 what are our foreign key options? 848 00:49:52,420 --> 00:49:55,180 Well, it seems like rider ID and station ID 849 00:49:55,180 --> 00:49:59,320 are the foreign keys of this table where rider ID references the ID 850 00:49:59,320 --> 00:50:05,020 column in rider's, and station ID references the ID column in stations. 851 00:50:05,020 --> 00:50:08,350 So to codify that, to make that a reality in our schema, 852 00:50:08,350 --> 00:50:11,540 I could follow this up with the foreign key. 853 00:50:11,540 --> 00:50:14,260 The foreign key of this table is rider ID, 854 00:50:14,260 --> 00:50:19,690 and it references the rider's table and the ID column 855 00:50:19,690 --> 00:50:22,840 inside of it using this syntax here. 856 00:50:22,840 --> 00:50:24,530 Now I could keep going. 857 00:50:24,530 --> 00:50:26,840 I could say I have more than one foreign key. 858 00:50:26,840 --> 00:50:29,500 I also have a foreign key-- 859 00:50:29,500 --> 00:50:32,020 foreign key called station ID. 860 00:50:32,020 --> 00:50:34,150 And that references, like we said before, 861 00:50:34,150 --> 00:50:37,670 the stations table and the ID column inside of it. 862 00:50:37,670 --> 00:50:40,060 So now here I have my completed schema. 863 00:50:40,060 --> 00:50:43,780 I have a primary key for the tables I've declared an explicit column 864 00:50:43,780 --> 00:50:47,800 for a primary key, and now I also have foreign key constraints 865 00:50:47,800 --> 00:50:51,710 for those columns that should be foreign keys. 866 00:50:51,710 --> 00:50:55,570 So now let me ask, what questions do we have on the schema 867 00:50:55,570 --> 00:50:59,508 or on primary keys and foreign keys? 868 00:50:59,508 --> 00:51:00,300 AUDIENCE: So, yeah. 869 00:51:00,300 --> 00:51:03,570 I just noticed that whenever before we hadn't 870 00:51:03,570 --> 00:51:06,210 added the affinities and the keys, we were not 871 00:51:06,210 --> 00:51:09,340 applying commas after each column name. 872 00:51:09,340 --> 00:51:11,295 So what is the difference there? 873 00:51:11,295 --> 00:51:12,670 CARTER ZENKE: Yeah, a good catch. 874 00:51:12,670 --> 00:51:15,128 Let me kind of show you what this looks like in my terminal 875 00:51:15,128 --> 00:51:17,310 so you can see it live. 876 00:51:17,310 --> 00:51:20,430 You, I think, had noticed that before we had, 877 00:51:20,430 --> 00:51:25,110 let's say, this primary key ID constraint in riders, 878 00:51:25,110 --> 00:51:26,740 we had done something like this. 879 00:51:26,740 --> 00:51:28,290 Let me just copy paste that. 880 00:51:28,290 --> 00:51:30,450 And we had removed this last column-- 881 00:51:30,450 --> 00:51:32,190 or comma from the name column. 882 00:51:32,190 --> 00:51:33,090 Is that right? 883 00:51:33,090 --> 00:51:38,290 And if that's the case, well, it's just convention, just style here. 884 00:51:38,290 --> 00:51:41,100 So if I want to keep adding some constraint 885 00:51:41,100 --> 00:51:45,000 or like a new line to my table, I should include a comma. 886 00:51:45,000 --> 00:51:51,720 Here, this name column was the last portion of my table I had specified. 887 00:51:51,720 --> 00:51:55,770 I have this column called name that has type affinity text, right? 888 00:51:55,770 --> 00:51:58,590 But now, if I add this new constraint, we'll 889 00:51:58,590 --> 00:52:03,360 have to follow it up with-- follow it up after a comma from this new column 890 00:52:03,360 --> 00:52:04,290 here. 891 00:52:04,290 --> 00:52:10,230 Notice now, this constraint, primary key ID, is the last-- 892 00:52:10,230 --> 00:52:12,810 let's say attribute of my table I specified. 893 00:52:12,810 --> 00:52:15,870 I no longer need to include a comma at the end of it. 894 00:52:15,870 --> 00:52:20,430 So whatever is the last portion, I should not have a comma after, 895 00:52:20,430 --> 00:52:23,460 but everything else, I should. 896 00:52:23,460 --> 00:52:25,800 Let's take one more question here, too. 897 00:52:25,800 --> 00:52:29,813 AUDIENCE: Would it be OK for the visits table to have an ID column as well? 898 00:52:29,813 --> 00:52:31,230 CARTER ZENKE: Yeah, good question. 899 00:52:31,230 --> 00:52:35,490 Would it be OK for the visits table to have an ID column as well. 900 00:52:35,490 --> 00:52:36,450 It certainly would be. 901 00:52:36,450 --> 00:52:39,870 We could define our very own primary key for this table, too. 902 00:52:39,870 --> 00:52:42,210 So let me go back and show you how that could work. 903 00:52:42,210 --> 00:52:44,880 I'll go to my visits table here. 904 00:52:44,880 --> 00:52:49,440 And I could try to add my own primary key to this table. 905 00:52:49,440 --> 00:52:54,870 I could say ID, make a new column here, make it a type affinity integer, 906 00:52:54,870 --> 00:52:55,900 like this. 907 00:52:55,900 --> 00:52:56,700 Let me scroll up. 908 00:52:56,700 --> 00:53:00,090 And now let me add some new constraint. 909 00:53:00,090 --> 00:53:03,450 I could say, because I've made my very own primary key, 910 00:53:03,450 --> 00:53:06,310 I'll say primary key ID. 911 00:53:06,310 --> 00:53:12,510 Now, this table has a primary key that I've created called ID. 912 00:53:12,510 --> 00:53:15,310 And this will be in place of SQLite, so that it 913 00:53:15,310 --> 00:53:21,730 would have made called row ID itself but hidden from my own view. 914 00:53:21,730 --> 00:53:24,450 OK, let's keep going then. 915 00:53:24,450 --> 00:53:27,570 We've seen table constraints, we've seen type affinities, 916 00:53:27,570 --> 00:53:30,540 but we could probably do more to improve the design 917 00:53:30,540 --> 00:53:32,740 of this table or this database. 918 00:53:32,740 --> 00:53:35,850 So let's introduce one more kind of constraint, 919 00:53:35,850 --> 00:53:38,740 this one called a column constraint. 920 00:53:38,740 --> 00:53:42,900 So whereas a table constraint applies to the table as a whole, 921 00:53:42,900 --> 00:53:47,160 a column constraint applies to a particular column, let's say. 922 00:53:47,160 --> 00:53:51,930 Maybe I want to column to have certain data inside of it. 923 00:53:51,930 --> 00:53:54,720 Maybe I want to make sure it doesn't have null values, and so on. 924 00:53:54,720 --> 00:53:57,690 I could do that with column constraints. 925 00:53:57,690 --> 00:54:00,510 There are four in SQLite-- 926 00:54:00,510 --> 00:54:07,440 check, default, not null, and unique, and each one does something different. 927 00:54:07,440 --> 00:54:11,340 Check allows me to make my very own check, like check to be sure 928 00:54:11,340 --> 00:54:14,250 that this amount is greater than 0. 929 00:54:14,250 --> 00:54:17,250 Or I could use default. Default means if I 930 00:54:17,250 --> 00:54:20,310 don't supply a value when I add a new row, 931 00:54:20,310 --> 00:54:23,640 it'll just use a default value instead. 932 00:54:23,640 --> 00:54:29,730 Not null means I can't insert null or empty values into this column. 933 00:54:29,730 --> 00:54:31,680 And in fact, it's required. 934 00:54:31,680 --> 00:54:37,770 Unique means, I want to make sure that every row in this column 935 00:54:37,770 --> 00:54:39,390 is a unique value. 936 00:54:39,390 --> 00:54:42,370 It doesn't appear twice in my data set. 937 00:54:42,370 --> 00:54:46,440 So let's try applying a few of these to our schema here. 938 00:54:46,440 --> 00:54:48,270 Go back to my terminal. 939 00:54:48,270 --> 00:54:50,920 And now let me check out this. 940 00:54:50,920 --> 00:54:55,020 Well, I could try applying the not null constraint 941 00:54:55,020 --> 00:54:59,190 when I know I want a column to be required, effectively. 942 00:54:59,190 --> 00:55:01,530 Now where could I best apply that? 943 00:55:01,530 --> 00:55:05,790 Maybe I could apply that to the name column in stations, 944 00:55:05,790 --> 00:55:07,870 like stations must have a name. 945 00:55:07,870 --> 00:55:11,140 So I'll say, the name column cannot be null. 946 00:55:11,140 --> 00:55:14,310 It cannot be empty in here. 947 00:55:14,310 --> 00:55:17,280 Line also should probably be not null. 948 00:55:17,280 --> 00:55:19,710 A station must be part of some line. 949 00:55:19,710 --> 00:55:21,780 I can't have an empty value for line. 950 00:55:21,780 --> 00:55:26,240 So I'll say, too, this should be not null. 951 00:55:26,240 --> 00:55:28,790 Now I could apply this up at name. 952 00:55:28,790 --> 00:55:31,610 I could say, riders must have a name, too. 953 00:55:31,610 --> 00:55:32,310 Let me try that. 954 00:55:32,310 --> 00:55:34,220 I'll say text not null. 955 00:55:34,220 --> 00:55:35,630 Or I could leave it optional. 956 00:55:35,630 --> 00:55:39,020 I could say, maybe text just on its own and let 957 00:55:39,020 --> 00:55:42,680 riders choose to supply a name or not. 958 00:55:42,680 --> 00:55:48,650 Now the question here is, should I apply not null to my primary key columns, 959 00:55:48,650 --> 00:55:53,300 like ID not null or ID not null here? 960 00:55:53,300 --> 00:55:56,420 You might think that you should, for thoroughness sake. 961 00:55:56,420 --> 00:56:01,940 Well it turns out that when you apply the primary key table constraint down 962 00:56:01,940 --> 00:56:06,440 below here, this already ensures that there are several constraints applied 963 00:56:06,440 --> 00:56:12,800 to this particular column called ID, among them being that ID cannot be 964 00:56:12,800 --> 00:56:13,350 null. 965 00:56:13,350 --> 00:56:17,480 So no need to duplicate this and say that this ID cannot be null when I 966 00:56:17,480 --> 00:56:23,240 already have it specified down below that ID is a primary key. 967 00:56:23,240 --> 00:56:25,520 Let me check others here. 968 00:56:25,520 --> 00:56:29,800 You might also think, could I do it for rider ID and station ID? 969 00:56:29,800 --> 00:56:32,470 Should I include not null here? 970 00:56:32,470 --> 00:56:36,490 Rider ID, not null, station ID, not null. 971 00:56:36,490 --> 00:56:39,250 Well, that would be a good thought, but again, we're 972 00:56:39,250 --> 00:56:43,540 taken care of by our table constraint using our foreign key here. 973 00:56:43,540 --> 00:56:48,520 Again, this constraint will say, if rider ID doesn't already 974 00:56:48,520 --> 00:56:53,380 exist in the ID column of rider's, I can't insert that value. 975 00:56:53,380 --> 00:56:58,090 And we could probably presume that if rider's ID is a primary key, 976 00:56:58,090 --> 00:57:01,130 well, null will not be part of this column, 977 00:57:01,130 --> 00:57:06,430 and therefore, I already can't insert null for rider ID or station ID. 978 00:57:06,430 --> 00:57:08,860 This would be in this case, redundant. 979 00:57:08,860 --> 00:57:12,910 So not null is good when you have columns that are neither 980 00:57:12,910 --> 00:57:15,910 primary keys nor foreign keys and you want 981 00:57:15,910 --> 00:57:20,010 to make sure that they have-- they always have a value, that they 982 00:57:20,010 --> 00:57:22,170 are never null, effectively. 983 00:57:22,170 --> 00:57:25,010 984 00:57:25,010 --> 00:57:27,320 So that is not null. 985 00:57:27,320 --> 00:57:28,730 So we could keep going here. 986 00:57:28,730 --> 00:57:34,190 We also had one called unique that makes sure every value, 987 00:57:34,190 --> 00:57:37,100 every row in this column is unique. 988 00:57:37,100 --> 00:57:38,820 Where could we apply this? 989 00:57:38,820 --> 00:57:42,140 I could try to apply it, let's say, to the name of a station. 990 00:57:42,140 --> 00:57:44,820 Like, station should have unique names. 991 00:57:44,820 --> 00:57:47,600 So I'll say not null and unique. 992 00:57:47,600 --> 00:57:50,220 Now this column has two constraints. 993 00:57:50,220 --> 00:57:51,680 The first, not null. 994 00:57:51,680 --> 00:57:53,720 It should always have a value. 995 00:57:53,720 --> 00:57:55,190 The second, unique. 996 00:57:55,190 --> 00:57:58,220 The value shouldn't repeat throughout this column. 997 00:57:58,220 --> 00:58:01,100 Line I might leave without this constraint. 998 00:58:01,100 --> 00:58:04,370 I could imagine two stations being on the same line, like both on blue, 999 00:58:04,370 --> 00:58:07,170 I'll allow that in this case. 1000 00:58:07,170 --> 00:58:12,560 Now again, we could try to apply unique to our primary keys or our foreign keys 1001 00:58:12,560 --> 00:58:15,920 as I just did here, but it's already taken 1002 00:58:15,920 --> 00:58:19,520 care of for us using this primary key constraint. 1003 00:58:19,520 --> 00:58:24,050 A primary key, again, is always going to be unique and never null. 1004 00:58:24,050 --> 00:58:29,720 So we'll take advantage of that already using our primary key and foreign key 1005 00:58:29,720 --> 00:58:32,490 constraints here. 1006 00:58:32,490 --> 00:58:33,270 OK. 1007 00:58:33,270 --> 00:58:36,180 So we've seen unique and not null. 1008 00:58:36,180 --> 00:58:39,510 And I might argue, we're at the point where this schema seems 1009 00:58:39,510 --> 00:58:44,400 to be fairly optimized, at least using our column constraints, our table 1010 00:58:44,400 --> 00:58:47,560 constraints, our type affinities and so on. 1011 00:58:47,560 --> 00:58:55,440 So let's ask then, what questions do we have on not null and unique, if any? 1012 00:58:55,440 --> 00:58:58,770 AUDIENCE: So basically, to recap if I understood correctly, it's 1013 00:58:58,770 --> 00:59:01,620 not precisely about not null and unique but about 1014 00:59:01,620 --> 00:59:06,060 the concept of the key, labeling key immediately 1015 00:59:06,060 --> 00:59:10,135 gives the attribute of not null, unique, and to be referenced, right? 1016 00:59:10,135 --> 00:59:11,260 CARTER ZENKE: That is true. 1017 00:59:11,260 --> 00:59:14,640 So when you use a primary key or a foreign key constraint, 1018 00:59:14,640 --> 00:59:17,880 there are other constraints that go along with that constraint. 1019 00:59:17,880 --> 00:59:23,410 A primary key, for instance, must not be null, it must be unique, and so on. 1020 00:59:23,410 --> 00:59:25,290 So it would be redundant to apply that again, 1021 00:59:25,290 --> 00:59:29,010 to say that this primary key should be unique or not null. 1022 00:59:29,010 --> 00:59:31,160 Good clarification there. 1023 00:59:31,160 --> 00:59:32,000 OK. 1024 00:59:32,000 --> 00:59:36,590 So I think we're at the point where this schema is pretty well set for us, 1025 00:59:36,590 --> 00:59:42,380 and we're going to need to think about how we use this in the real world. 1026 00:59:42,380 --> 00:59:48,630 Like, if this is our schema here, we have riders and stations, well, 1027 00:59:48,630 --> 00:59:49,940 what could we do? 1028 00:59:49,940 --> 00:59:52,340 I mean, riders tend to-- 1029 00:59:52,340 --> 00:59:54,710 I mean, they could register for the subway. 1030 00:59:54,710 --> 00:59:56,900 And riders, well, they do visit stations. 1031 00:59:56,900 --> 00:59:59,360 But I think if we applied this to the real world, 1032 00:59:59,360 --> 01:00:02,630 we'd see that this isn't quite how it's actually done. 1033 01:00:02,630 --> 01:00:05,510 Like, riders don't really register that often. 1034 01:00:05,510 --> 01:00:08,570 If a rider who's new to the city comes in, they want to ride the subway, 1035 01:00:08,570 --> 01:00:10,760 they should be able to ride, too. 1036 01:00:10,760 --> 01:00:15,110 So it turns out at least here in Boston, the MBTA doesn't really 1037 01:00:15,110 --> 01:00:20,360 track riders, per se, but they do track what we call CharlieCards. 1038 01:00:20,360 --> 01:00:23,780 If you want to ride a subway, you get a CharlieCard. 1039 01:00:23,780 --> 01:00:26,330 A CharlieCard allows you access to the subway, 1040 01:00:26,330 --> 01:00:28,910 keeps track of your fare, your balance and so on, 1041 01:00:28,910 --> 01:00:31,730 and allows you to swipe in to certain stations. 1042 01:00:31,730 --> 01:00:33,800 So when we come back from a break here, we'll 1043 01:00:33,800 --> 01:00:37,550 see how we can actually implement these CharlieCards instead of riders 1044 01:00:37,550 --> 01:00:40,460 to make our system more efficient and more in line with what 1045 01:00:40,460 --> 01:00:42,140 happens in the real world. 1046 01:00:42,140 --> 01:00:45,220 We'll see you all in just a few. 1047 01:00:45,220 --> 01:00:46,940 And we're back. 1048 01:00:46,940 --> 01:00:49,750 So we saw last time, this challenge of designing 1049 01:00:49,750 --> 01:00:53,080 a system for the Massachusetts Bay Transportation Authority 1050 01:00:53,080 --> 01:00:55,540 to represent riders and stations-- people 1051 01:00:55,540 --> 01:00:58,480 who get on a subway to go around Boston. 1052 01:00:58,480 --> 01:01:03,460 But as we also saw, we learned that the MBTA doesn't 1053 01:01:03,460 --> 01:01:09,100 keep track of riders themselves, they keep track of CharlieCards, this card 1054 01:01:09,100 --> 01:01:11,650 that a rider might carry in their pocket. 1055 01:01:11,650 --> 01:01:17,230 And they can scan this card at a station to enter or even to exit in some cases, 1056 01:01:17,230 --> 01:01:21,650 to make sure that their fare is counted as they enter that station. 1057 01:01:21,650 --> 01:01:27,520 So let's think now how to improve our schema to represent not just riders, 1058 01:01:27,520 --> 01:01:31,540 like me, But CharlieCards themselves people might carry around 1059 01:01:31,540 --> 01:01:34,120 when they enter the subway station. 1060 01:01:34,120 --> 01:01:37,120 Well, we saw before we had riders and cards. 1061 01:01:37,120 --> 01:01:43,030 But our goal now is to remove riders from the picture, focus only on cards. 1062 01:01:43,030 --> 01:01:47,470 Well cards, as we've seen, might make us swipe at a station. 1063 01:01:47,470 --> 01:01:51,790 If I enter Harvard station, I might swipe my CharlieCard 1064 01:01:51,790 --> 01:01:54,380 to enter that station. 1065 01:01:54,380 --> 01:01:59,270 And we could see that a card would have maybe many swipes involved. 1066 01:01:59,270 --> 01:02:04,520 Like if I swipe at Harvard, I might also swipe it MIT or swipe it Park Street, 1067 01:02:04,520 --> 01:02:05,810 and so on. 1068 01:02:05,810 --> 01:02:10,410 We could see that a swipe can only happen at a single station at a time, 1069 01:02:10,410 --> 01:02:10,910 though. 1070 01:02:10,910 --> 01:02:14,990 Like if I swipe at Park Street, I'm only swiping at Park Street here, 1071 01:02:14,990 --> 01:02:20,050 and similarly, a swipe might only involve one, single card. 1072 01:02:20,050 --> 01:02:23,040 So if we think about these entities and how they relate, 1073 01:02:23,040 --> 01:02:27,810 we could also think about what kinds of columns you have inside of each entity. 1074 01:02:27,810 --> 01:02:30,730 In this case, I would argue we have something a bit like this. 1075 01:02:30,730 --> 01:02:36,150 We could say that a card has a ID for its primary key, in this case. 1076 01:02:36,150 --> 01:02:42,360 Similarly, this card makes a swipe, and this swipe has, itself, an ID, 1077 01:02:42,360 --> 01:02:46,260 as well as a type, some time that that swipe happened, 1078 01:02:46,260 --> 01:02:49,480 and an amount or a transaction that is involved. 1079 01:02:49,480 --> 01:02:54,090 So for example, let's say I swipe in at the Harvard Square Station. 1080 01:02:54,090 --> 01:02:59,880 That type of swipe might be to enter the station at some certain time. 1081 01:02:59,880 --> 01:03:03,510 Now associated with that swipe is also some amount 1082 01:03:03,510 --> 01:03:07,320 in dollars that happened to be subtracted from my card. 1083 01:03:07,320 --> 01:03:10,620 Like let's say the fare nowadays is $2.40, well, 1084 01:03:10,620 --> 01:03:15,830 that amount is subtracted from my card from this swipe. 1085 01:03:15,830 --> 01:03:20,270 Now, of course, I do all of this at a station, which has our same columns 1086 01:03:20,270 --> 01:03:23,630 from before, ID, name, and line. 1087 01:03:23,630 --> 01:03:28,130 So a similar idea now, but we're replacing riders with cards 1088 01:03:28,130 --> 01:03:31,190 and adding more information to these visits. 1089 01:03:31,190 --> 01:03:34,940 There are instead swipes that could be maybe entering the station, 1090 01:03:34,940 --> 01:03:39,240 exiting the station or just adding funds to my balance, for instance. 1091 01:03:39,240 --> 01:03:41,690 So let's see how we could improve our schema now 1092 01:03:41,690 --> 01:03:47,720 using some new SQL keywords to alter our tables and add some new ones, too. 1093 01:03:47,720 --> 01:03:49,790 I'll go back to my computer. 1094 01:03:49,790 --> 01:03:54,560 And let's see how we could try to alter the table we've already created. 1095 01:03:54,560 --> 01:03:58,910 We already have a riders table, a visits table and a stations table, 1096 01:03:58,910 --> 01:04:02,450 but we could learn some new statements, some new queries to actually 1097 01:04:02,450 --> 01:04:06,030 update and alter these tables as well. 1098 01:04:06,030 --> 01:04:08,100 The first one as we saw a little bit before 1099 01:04:08,100 --> 01:04:11,270 is this one called drop table, arguably the most dramatic thing 1100 01:04:11,270 --> 01:04:14,100 you can do to table, is just drop it, delete it like this. 1101 01:04:14,100 --> 01:04:18,080 So let's try just deleting the riders table from our database. 1102 01:04:18,080 --> 01:04:21,140 I'll go back to my-- 1103 01:04:21,140 --> 01:04:24,050 go back to my database here. 1104 01:04:24,050 --> 01:04:29,370 I'll type sqlite mbta.db to pull up this database again. 1105 01:04:29,370 --> 01:04:34,160 And now if I type dot schema, I'll see I have a riders, stations, and visits 1106 01:04:34,160 --> 01:04:34,790 table. 1107 01:04:34,790 --> 01:04:38,510 But no longer-- I don't want to have the riders table. 1108 01:04:38,510 --> 01:04:39,990 I want to remove that. 1109 01:04:39,990 --> 01:04:45,200 So I'll say, drop table riders, semicolon. 1110 01:04:45,200 --> 01:04:46,880 Now I'll hit enter. 1111 01:04:46,880 --> 01:04:48,650 No more riders table. 1112 01:04:48,650 --> 01:04:52,830 If I type dot schema, that's gone for my database. 1113 01:04:52,830 --> 01:04:54,990 Well, what could I do now? 1114 01:04:54,990 --> 01:04:58,740 I've dropped the table, but I'd still need to maybe update 1115 01:04:58,740 --> 01:05:00,660 visits to instead be swipes. 1116 01:05:00,660 --> 01:05:02,850 I could probably leave stations as is. 1117 01:05:02,850 --> 01:05:06,360 But I want to update a table or alter its schema, 1118 01:05:06,360 --> 01:05:09,370 I can use some new commands as well. 1119 01:05:09,370 --> 01:05:12,960 I'll use this one called alter table. 1120 01:05:12,960 --> 01:05:16,300 Alter table looks a bit like this. 1121 01:05:16,300 --> 01:05:22,320 I can use the alter table statement here and give it some table name. 1122 01:05:22,320 --> 01:05:27,990 Like let's say alter table visits, and then inside this visits table, 1123 01:05:27,990 --> 01:05:30,370 I get to ask, what do I want to do? 1124 01:05:30,370 --> 01:05:33,030 Do I want to, let say, rename the table? 1125 01:05:33,030 --> 01:05:34,240 I could do that. 1126 01:05:34,240 --> 01:05:37,890 I could also decide to add a column, rename a column, 1127 01:05:37,890 --> 01:05:40,440 or drop a column altogether. 1128 01:05:40,440 --> 01:05:43,530 Let's try first looking at rename to. 1129 01:05:43,530 --> 01:05:47,790 I want to rename this visits table to a swipes table, 1130 01:05:47,790 --> 01:05:51,510 representing not just a visit to the station, but a swipe of a card. 1131 01:05:51,510 --> 01:05:53,340 So let's try this one here. 1132 01:05:53,340 --> 01:05:57,240 I'll go back to my computer and I'll go back to SQLite. 1133 01:05:57,240 --> 01:06:01,740 And I'll say, I no longer want visits to be called visits. 1134 01:06:01,740 --> 01:06:06,330 I ideally want visits to actually be called swipes. 1135 01:06:06,330 --> 01:06:07,590 So let me try this. 1136 01:06:07,590 --> 01:06:13,500 I'll say alter table, as we saw before, visits 1137 01:06:13,500 --> 01:06:18,180 rename to swipes, like this, semicolon. 1138 01:06:18,180 --> 01:06:19,770 Now I'll hit enter. 1139 01:06:19,770 --> 01:06:24,212 And now if I type dot schema, again, I see-- 1140 01:06:24,212 --> 01:06:25,170 oops, kind of strange-- 1141 01:06:25,170 --> 01:06:28,500 I'll hit dot schema again, I'll see swipes, 1142 01:06:28,500 --> 01:06:32,730 no longer called visits, but now called swipes. 1143 01:06:32,730 --> 01:06:36,390 What we saw-- I'd ideally like to add a type of the swipe, 1144 01:06:36,390 --> 01:06:40,200 maybe I'm entering the station, maybe I'm exiting. 1145 01:06:40,200 --> 01:06:43,770 Maybe I'm just adding funds or depositing some funds. 1146 01:06:43,770 --> 01:06:47,940 So let me try adding, let's say, a new column to swipes. 1147 01:06:47,940 --> 01:06:51,630 I'll say alter table, add column-- 1148 01:06:51,630 --> 01:06:57,910 alter table, swipes, I'm going to name that table, swipes, like this. 1149 01:06:57,910 --> 01:07:00,840 And then let's add a column-- 1150 01:07:00,840 --> 01:07:03,510 add a column called type. 1151 01:07:03,510 --> 01:07:06,600 And this will have the type affinity text. 1152 01:07:06,600 --> 01:07:08,580 I'll hit semicolon, enter. 1153 01:07:08,580 --> 01:07:12,060 And now if I type dot schema, I'll see-- 1154 01:07:12,060 --> 01:07:16,820 well, I have a column called a ttpe. 1155 01:07:16,820 --> 01:07:18,270 So clearly, I made a typo here. 1156 01:07:18,270 --> 01:07:20,570 I had rider ID, station ID. 1157 01:07:20,570 --> 01:07:26,120 This new column down below called ttpe, I kind of want to fix this, right? 1158 01:07:26,120 --> 01:07:28,430 I don't want ttpe, I want type. 1159 01:07:28,430 --> 01:07:31,760 So probably good chance to use my rename column. 1160 01:07:31,760 --> 01:07:34,250 I'll come back here and I'll try that. 1161 01:07:34,250 --> 01:07:37,610 I'll instead do alter table-- 1162 01:07:37,610 --> 01:07:43,160 alter table, swipes, and I will rename a column now. 1163 01:07:43,160 --> 01:07:51,170 I'll say rename the column, ttpe to type, spelled correctly now. 1164 01:07:51,170 --> 01:07:55,220 Hit semicolon, clear my terminal, type dot schema. 1165 01:07:55,220 --> 01:08:01,070 And now I see over here that type has been added to my table of swipes. 1166 01:08:01,070 --> 01:08:06,170 I see rider ID, station ID, and now a new column called type. 1167 01:08:06,170 --> 01:08:11,630 So through alter table, we go ahead and add new columns, rename them even, 1168 01:08:11,630 --> 01:08:15,650 or if I wanted to just drop the column altogether. 1169 01:08:15,650 --> 01:08:19,160 Let's say I add this column type and I change my mind, 1170 01:08:19,160 --> 01:08:20,580 I don't want it here anymore. 1171 01:08:20,580 --> 01:08:24,260 I go back to my computer and I could try dropping a particular column. 1172 01:08:24,260 --> 01:08:28,850 Let me try in this case, alter table, swipes. 1173 01:08:28,850 --> 01:08:33,859 And now let me drop column type, semicolon. 1174 01:08:33,859 --> 01:08:34,640 Hit enter. 1175 01:08:34,640 --> 01:08:40,529 And now if I type dot schema, I'm back to where I began. 1176 01:08:40,529 --> 01:08:45,330 So these are the commands we can use to alter our tables, to improve them 1177 01:08:45,330 --> 01:08:48,450 if you make a mistake during our first create table command, 1178 01:08:48,450 --> 01:08:52,930 or if you want to add more functionality later down the line. 1179 01:08:52,930 --> 01:08:58,050 So ideally, I could keep using alter table, add table, create table, drop 1180 01:08:58,050 --> 01:08:58,930 table, and so on. 1181 01:08:58,930 --> 01:09:01,290 But what I want to do here is just start from scratch. 1182 01:09:01,290 --> 01:09:04,029 Like I have stations and swipes and so on. 1183 01:09:04,029 --> 01:09:07,899 Why don't I just go back to my schema file and use that instead? 1184 01:09:07,899 --> 01:09:11,910 So what I'll do is I'll drop table for stations and-- 1185 01:09:11,910 --> 01:09:12,779 oops, semicolon. 1186 01:09:12,779 --> 01:09:17,490 I will drop table for swipes now, semicolon. 1187 01:09:17,490 --> 01:09:22,439 I'll type dot schema, and now I'll see nothing in here. 1188 01:09:22,439 --> 01:09:26,760 I'll quit and I'll type code schema.sql. 1189 01:09:26,760 --> 01:09:29,970 Let me just start from scratch using this schema.sql. 1190 01:09:29,970 --> 01:09:35,790 So we no longer want to have riders, we only want to have cards. 1191 01:09:35,790 --> 01:09:38,670 So I could just rename this table here. 1192 01:09:38,670 --> 01:09:39,930 I'll call it cards. 1193 01:09:39,930 --> 01:09:41,609 Create table, cards. 1194 01:09:41,609 --> 01:09:47,520 Now cards don't have a name, they only have some unique ID in this case. 1195 01:09:47,520 --> 01:09:50,460 Now I'll leave stations just as it is. 1196 01:09:50,460 --> 01:09:51,870 This seems perfectly fine to me. 1197 01:09:51,870 --> 01:09:55,800 Stations have an ID, a name and a line with these constraints 1198 01:09:55,800 --> 01:09:58,690 applied to the name and line columns. 1199 01:09:58,690 --> 01:10:00,720 But now the visits table. 1200 01:10:00,720 --> 01:10:04,500 Well, the visits table is no longer a visit to the station per se, 1201 01:10:04,500 --> 01:10:07,570 it's more a swipe of the card at that station. 1202 01:10:07,570 --> 01:10:10,950 So let's now say visits becomes swipes. 1203 01:10:10,950 --> 01:10:15,370 And among these new columns to add are the following. 1204 01:10:15,370 --> 01:10:18,690 I want to have not just an ID for each swipe, 1205 01:10:18,690 --> 01:10:22,230 not an ID for rider ID and station ID and so on, 1206 01:10:22,230 --> 01:10:26,310 but also I want to have a type of swipe. 1207 01:10:26,310 --> 01:10:30,870 Am I entering the station, exiting, or depositing some funds? 1208 01:10:30,870 --> 01:10:34,980 So I'll say the type of this is going to be a new column, 1209 01:10:34,980 --> 01:10:40,200 and the data type, this whole will be text, enter, exit, 1210 01:10:40,200 --> 01:10:42,900 or deposit for funds. 1211 01:10:42,900 --> 01:10:45,270 Now let me try another column, two. 1212 01:10:45,270 --> 01:10:47,700 I'll include a date time. 1213 01:10:47,700 --> 01:10:49,260 A date time is like a timestamp. 1214 01:10:49,260 --> 01:10:51,810 What time did this swipe actually happen? 1215 01:10:51,810 --> 01:10:54,570 And I'll make this type numeric. 1216 01:10:54,570 --> 01:11:00,650 Numeric can store all kinds of dates and times for me in this table. 1217 01:11:00,650 --> 01:11:02,950 Now let me add one final column. 1218 01:11:02,950 --> 01:11:04,990 This one will be an amount. 1219 01:11:04,990 --> 01:11:09,910 I'll also use numeric for this kind of column here. 1220 01:11:09,910 --> 01:11:13,150 And I'll say that this column called amount 1221 01:11:13,150 --> 01:11:18,500 can store, in this case integers or real numbers, like floats and so on. 1222 01:11:18,500 --> 01:11:24,310 I'll probably decide on that when I actually add some data to this table. 1223 01:11:24,310 --> 01:11:28,650 So here we've updated our schema to represent that diagram we saw before. 1224 01:11:28,650 --> 01:11:31,380 I have cards, I have stations, and I have 1225 01:11:31,380 --> 01:11:36,240 swipes that have some type associated with them, enter, exit, deposit, 1226 01:11:36,240 --> 01:11:39,750 some date, time, and some amount that was charged to me 1227 01:11:39,750 --> 01:11:41,340 while I made this swipe. 1228 01:11:41,340 --> 01:11:44,070 Either I added some funds, in which case amount is positive, 1229 01:11:44,070 --> 01:11:50,000 or I subtracted some funds, entering exiting, in this case from the station. 1230 01:11:50,000 --> 01:11:50,990 All right. 1231 01:11:50,990 --> 01:11:53,390 So I have these tables now. 1232 01:11:53,390 --> 01:11:56,750 And now I want to probably apply some of those same column 1233 01:11:56,750 --> 01:11:58,490 constraints we saw before. 1234 01:11:58,490 --> 01:12:02,840 Like here, it's fine, but I also want to make sure I'm not adding some data 1235 01:12:02,840 --> 01:12:05,280 that I don't want to add to this table. 1236 01:12:05,280 --> 01:12:09,230 So I could go back to my old friends, these column constraints. 1237 01:12:09,230 --> 01:12:14,450 And we saw before, we had default and not null, unique and check. 1238 01:12:14,450 --> 01:12:20,400 We've used not null and unique, but we haven't check or default. 1239 01:12:20,400 --> 01:12:22,970 So let's start using more than just not null and unique 1240 01:12:22,970 --> 01:12:27,720 and also focus on check and default, what they can do for us here. 1241 01:12:27,720 --> 01:12:29,670 I'll go back to my schema. 1242 01:12:29,670 --> 01:12:32,970 And let me just make sure that I'm making all the columns that I 1243 01:12:32,970 --> 01:12:35,610 want to be required actually required. 1244 01:12:35,610 --> 01:12:39,990 I'll go into my swipes table and I'll say that type-- 1245 01:12:39,990 --> 01:12:41,700 I mean, this should be required. 1246 01:12:41,700 --> 01:12:46,380 I should know whether I entered, exited, or deposited some funds. 1247 01:12:46,380 --> 01:12:48,750 So I'll say text not null. 1248 01:12:48,750 --> 01:12:52,590 Similarly, for the timestamp, the time this swipe happened, 1249 01:12:52,590 --> 01:12:54,970 I want that to be not null as well. 1250 01:12:54,970 --> 01:12:57,660 I must know what time I swiped the card. 1251 01:12:57,660 --> 01:13:00,270 And also, it makes sense for there to always be 1252 01:13:00,270 --> 01:13:02,820 some amount associated with this swipe. 1253 01:13:02,820 --> 01:13:07,450 Either I added some funds to my balance or I removed some funds overall. 1254 01:13:07,450 --> 01:13:11,260 So I'll make this not null as well. 1255 01:13:11,260 --> 01:13:14,910 Well, let's see what we could do here with default. 1256 01:13:14,910 --> 01:13:18,900 Default gives me some default value, some initial value 1257 01:13:18,900 --> 01:13:25,320 to add to this column if I don't specify what kind of value to add, 1258 01:13:25,320 --> 01:13:27,840 what be good for date time. 1259 01:13:27,840 --> 01:13:30,390 Date time here is, again, the timestamp. 1260 01:13:30,390 --> 01:13:35,080 The time I swiped this card to enter, let's say, Harvard station. 1261 01:13:35,080 --> 01:13:40,210 Well, if I want this to always have the current time, 1262 01:13:40,210 --> 01:13:43,690 I could use default. I could say the default 1263 01:13:43,690 --> 01:13:48,220 value for this column is this special value here, 1264 01:13:48,220 --> 01:13:52,160 CURRENT TIMESTAMP, in all caps. 1265 01:13:52,160 --> 01:13:57,470 CURRENT TIMESTAMP will show me the year, the month, the day, the hour, 1266 01:13:57,470 --> 01:14:02,600 the minute, the second, all in one value, and insert that into my table. 1267 01:14:02,600 --> 01:14:08,330 So as soon as I add new row, if I don't supply a time for this datetime column, 1268 01:14:08,330 --> 01:14:15,860 I'll instead get back the current time exactly as it is represented in SQLite. 1269 01:14:15,860 --> 01:14:17,760 Now what can I do further than this? 1270 01:14:17,760 --> 01:14:21,750 I could also try to add a-- my very own check. 1271 01:14:21,750 --> 01:14:26,780 Maybe I want to make sure that the amounts here are never equal to zero. 1272 01:14:26,780 --> 01:14:31,220 Like, nobody should ever be able to make a $0 transaction or $0 swipe. 1273 01:14:31,220 --> 01:14:33,860 They're always being charged some money or they're always 1274 01:14:33,860 --> 01:14:36,050 depositing some money, in this case. 1275 01:14:36,050 --> 01:14:41,540 So I could say, amount here has my very own check on this column. 1276 01:14:41,540 --> 01:14:45,980 And inside check, I can actually write my very own expression to check for. 1277 01:14:45,980 --> 01:14:51,170 I could say, for example, amount is not equal to zero. 1278 01:14:51,170 --> 01:14:54,500 Using those same operators we saw back in week zero, 1279 01:14:54,500 --> 01:15:01,810 this will ensure that any value inside amount will not be equal to zero. 1280 01:15:01,810 --> 01:15:03,760 Let's try also for type. 1281 01:15:03,760 --> 01:15:06,730 I mean, type can only have a few values. 1282 01:15:06,730 --> 01:15:11,170 We saw enter, exit, deposit some funds. 1283 01:15:11,170 --> 01:15:16,060 I could make sure that only those values are included inside my type column. 1284 01:15:16,060 --> 01:15:20,500 I could say check that type is in some list of values. 1285 01:15:20,500 --> 01:15:23,560 Going back to week one here, we talk about in. 1286 01:15:23,560 --> 01:15:31,460 I could say, maybe the type is in enter exit or is in deposit. 1287 01:15:31,460 --> 01:15:34,570 So now when I have this table called swipes, 1288 01:15:34,570 --> 01:15:38,740 I'm representing what I'm actually doing when I go to Harvard station. 1289 01:15:38,740 --> 01:15:42,633 I have here a visit for myself, my very own ID 1290 01:15:42,633 --> 01:15:44,050 here, which will update in second. 1291 01:15:44,050 --> 01:15:47,290 I have a station ID where I'm actually going to visit. 1292 01:15:47,290 --> 01:15:52,270 I have a type that I'm going to use to enter, exit, or deposit 1293 01:15:52,270 --> 01:15:54,100 some funds at this station. 1294 01:15:54,100 --> 01:15:56,710 I'm doing it at a certain time, and I have an amount 1295 01:15:56,710 --> 01:16:00,200 associated with this transaction. 1296 01:16:00,200 --> 01:16:02,500 Now, there's one thing to fix here, which 1297 01:16:02,500 --> 01:16:07,040 is that we're still talking about riders inside our swipes table. 1298 01:16:07,040 --> 01:16:08,360 So let's fix that here, too. 1299 01:16:08,360 --> 01:16:12,460 I'll go back to my computer, and let's try fixing this. 1300 01:16:12,460 --> 01:16:19,120 I have a rider ID inside of my swipes table, but no longer do I have riders, 1301 01:16:19,120 --> 01:16:20,680 I have cards. 1302 01:16:20,680 --> 01:16:25,780 So let me say that this is now a card ID, and down below in my foreign key, 1303 01:16:25,780 --> 01:16:32,650 I'll say that this card ID column references the ID column in cards. 1304 01:16:32,650 --> 01:16:35,140 And I think this should represent everything 1305 01:16:35,140 --> 01:16:39,800 I want to represent about swipes at the station. 1306 01:16:39,800 --> 01:16:44,690 So let me ask now what questions we have about this new schema 1307 01:16:44,690 --> 01:16:47,270 and the constraints we've applied. 1308 01:16:47,270 --> 01:16:50,210 AUDIENCE: I wonder, how could you delete or drop 1309 01:16:50,210 --> 01:16:56,510 the table of riders when you use ID as a foreign key? 1310 01:16:56,510 --> 01:17:00,128 I try to do that, but I got an error. 1311 01:17:00,128 --> 01:17:03,170 CARTER ZENKE: Yeah, so you're getting into some more advanced stuff here. 1312 01:17:03,170 --> 01:17:06,500 And suffice to say for now, my foreign key constraints 1313 01:17:06,500 --> 01:17:09,830 aren't actually being checked right now, but yours might be. 1314 01:17:09,830 --> 01:17:14,090 If you try to drop a table that actually has some data that's 1315 01:17:14,090 --> 01:17:17,900 referenced by a foreign key, SQLite will warn you, perhaps to tell you, 1316 01:17:17,900 --> 01:17:21,840 you can't do that because this ID is referenced from this table over there. 1317 01:17:21,840 --> 01:17:26,780 So in that case, best to delete the value that has that foreign key 1318 01:17:26,780 --> 01:17:30,470 and then proceed with dropping that table altogether. 1319 01:17:30,470 --> 01:17:31,910 Good question there. 1320 01:17:31,910 --> 01:17:34,630 Yeah, let's take one more. 1321 01:17:34,630 --> 01:17:43,570 AUDIENCE: How might this syntax send the other SQL languages, same as SQLite 1322 01:17:43,570 --> 01:17:46,580 languages, how much is syntax same? 1323 01:17:46,580 --> 01:17:48,080 CARTER ZENKE: Yeah, a good question. 1324 01:17:48,080 --> 01:17:52,000 So here we're using the SQLite database management system. 1325 01:17:52,000 --> 01:17:55,060 It is similar to but distinct in some ways 1326 01:17:55,060 --> 01:17:57,820 from others like MySQL or PostgreSQL. 1327 01:17:57,820 --> 01:18:00,220 I would say that most of what you're doing here 1328 01:18:00,220 --> 01:18:04,822 could also be ported to MySQL and PostgreSQL with a few exceptions 1329 01:18:04,822 --> 01:18:07,030 that you might need to treat on a case by case basis. 1330 01:18:07,030 --> 01:18:10,220 In fact, the developers of SQLite built things 1331 01:18:10,220 --> 01:18:14,800 so that it would be easy to port their database schemas to another schema 1332 01:18:14,800 --> 01:18:18,140 like MySQL or PostgreSQL as well. 1333 01:18:18,140 --> 01:18:21,500 But good question there. 1334 01:18:21,500 --> 01:18:23,960 All right, let's take one more question here. 1335 01:18:23,960 --> 01:18:30,260 AUDIENCE: Imagine if we put in the ID, we don't put any data type, 1336 01:18:30,260 --> 01:18:32,090 it's going to give us-- 1337 01:18:32,090 --> 01:18:37,032 in this schema, it's going to give us a text or-- 1338 01:18:37,032 --> 01:18:38,490 CARTER ZENKE: Yeah, great question. 1339 01:18:38,490 --> 01:18:42,980 So you're asking if I didn't tell SQLite what kind of type affinity 1340 01:18:42,980 --> 01:18:46,220 a column had, what type of affinity would it actually have? 1341 01:18:46,220 --> 01:18:47,360 A great question. 1342 01:18:47,360 --> 01:18:52,460 In this case, by default, SQLite gives the numeric type affinity, 1343 01:18:52,460 --> 01:18:56,250 where numeric can store integers or real values. 1344 01:18:56,250 --> 01:18:59,180 But if you give it like a string of text, like let's say red line, 1345 01:18:59,180 --> 01:19:04,980 it will store that for you, too, kind of non-intuitively, but it will. 1346 01:19:04,980 --> 01:19:08,310 But the default type affinity is numeric in this case, 1347 01:19:08,310 --> 01:19:10,340 if you don't otherwise specify. 1348 01:19:10,340 --> 01:19:11,900 Good question. 1349 01:19:11,900 --> 01:19:17,410 OK, let's come back then and let's focus on wrapping up on a final few pieces 1350 01:19:17,410 --> 01:19:17,910 here. 1351 01:19:17,910 --> 01:19:20,810 So we've seen some table constraints, which 1352 01:19:20,810 --> 01:19:23,360 we apply primary keys and foreign keys. 1353 01:19:23,360 --> 01:19:26,360 We saw column constraints, where we could make sure 1354 01:19:26,360 --> 01:19:28,970 that certain values were given to us through not null. 1355 01:19:28,970 --> 01:19:31,730 We could also make sure that the-- 1356 01:19:31,730 --> 01:19:33,925 let's say the value is in some list of values, 1357 01:19:33,925 --> 01:19:37,460 they're using check or making sure that it's not some value, 1358 01:19:37,460 --> 01:19:39,320 also using check as well. 1359 01:19:39,320 --> 01:19:44,930 Default allows us to specify a given value for every new insertion of rows 1360 01:19:44,930 --> 01:19:45,560 here. 1361 01:19:45,560 --> 01:19:49,550 And this is actually pretty important to have. 1362 01:19:49,550 --> 01:19:53,750 Not just schemas that have column names and let's 1363 01:19:53,750 --> 01:19:57,320 say type affinities as well, but also constraints, 1364 01:19:57,320 --> 01:20:00,210 make sure the data we insert works well for us. 1365 01:20:00,210 --> 01:20:02,960 And there's actually a story behind this person 1366 01:20:02,960 --> 01:20:07,820 who's on the CharlieCard, this person who's on the subway. 1367 01:20:07,820 --> 01:20:11,420 His name is Charlie, and he's perhaps the most famous subway 1368 01:20:11,420 --> 01:20:13,880 rider in all of Boston. 1369 01:20:13,880 --> 01:20:17,900 Back in the 19-- let's say 1900s, the band 1370 01:20:17,900 --> 01:20:21,290 called The Kingston Trio wrote a song about this man named Charlie. 1371 01:20:21,290 --> 01:20:26,030 Charlie, supposedly, got on at the Kendall Square station where MIT is, 1372 01:20:26,030 --> 01:20:27,950 and he made for Jamaica Plain. 1373 01:20:27,950 --> 01:20:30,320 But once he got to Jamaica Plane, the conductor 1374 01:20:30,320 --> 01:20:34,040 asked him for one more nickel, and well, he didn't have that nickel. 1375 01:20:34,040 --> 01:20:36,860 So he got stuck on the train for years, and years, and years. 1376 01:20:36,860 --> 01:20:38,750 He couldn't get off of the subway. 1377 01:20:38,750 --> 01:20:42,532 So keep in mind, Charlie, when you're writing your own database constraints, 1378 01:20:42,532 --> 01:20:44,240 making sure that if you get on the train, 1379 01:20:44,240 --> 01:20:46,610 you're able to get off of it at some point, right? 1380 01:20:46,610 --> 01:20:49,080 Don't end up like Charlie in this case. 1381 01:20:49,080 --> 01:20:53,720 So with this in mind, you're able to design your very own database 1382 01:20:53,720 --> 01:20:56,840 schemas that keep not just certain columns involved, 1383 01:20:56,840 --> 01:20:59,210 but also type affinities for those columns, types-- 1384 01:20:59,210 --> 01:21:01,790 those data types-- types the columns can store. 1385 01:21:01,790 --> 01:21:04,940 You're also able to apply constraints to those columns to make sure 1386 01:21:04,940 --> 01:21:07,130 that the data you're inserting is that you actually 1387 01:21:07,130 --> 01:21:09,380 want to have in that column. 1388 01:21:09,380 --> 01:21:13,100 Next time, we'll focus on actually adding data to our columns, 1389 01:21:13,100 --> 01:21:16,260 actually write data to a database file to insert, 1390 01:21:16,260 --> 01:21:18,720 update, and delete that data altogether. 1391 01:21:18,720 --> 01:21:22,480 So with that in mind, we'll see you next time. 1392 01:21:22,480 --> 01:21:24,000117327

Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.