Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,000 --> 00:00:04,910
[MUSIC PLAYING]
2
00:00:04,910 --> 00:00:17,155
3
00:00:17,155 --> 00:00:18,780
CARTER ZENKE: Well, hello, one and all.
4
00:00:18,780 --> 00:00:22,890
And welcome back to CS50's
Introduction to Databases with SQL.
5
00:00:22,890 --> 00:00:24,360
My name is Carter Zenke.
6
00:00:24,360 --> 00:00:26,820
And last we left off, we
learned about relating--
7
00:00:26,820 --> 00:00:31,360
that is, how to have multiple tables
in our database for people, places,
8
00:00:31,360 --> 00:00:34,500
and things, and how to have them
relate to one another in the way
9
00:00:34,500 --> 00:00:36,580
you might do in the real world.
10
00:00:36,580 --> 00:00:38,550
Now, today, we'll take a step forward.
11
00:00:38,550 --> 00:00:41,370
And we'll talk about how to
put you in the driver's seat,
12
00:00:41,370 --> 00:00:45,870
designing your very own database
schemas to organize your data.
13
00:00:45,870 --> 00:00:49,890
Now, we'll pick up where we left off,
which is with this database of books.
14
00:00:49,890 --> 00:00:52,050
So we had this database.
15
00:00:52,050 --> 00:00:56,760
It was full of books that have been
longlisted for the International Booker
16
00:00:56,760 --> 00:00:57,450
Prize.
17
00:00:57,450 --> 00:01:01,740
To be longlisted means to be
nominated for some prize, let's say.
18
00:01:01,740 --> 00:01:05,190
So we had the past five years
of books in this database.
19
00:01:05,190 --> 00:01:10,170
And we worked on improving this database
over time from week 0 to week 1.
20
00:01:10,170 --> 00:01:13,080
Now, this week, we'll actually
take a look underneath the hood
21
00:01:13,080 --> 00:01:17,980
and see what commands we had used
to create these varied databases.
22
00:01:17,980 --> 00:01:21,660
So let's reveal now what
we had done all along.
23
00:01:21,660 --> 00:01:24,010
I'll go back to my terminal here.
24
00:01:24,010 --> 00:01:26,040
And if you remember,
I could use a command
25
00:01:26,040 --> 00:01:29,730
to open up a database file,
which was this command here,
26
00:01:29,730 --> 00:01:34,450
SQL sqlite3, and then the name
of the file I want to open.
27
00:01:34,450 --> 00:01:35,910
So let's try this.
28
00:01:35,910 --> 00:01:39,420
I'll go back to week 0, to
my week 0 folder like this.
29
00:01:39,420 --> 00:01:43,035
And I'll open up my long list
database, week0/longlist.db.
30
00:01:43,035 --> 00:01:46,170
31
00:01:46,170 --> 00:01:47,430
I'll hit Enter.
32
00:01:47,430 --> 00:01:49,720
Now, I'm in my sqlite prompt.
33
00:01:49,720 --> 00:01:52,230
So I could work on
typing some SQL commands,
34
00:01:52,230 --> 00:01:58,030
SQL statements or queries inside
of this terminal prompt here.
35
00:01:58,030 --> 00:02:01,660
So if I want to get a feel for
what's inside this database,
36
00:02:01,660 --> 00:02:04,230
we saw I could use a
command called select.
37
00:02:04,230 --> 00:02:07,860
So I'll select now some
rows from this table.
38
00:02:07,860 --> 00:02:14,370
I'll say SELECT, let's say, the
title and also the author columns
39
00:02:14,370 --> 00:02:19,740
from my longlist table,
semicolon, and Enter.
40
00:02:19,740 --> 00:02:23,760
Now, I'll see all the titles
and all the authors that
41
00:02:23,760 --> 00:02:27,030
were inside of this long list table.
42
00:02:27,030 --> 00:02:29,250
But I only want to peek here.
43
00:02:29,250 --> 00:02:32,760
I only want to see roughly
what kind of data is inside.
44
00:02:32,760 --> 00:02:36,130
So I could probably
improve this command here.
45
00:02:36,130 --> 00:02:43,530
I could instead say SELECT the author
and title columns from longlist--
46
00:02:43,530 --> 00:02:49,470
from longlist, but limit now to
the first five rows we saw before.
47
00:02:49,470 --> 00:02:52,440
Semicolon, and I'll hit
Enter on this query.
48
00:02:52,440 --> 00:02:56,590
And now, I'll see only
the top five rows.
49
00:02:56,590 --> 00:03:02,860
So I'm able now to see what kind
of data is inside my database.
50
00:03:02,860 --> 00:03:08,790
But what I can't yet see is what
command was used to create this table
51
00:03:08,790 --> 00:03:11,950
and what kind of data could
be stored inside of it.
52
00:03:11,950 --> 00:03:16,140
So let's reveal now what was going
on underneath the hood all this time.
53
00:03:16,140 --> 00:03:20,040
I'll say, this new command, a
sqlite command-- not a SQL keyword,
54
00:03:20,040 --> 00:03:24,580
but sqlite command
called .schema, .schema.
55
00:03:24,580 --> 00:03:28,350
Now, if I hit Enter,
I'll see the following--
56
00:03:28,350 --> 00:03:30,840
the command, the
statement, the query that
57
00:03:30,840 --> 00:03:34,260
was used to create this longlist table.
58
00:03:34,260 --> 00:03:37,980
And notice how I have many
columns inside this table.
59
00:03:37,980 --> 00:03:43,480
I have an ISBN column, a title
column, an author column, and so on.
60
00:03:43,480 --> 00:03:47,340
And each column seems to
have some kind of data
61
00:03:47,340 --> 00:03:52,080
that could be stored inside of it
like text, or integers, or real,
62
00:03:52,080 --> 00:03:56,270
or floating point values, or
decimals, if you're familiar.
63
00:03:56,270 --> 00:04:01,510
So this is how we created the
very first version of longlist.db.
64
00:04:01,510 --> 00:04:04,165
But let's also see how
we created the second.
65
00:04:04,165 --> 00:04:09,190
So I'll type .quit to leave
this version of longlist.db.
66
00:04:09,190 --> 00:04:12,650
And now, let me open up the
next version we had created,
67
00:04:12,650 --> 00:04:13,910
which is more relational.
68
00:04:13,910 --> 00:04:17,350
It had tables inside of it that
could relate to one another.
69
00:04:17,350 --> 00:04:21,100
So I'm going to type
sqlite3, then long--
70
00:04:21,100 --> 00:04:26,320
sqlite3, and go to week 1,
and then type longlist.db.
71
00:04:26,320 --> 00:04:27,700
I'll hit Enter.
72
00:04:27,700 --> 00:04:31,720
Now, I'm on my next
version of longlist.db.
73
00:04:31,720 --> 00:04:32,960
Well, what could I do?
74
00:04:32,960 --> 00:04:36,050
I could type SELECT and look
at some of the tables in here
75
00:04:36,050 --> 00:04:38,200
to see what kind of data is inside.
76
00:04:38,200 --> 00:04:44,350
I could perhaps say SELECT maybe
the names from the authors table
77
00:04:44,350 --> 00:04:48,220
here, from the authors
table, and hit semicolon.
78
00:04:48,220 --> 00:04:54,280
Now, I'll see all the names of authors
that are inside of the authors table.
79
00:04:54,280 --> 00:04:55,930
I could do the same thing for books.
80
00:04:55,930 --> 00:04:58,150
I could maybe look at
the titles of books.
81
00:04:58,150 --> 00:05:05,350
I could say SELECT title from the
books table, semicolon, and Enter.
82
00:05:05,350 --> 00:05:10,490
Now, I see all of the titles
that are inside my books table.
83
00:05:10,490 --> 00:05:14,230
But what I haven't seen yet is
the schema of this database.
84
00:05:14,230 --> 00:05:17,080
It is the way it is
organized and the commands
85
00:05:17,080 --> 00:05:19,700
that were used to create these tables.
86
00:05:19,700 --> 00:05:21,730
So let me work on that now.
87
00:05:21,730 --> 00:05:24,460
I'll clear my terminal using Control-L.
88
00:05:24,460 --> 00:05:28,090
And now, let me type .schema again.
89
00:05:28,090 --> 00:05:34,420
I'll type .schema to see what commands
were used to create this database.
90
00:05:34,420 --> 00:05:37,630
Hit Enter, and I can see--
91
00:05:37,630 --> 00:05:39,670
I mean, there are quite
a lot of commands here.
92
00:05:39,670 --> 00:05:42,610
Now, if this feels overwhelming, I
mean, I'd be right there with you.
93
00:05:42,610 --> 00:05:45,610
This is a lot of commands
to parse through and read.
94
00:05:45,610 --> 00:05:49,850
So there's probably a
better way to do this.
95
00:05:49,850 --> 00:05:55,660
And one way to try is to type .schema,
and then give it some table name.
96
00:05:55,660 --> 00:05:58,930
Let's say, want to understand the
schema for just the books table,
97
00:05:58,930 --> 00:06:00,770
like just that for now.
98
00:06:00,770 --> 00:06:04,060
So I'll say .schema, and
then the table name--
99
00:06:04,060 --> 00:06:05,720
books, in this case.
100
00:06:05,720 --> 00:06:07,840
Then I can hit Enter.
101
00:06:07,840 --> 00:06:11,500
Now, I'll see the
schema, the organization,
102
00:06:11,500 --> 00:06:15,860
the command that we used to create
the books table, in this case.
103
00:06:15,860 --> 00:06:18,250
And notice again, we
have several columns--
104
00:06:18,250 --> 00:06:22,420
ID, ISBN, title,
publisher ID, and so on.
105
00:06:22,420 --> 00:06:26,380
Each one has their own kind of
data they could support or take
106
00:06:26,380 --> 00:06:30,590
into this column like
integers, text, and so on.
107
00:06:30,590 --> 00:06:33,700
So again, what we'll do
today is have you all
108
00:06:33,700 --> 00:06:36,640
learn how to write your
very own create table
109
00:06:36,640 --> 00:06:41,050
commands to build your very own
databases that represent what you
110
00:06:41,050 --> 00:06:44,360
want to represent in the real world.
111
00:06:44,360 --> 00:06:47,860
So let me exit this prompt here.
112
00:06:47,860 --> 00:06:51,370
And let me propose that we'll
have a bit of a design challenge
113
00:06:51,370 --> 00:06:55,600
today to actually try to represent a
real world entity with some database
114
00:06:55,600 --> 00:06:56,240
here.
115
00:06:56,240 --> 00:06:59,050
And if you're not
already familiar, Boston
116
00:06:59,050 --> 00:07:02,110
is perhaps famous for being
among the first cities
117
00:07:02,110 --> 00:07:05,560
to have a subway system
in the United States.
118
00:07:05,560 --> 00:07:09,730
So here is a picture from the
late 1800s of a subway being
119
00:07:09,730 --> 00:07:11,950
built in Boston's city streets.
120
00:07:11,950 --> 00:07:14,170
Underneath the streets
here, there would be
121
00:07:14,170 --> 00:07:18,100
trolley cars that would go and
transport people across Boston.
122
00:07:18,100 --> 00:07:20,410
Here's another picture of
a trolley actually working
123
00:07:20,410 --> 00:07:21,790
underneath the streets.
124
00:07:21,790 --> 00:07:23,920
So people would go down underneath.
125
00:07:23,920 --> 00:07:25,360
They would hop on a trolley.
126
00:07:25,360 --> 00:07:29,950
They'd be able to different parts of
Boston, perhaps from Harvard to MIT
127
00:07:29,950 --> 00:07:33,550
or downtown up to, let's say,
Braintree or down to Braintree,
128
00:07:33,550 --> 00:07:36,070
which is more south of
Boston, for example.
129
00:07:36,070 --> 00:07:39,280
One of the famous stops
is the Park Street stop,
130
00:07:39,280 --> 00:07:41,530
which is right down in
the middle of Boston,
131
00:07:41,530 --> 00:07:45,010
one of the central hubs
of this subway system.
132
00:07:45,010 --> 00:07:50,480
And now, these photos are all from,
let's say, the early 1900s, late 1800s,
133
00:07:50,480 --> 00:07:51,230
and so on.
134
00:07:51,230 --> 00:07:54,070
But the subway has gotten a
lot more modern since then.
135
00:07:54,070 --> 00:07:59,000
And actually, now, we have several lines
that span the entire city and beyond.
136
00:07:59,000 --> 00:08:02,740
So here, we have the Red Line, of
which Harvard and MIT are a part.
137
00:08:02,740 --> 00:08:05,800
We have the green line, which
brings you kind of west to east--
138
00:08:05,800 --> 00:08:09,680
the blue line, the
orange line, and so on.
139
00:08:09,680 --> 00:08:14,790
So many more lines and stations
have been added to this system.
140
00:08:14,790 --> 00:08:18,460
It's a big design challenge to
represent all of these stations,
141
00:08:18,460 --> 00:08:24,560
all of these lines, and all of these
people who might ride this subway, too.
142
00:08:24,560 --> 00:08:31,220
So the question then becomes, how
can we create a schema for this data?
143
00:08:31,220 --> 00:08:36,350
And again, by schema, we mean what
kinds of tables should we have,
144
00:08:36,350 --> 00:08:40,610
what kinds of columns might those
tables have, and what kind of data
145
00:08:40,610 --> 00:08:44,159
should we put in each of
those columns, for instance.
146
00:08:44,159 --> 00:08:46,320
So let me propose this.
147
00:08:46,320 --> 00:08:49,790
Let's say we start just first
with names and stations.
148
00:08:49,790 --> 00:08:53,750
So Charlie here, our very
first rider on this system,
149
00:08:53,750 --> 00:08:56,810
is going to be at the
Kendall at MIT station.
150
00:08:56,810 --> 00:08:58,910
So this is what this
table represents now.
151
00:08:58,910 --> 00:09:01,370
But what more could we have?
152
00:09:01,370 --> 00:09:05,750
Well, we might also want to have maybe
what Charlie is doing at that station.
153
00:09:05,750 --> 00:09:09,050
Maybe he's entering the
station, for instance.
154
00:09:09,050 --> 00:09:11,390
And if you're familiar
with the subway system,
155
00:09:11,390 --> 00:09:17,550
you often have to pay to get onto a
train or get onto the station itself.
156
00:09:17,550 --> 00:09:23,570
So let's say Charlie pays some fare
to enter into the Kendall/MIT station.
157
00:09:23,570 --> 00:09:26,900
Well, back in the mid 1900s,
the fare was only about a dime.
158
00:09:26,900 --> 00:09:27,800
It was $0.10.
159
00:09:27,800 --> 00:09:33,820
So we'll say Charlie paid $0.10
to enter the Kendall/MIT station.
160
00:09:33,820 --> 00:09:36,140
And now, this seems pretty good.
161
00:09:36,140 --> 00:09:41,300
But if I am the Transit Authority,
the person who runs the subway system,
162
00:09:41,300 --> 00:09:45,890
I probably want to know, does Charlie
have enough money to get on the train?
163
00:09:45,890 --> 00:09:48,730
And if so I want to make sure
that, OK, well, Charlie actually
164
00:09:48,730 --> 00:09:50,590
could get on this train.
165
00:09:50,590 --> 00:09:53,420
So let's say not only
does Charlie pay a fare.
166
00:09:53,420 --> 00:09:56,570
He has some remaining
balance afterwards.
167
00:09:56,570 --> 00:10:00,160
So Charlie here has gotten
onto the Kendall/MIT stop.
168
00:10:00,160 --> 00:10:05,330
He's paid the fare of
$0.10 and has $0.05 left.
169
00:10:05,330 --> 00:10:08,750
OK, so here's a bit of a table.
170
00:10:08,750 --> 00:10:11,030
We probably add more information to it.
171
00:10:11,030 --> 00:10:15,230
Let's say Charlie then leaves
at the Jamaica Plain stop.
172
00:10:15,230 --> 00:10:18,650
And the fare to leave is
about a nickel, $0.05.
173
00:10:18,650 --> 00:10:21,410
And now, Charlie has no cents left over.
174
00:10:21,410 --> 00:10:27,680
So again, Charlie paid $0.10 to get on,
had $0.05 left, paid $0.05 to get off,
175
00:10:27,680 --> 00:10:31,930
and now has no remaining
balance here anymore.
176
00:10:31,930 --> 00:10:32,440
OK.
177
00:10:32,440 --> 00:10:34,040
So that's Charlie's story.
178
00:10:34,040 --> 00:10:35,860
Let's look at Alice, though.
179
00:10:35,860 --> 00:10:38,530
Let's say Alice gets
on at the Harvard stop.
180
00:10:38,530 --> 00:10:42,310
They too pay $0.10 to get
on at the Harvard stop.
181
00:10:42,310 --> 00:10:45,370
And they have a remaining
balance of $0.20.
182
00:10:45,370 --> 00:10:49,510
Alice will go, let's say, to Park
Street, get off at Park Street,
183
00:10:49,510 --> 00:10:51,160
pay the nickel to leave.
184
00:10:51,160 --> 00:10:55,430
And now, they'll have a
balance of $0.15 at the end.
185
00:10:55,430 --> 00:10:56,620
Let's go to Bob.
186
00:10:56,620 --> 00:10:59,260
Bob enters the Alewife station.
187
00:10:59,260 --> 00:11:00,940
They pay $0.10.
188
00:11:00,940 --> 00:11:02,800
They have remaining balance of $0.30.
189
00:11:02,800 --> 00:11:08,337
And let's say they leave at Park Street
and have a fare of $0.10 to leave
190
00:11:08,337 --> 00:11:09,670
because it's a further distance.
191
00:11:09,670 --> 00:11:14,490
Now, they'll have a running
balance of $0.20 overall.
192
00:11:14,490 --> 00:11:18,840
So this table is OK, I might admit.
193
00:11:18,840 --> 00:11:23,100
I mean, last time, we learned about
having what we called primary keys
194
00:11:23,100 --> 00:11:24,340
and foreign keys.
195
00:11:24,340 --> 00:11:26,220
So it seems like that's missing here.
196
00:11:26,220 --> 00:11:28,020
Let's go ahead and add that here.
197
00:11:28,020 --> 00:11:32,940
I'll give each row a unique ID, so
I can know who entered, who exited,
198
00:11:32,940 --> 00:11:35,580
and so on, give that a unique ID here.
199
00:11:35,580 --> 00:11:41,950
But I might even say that this table
could really be improved substantially.
200
00:11:41,950 --> 00:11:49,410
And I want to ask you what redundancies
or inefficiencies do you see here?
201
00:11:49,410 --> 00:11:52,680
If we're trying to represent
riders and stations,
202
00:11:52,680 --> 00:11:56,990
what can we improve about this design?
203
00:11:56,990 --> 00:12:01,740
AUDIENCE: So probably the redundancy
will be the names and the stations,
204
00:12:01,740 --> 00:12:02,240
too.
205
00:12:02,240 --> 00:12:06,760
For example, if Charlie
will go to the train daily,
206
00:12:06,760 --> 00:12:10,875
then he will become most
of the names in the data.
207
00:12:10,875 --> 00:12:12,250
CARTER ZENKE: Yeah, a good point.
208
00:12:12,250 --> 00:12:13,450
If I'm hearing what you're
saying, [? Loren, ?]
209
00:12:13,450 --> 00:12:16,420
let me show you some examples
that I highlighted here.
210
00:12:16,420 --> 00:12:20,290
One example could be, to your
point, about these names.
211
00:12:20,290 --> 00:12:24,370
These names seem to be telling
us the name of a person.
212
00:12:24,370 --> 00:12:27,640
But here, we have only three names--
213
00:12:27,640 --> 00:12:29,680
Charlie, Alice, and Bob.
214
00:12:29,680 --> 00:12:32,170
Well, my name is Carter.
215
00:12:32,170 --> 00:12:35,110
And what if somebody
else named Carter also
216
00:12:35,110 --> 00:12:37,810
tried to get on and leave at some stop?
217
00:12:37,810 --> 00:12:39,820
Well, I wouldn't be able
to know which Carter
218
00:12:39,820 --> 00:12:44,210
was which or which Charlie was which,
which Alice was which, and so on.
219
00:12:44,210 --> 00:12:47,920
So we probably need a way to represent
people and their names a little better
220
00:12:47,920 --> 00:12:50,220
here, too.
221
00:12:50,220 --> 00:12:55,870
What other ideas do we have for how
to improve the design of this table?
222
00:12:55,870 --> 00:13:00,330
AUDIENCE: Yes, I think we can have
a singular ID for a singular person.
223
00:13:00,330 --> 00:13:02,998
That way, we'll be better able
to track their activities.
224
00:13:02,998 --> 00:13:03,790
CARTER ZENKE: Nice.
225
00:13:03,790 --> 00:13:06,555
So we probably have
an ID for each person,
226
00:13:06,555 --> 00:13:10,830
a bit what we learned about last week,
putting people in their own table
227
00:13:10,830 --> 00:13:14,890
and giving them their own
unique ID, a primary key.
228
00:13:14,890 --> 00:13:16,380
Let's show that here.
229
00:13:16,380 --> 00:13:18,150
I'll go to some slides.
230
00:13:18,150 --> 00:13:21,345
And I'll pick out one
that shows us just riders.
231
00:13:21,345 --> 00:13:23,220
So to your point,
[? Soqanya, ?] we could try
232
00:13:23,220 --> 00:13:25,830
to have maybe a table for just riders.
233
00:13:25,830 --> 00:13:29,130
And maybe to simplify, this
table has only two columns.
234
00:13:29,130 --> 00:13:32,850
It has a column for ID
and a column for name.
235
00:13:32,850 --> 00:13:38,250
So here, we have Charlie, Alice,
and Bob all in their own table.
236
00:13:38,250 --> 00:13:42,270
Well, let me propose to you, we
could do the same thing for stations.
237
00:13:42,270 --> 00:13:44,760
Let's say we have a
table of stations now.
238
00:13:44,760 --> 00:13:48,030
And we give each one
their very own ID as well,
239
00:13:48,030 --> 00:13:50,400
our own primary key for this table.
240
00:13:50,400 --> 00:13:52,540
We have Harvard,
Kendall, and Park Street.
241
00:13:52,540 --> 00:13:57,130
We can differentiate between
them using their IDs here.
242
00:13:57,130 --> 00:14:00,000
So a few improvements could be made.
243
00:14:00,000 --> 00:14:04,560
And as we're making these improvements,
splitting one table into many,
244
00:14:04,560 --> 00:14:08,010
and debating what kind of
data to store in each table,
245
00:14:08,010 --> 00:14:12,180
the process we're going through
is one called normalizing.
246
00:14:12,180 --> 00:14:14,640
We're normalizing our data.
247
00:14:14,640 --> 00:14:18,210
To normalize means to reduce
redundancies, effectively,
248
00:14:18,210 --> 00:14:19,530
to take a table--
249
00:14:19,530 --> 00:14:22,230
take one table, for instance,
split up into multiple,
250
00:14:22,230 --> 00:14:26,550
and have each entity be
part of its very own table.
251
00:14:26,550 --> 00:14:31,590
Some academics in the world have named
different normal forms, quote unquote.
252
00:14:31,590 --> 00:14:36,120
There's like first normal form,
second normal form, third normal form,
253
00:14:36,120 --> 00:14:39,420
this progression of making your
day more and more efficient.
254
00:14:39,420 --> 00:14:41,160
You can use those as heuristics.
255
00:14:41,160 --> 00:14:43,770
But end of the day, a few
principles might apply.
256
00:14:43,770 --> 00:14:48,750
First, take your entities, like
in this case stations and riders,
257
00:14:48,750 --> 00:14:51,840
and put them each in their own table.
258
00:14:51,840 --> 00:14:56,760
And if you add more data, make sure that
if I were to add a column, let's say,
259
00:14:56,760 --> 00:15:00,780
to riders, it is only a
data point about riders--
260
00:15:00,780 --> 00:15:05,190
not about stations, not about
fares, only about riders.
261
00:15:05,190 --> 00:15:08,220
And that process can help
us make a data set that
262
00:15:08,220 --> 00:15:14,050
is more dynamic, more easy to reproduce,
and more easy to write queries on.
263
00:15:14,050 --> 00:15:18,090
So that is the process
here of normalizing.
264
00:15:18,090 --> 00:15:18,960
OK.
265
00:15:18,960 --> 00:15:25,830
So if we have now riders and stations,
we want to represent them in our table.
266
00:15:25,830 --> 00:15:28,050
Well, we could use what
we learned about relating
267
00:15:28,050 --> 00:15:30,960
last week to ask, how
could we actually represent
268
00:15:30,960 --> 00:15:33,790
these riders and these stations?
269
00:15:33,790 --> 00:15:36,640
So let's say here I can
have riders and stations.
270
00:15:36,640 --> 00:15:40,330
I want to make sure that I have the
right relationship between them.
271
00:15:40,330 --> 00:15:42,360
Well, if you're familiar
with subways, we
272
00:15:42,360 --> 00:15:45,870
might say that a rider
goes to one station.
273
00:15:45,870 --> 00:15:50,070
And this big T here is the symbol
for a station here in Boston,
274
00:15:50,070 --> 00:15:52,890
for the T's that we
call it, for the subway.
275
00:15:52,890 --> 00:15:56,010
So a rider might go to one station.
276
00:15:56,010 --> 00:15:58,530
But of course, that might
not be the full picture.
277
00:15:58,530 --> 00:16:01,660
A rider also gets off
at some station on.
278
00:16:01,660 --> 00:16:06,990
So a rider could be associated with
not just one station, but multiple.
279
00:16:06,990 --> 00:16:10,380
And if you're familiar, at least with
any subway system or the Boston one,
280
00:16:10,380 --> 00:16:12,660
too, it can often get pretty busy.
281
00:16:12,660 --> 00:16:16,890
And so riders might not just go
to, of course, one station or two.
282
00:16:16,890 --> 00:16:22,750
Stations could also have multiple riders
that are on a particular station here.
283
00:16:22,750 --> 00:16:28,290
So to recap, one rider might be
associated with more than one station.
284
00:16:28,290 --> 00:16:33,030
They might get on at this first
one and get off at this later one.
285
00:16:33,030 --> 00:16:37,350
But each station could presumably
have more than one rider.
286
00:16:37,350 --> 00:16:41,130
Each station here could have
rider A or rider B, the rider
287
00:16:41,130 --> 00:16:47,050
up here or the rider down below, and
even many more than that as well.
288
00:16:47,050 --> 00:16:50,550
So to put it in the
language of our ER diagrams,
289
00:16:50,550 --> 00:16:53,160
our entity relation
diagrams from last week,
290
00:16:53,160 --> 00:16:57,840
we could look at it bit like this,
where we have riders and stations.
291
00:16:57,840 --> 00:17:02,550
Riders visit stations, and
they're associated like this.
292
00:17:02,550 --> 00:17:07,540
A rider must have at least one
station associated with them.
293
00:17:07,540 --> 00:17:09,270
That's what this horizontal line means.
294
00:17:09,270 --> 00:17:13,140
If they aren't at a station, they
aren't really a rider, right?
295
00:17:13,140 --> 00:17:16,960
A rider though could have many
stations associated with them.
296
00:17:16,960 --> 00:17:19,470
That's what this three
prongs down here means.
297
00:17:19,470 --> 00:17:22,440
They could have one,
two, three, four, they
298
00:17:22,440 --> 00:17:26,290
could have many stations
they get on and get off of.
299
00:17:26,290 --> 00:17:31,030
Now a station could have
anywhere between zero riders,
300
00:17:31,030 --> 00:17:35,290
if it's maybe out of commission or
isn't very popular, upwards to many.
301
00:17:35,290 --> 00:17:39,730
It could have two, three, four, five,
even hundreds of riders associated
302
00:17:39,730 --> 00:17:42,070
with this particular station.
303
00:17:42,070 --> 00:17:47,350
So here is our entity relation
diagram for these particular riders
304
00:17:47,350 --> 00:17:50,030
and these stations here.
305
00:17:50,030 --> 00:17:55,660
So let me ask, what questions do we have
on these relationships between riders
306
00:17:55,660 --> 00:17:59,700
and stations and how to
design this table so far?
307
00:17:59,700 --> 00:18:01,450
AUDIENCE: Then I want
to ask that you have
308
00:18:01,450 --> 00:18:06,370
used the same ID for stations
and riders so that maybe
309
00:18:06,370 --> 00:18:08,895
give us a problem in coding?
310
00:18:08,895 --> 00:18:10,520
CARTER ZENKE: Yeah, a good observation.
311
00:18:10,520 --> 00:18:14,950
So you might have noticed that in the
riders table and in the stations table,
312
00:18:14,950 --> 00:18:17,140
I gave the same kind of ID.
313
00:18:17,140 --> 00:18:19,602
Like I had one, two,
three for each of them.
314
00:18:19,602 --> 00:18:21,310
And let me just show
you that again here.
315
00:18:21,310 --> 00:18:24,950
I'll come back to some slides,
and I'll show you again,
316
00:18:24,950 --> 00:18:30,730
the riders table where we had
Charlie, Alice, and Bob, ID 1, 2, 3.
317
00:18:30,730 --> 00:18:31,870
Same for the stations.
318
00:18:31,870 --> 00:18:36,730
We had stations Harvard,
Kendall, Park Street, ID 1, 2, 3.
319
00:18:36,730 --> 00:18:39,950
And to your question,
isn't that a problem?
320
00:18:39,950 --> 00:18:43,540
Well, I would argue in
this case, it's not,
321
00:18:43,540 --> 00:18:48,460
so long as we keep clear that
these IDs are for stations
322
00:18:48,460 --> 00:18:50,800
and these IDs are for riders.
323
00:18:50,800 --> 00:18:54,550
And we'll see how to do that
using our SQL keywords later on.
324
00:18:54,550 --> 00:18:59,500
But again, so long as we have an
ID just for our riders and an ID
325
00:18:59,500 --> 00:19:02,770
just for our stations, we
can keep these separate
326
00:19:02,770 --> 00:19:05,560
even if they might have the same values.
327
00:19:05,560 --> 00:19:07,990
But a great question here.
328
00:19:07,990 --> 00:19:10,440
Let's take just one more.
329
00:19:10,440 --> 00:19:12,870
AUDIENCE: Regarding the
entity relationship diagram,
330
00:19:12,870 --> 00:19:17,340
how is it possible for a station to
have a possibility of zero riders,
331
00:19:17,340 --> 00:19:21,303
but riders must compulsorily
have at least one station.
332
00:19:21,303 --> 00:19:22,720
CARTER ZENKE: Yeah, good question.
333
00:19:22,720 --> 00:19:25,470
So this might be up to you
and how you formulate it,
334
00:19:25,470 --> 00:19:28,380
but for me, let me show
that diagram again.
335
00:19:28,380 --> 00:19:32,280
I'll go back to over here.
336
00:19:32,280 --> 00:19:36,540
In my mind, to be a rider,
you have to visit a station.
337
00:19:36,540 --> 00:19:39,780
If you aren't visiting a station,
you aren't really a rider, right?
338
00:19:39,780 --> 00:19:43,740
Now presumably, there are stations
that were built but aren't really being
339
00:19:43,740 --> 00:19:46,560
used right now or aren't
really in service yet.
340
00:19:46,560 --> 00:19:49,230
That could be a station
that has no visitors.
341
00:19:49,230 --> 00:19:50,910
So you could argue--
342
00:19:50,910 --> 00:19:54,090
let's make sure every station
has at least one rider
343
00:19:54,090 --> 00:19:56,910
and every rider may or may
not have to visit a station.
344
00:19:56,910 --> 00:19:59,640
For that I would say, we
could probably reasonably
345
00:19:59,640 --> 00:20:03,300
disagree there and talk about how we
could represent the diagram here, too.
346
00:20:03,300 --> 00:20:09,940
But a great observation and a good
other critique of this system here.
347
00:20:09,940 --> 00:20:11,150
All right.
348
00:20:11,150 --> 00:20:16,000
So let's now turn to representing
this in our database.
349
00:20:16,000 --> 00:20:19,180
I'll go back to my
computer and we'll learn
350
00:20:19,180 --> 00:20:23,440
about this new SQL
keyword, SQL statement,
351
00:20:23,440 --> 00:20:26,200
this one called Create Table.
352
00:20:26,200 --> 00:20:31,400
Create table allows us to, as the name
suggests, create a brand new table.
353
00:20:31,400 --> 00:20:34,450
So let's do just that
in our new database
354
00:20:34,450 --> 00:20:37,480
to represent riders and stations.
355
00:20:37,480 --> 00:20:42,820
I'll go into my terminal, and I
want to make a brand new database.
356
00:20:42,820 --> 00:20:49,120
I'll call this one mbta.db, because
mbta stands for the Massachusetts Bay
357
00:20:49,120 --> 00:20:52,630
Transportation Authority, the people
who run the subway, essentially.
358
00:20:52,630 --> 00:20:57,610
So I'll do sqlite3,
mbta.db, hit Enter, and I'll
359
00:20:57,610 --> 00:21:04,220
type Y to say yes, I want to
create this brand new database.
360
00:21:04,220 --> 00:21:09,010
Now if I type dot schema, I see nothing.
361
00:21:09,010 --> 00:21:10,150
But that's expected.
362
00:21:10,150 --> 00:21:11,530
I don't have any tables yet.
363
00:21:11,530 --> 00:21:13,870
I have nothing inside this database.
364
00:21:13,870 --> 00:21:17,350
It's up to me to create
these tables myself.
365
00:21:17,350 --> 00:21:22,240
So what I'll do is clear my terminal,
and let's start first with riders.
366
00:21:22,240 --> 00:21:23,890
I might create a table for riders.
367
00:21:23,890 --> 00:21:26,440
I'll say, create table.
368
00:21:26,440 --> 00:21:29,500
And now I have to give
that table some name.
369
00:21:29,500 --> 00:21:31,960
I might call it riders here.
370
00:21:31,960 --> 00:21:36,310
And then in parentheses,
like this, I can specify what
371
00:21:36,310 --> 00:21:39,500
columns should be part of this table.
372
00:21:39,500 --> 00:21:40,540
So let's start first.
373
00:21:40,540 --> 00:21:44,320
I'll hit Enter here and
continue this query.
374
00:21:44,320 --> 00:21:47,770
Now, all by convention, I'll
just indent four spaces--
375
00:21:47,770 --> 00:21:52,760
one, two, three, four, and I'll
give an ID to each of these riders
376
00:21:52,760 --> 00:21:54,430
as we saw before.
377
00:21:54,430 --> 00:21:58,180
I'll say ID here is one of my columns.
378
00:21:58,180 --> 00:22:04,300
Now to create a new column, I'll follow
this up with a comma and hit Enter.
379
00:22:04,300 --> 00:22:08,350
I'll again, by convention, for
style, just indent four spaces.
380
00:22:08,350 --> 00:22:09,730
And what's my next column?
381
00:22:09,730 --> 00:22:12,160
Perhaps a name for these riders.
382
00:22:12,160 --> 00:22:17,530
I'll give this column the name,
name, and I'll leave it at that.
383
00:22:17,530 --> 00:22:22,660
Once I'm done adding columns, I
no longer need to have a comma.
384
00:22:22,660 --> 00:22:26,320
I could simply close out
this query, this statement.
385
00:22:26,320 --> 00:22:29,740
I could hit Enter here,
say close in parentheses
386
00:22:29,740 --> 00:22:33,400
to close the top parentheses
here, semicolon, hit Enter.
387
00:22:33,400 --> 00:22:36,730
And now nothing seems to happen.
388
00:22:36,730 --> 00:22:38,330
But that's actually a good sign.
389
00:22:38,330 --> 00:22:41,290
So let me type dot schema,
hit Enter, and I'll
390
00:22:41,290 --> 00:22:44,650
see the result of that statement before.
391
00:22:44,650 --> 00:22:48,820
Create table, if it doesn't
already exist, riders, and riders
392
00:22:48,820 --> 00:22:53,590
has inside of it two
columns, ID and name.
393
00:22:53,590 --> 00:22:54,740
OK, let's keep going.
394
00:22:54,740 --> 00:22:56,800
Let's make one for stations, too.
395
00:22:56,800 --> 00:22:59,380
I'll clear my terminal
and I'll say, create me
396
00:22:59,380 --> 00:23:02,740
a table called stations and include--
397
00:23:02,740 --> 00:23:04,330
actually, not station.
398
00:23:04,330 --> 00:23:09,140
If you ever want to, let's say
fix this kind of table here,
399
00:23:09,140 --> 00:23:14,500
let me try closing the
parentheses, hit semicolon, Enter.
400
00:23:14,500 --> 00:23:15,850
I'll get a syntax error.
401
00:23:15,850 --> 00:23:17,050
I can restart.
402
00:23:17,050 --> 00:23:22,180
I'll do Control L. Now I'll do
create table, stations, plural.
403
00:23:22,180 --> 00:23:24,250
Open parentheses, Enter.
404
00:23:24,250 --> 00:23:25,870
I'll indent by four spaces.
405
00:23:25,870 --> 00:23:27,430
One, two, three, four.
406
00:23:27,430 --> 00:23:31,870
And now, I'll similarly include
an ID for each of these stations.
407
00:23:31,870 --> 00:23:36,220
I'll say ID comma, and then
what all should I have?
408
00:23:36,220 --> 00:23:40,840
Well, stations tend to have a
name, like the Kendall MIT station,
409
00:23:40,840 --> 00:23:43,480
the Harvard station,
the Park Street Station.
410
00:23:43,480 --> 00:23:48,760
So I'll say, I'll give each of
these their very own name, comma.
411
00:23:48,760 --> 00:23:50,770
What else do stations have?
412
00:23:50,770 --> 00:23:54,220
Well, they might also have
a line that they're on.
413
00:23:54,220 --> 00:23:58,700
Let's say it's the red line, or the
blue line, or the green line and so on.
414
00:23:58,700 --> 00:24:03,720
I'll have a space for them to write
in their line that they're a part of.
415
00:24:03,720 --> 00:24:06,160
OK, and maybe I'll leave it
at that to keep it simple.
416
00:24:06,160 --> 00:24:09,490
I'll say, stations have
an ID, a name and a line.
417
00:24:09,490 --> 00:24:10,750
Now I'll close this out.
418
00:24:10,750 --> 00:24:15,430
I'll say end parentheses, semicolon,
hit Enter, and nothing seems to happen.
419
00:24:15,430 --> 00:24:20,770
But if I type dot schema, I'll
now see my riders and my stations
420
00:24:20,770 --> 00:24:24,870
tables inside of my database.
421
00:24:24,870 --> 00:24:26,760
Now one more step here.
422
00:24:26,760 --> 00:24:31,920
We have riders, we have stations,
but we have to relate to them.
423
00:24:31,920 --> 00:24:34,800
We have to relate them using
this many to many relationships,
424
00:24:34,800 --> 00:24:37,420
as we saw last week.
425
00:24:37,420 --> 00:24:40,950
So let me try making
a table to implement
426
00:24:40,950 --> 00:24:43,110
this many to many relationship.
427
00:24:43,110 --> 00:24:47,400
And if you remember, we might call
this kind of table a junction table,
428
00:24:47,400 --> 00:24:50,040
an associative entity, a join table.
429
00:24:50,040 --> 00:24:53,730
It has a lot of names, but
it looks a bit like this.
430
00:24:53,730 --> 00:24:58,710
I'll create this new table to
represent, let's say, visits.
431
00:24:58,710 --> 00:25:01,110
A rider visits a station.
432
00:25:01,110 --> 00:25:03,780
So I'll call this table visits.
433
00:25:03,780 --> 00:25:09,840
And inside, I'll make sure it
has two columns, one for a rider
434
00:25:09,840 --> 00:25:17,130
ID to represent a rider, and
one, let's say, for a station ID,
435
00:25:17,130 --> 00:25:19,020
to represent a station.
436
00:25:19,020 --> 00:25:24,630
Now when I see a rider ID next
to a station ID in this table,
437
00:25:24,630 --> 00:25:27,870
I'll know the rider with
that certain ID visited
438
00:25:27,870 --> 00:25:30,180
the station with that certain ID.
439
00:25:30,180 --> 00:25:31,500
So I'll close this out.
440
00:25:31,500 --> 00:25:36,330
I'll say end, parentheses here,
semicolon, Enter, and finally, clear
441
00:25:36,330 --> 00:25:38,940
my terminal, type dot schema.
442
00:25:38,940 --> 00:25:45,000
And I can see, I have riders, stations,
and visits between riders and stations
443
00:25:45,000 --> 00:25:48,630
in this associative entity, this
junction table or a joined table.
444
00:25:48,630 --> 00:25:51,990
Up to you what you might
want to call it in this case.
445
00:25:51,990 --> 00:25:55,840
Now what questions do we have?
446
00:25:55,840 --> 00:25:58,480
AUDIENCE: Why we have
not use the primary key
447
00:25:58,480 --> 00:26:00,563
and secondary key in this table?
448
00:26:00,563 --> 00:26:01,730
CARTER ZENKE: Good question.
449
00:26:01,730 --> 00:26:03,605
So we're going to get
there in just a minute.
450
00:26:03,605 --> 00:26:07,120
But if I look back at my terminal
here, my schema, I'll see,
451
00:26:07,120 --> 00:26:09,340
I really just have column names.
452
00:26:09,340 --> 00:26:13,840
And we saw before and we typed
dot schema on our longlist.db,
453
00:26:13,840 --> 00:26:16,150
we had more than just column names.
454
00:26:16,150 --> 00:26:19,400
We had column names, we
had perhaps data types,
455
00:26:19,400 --> 00:26:21,807
we had primary keys and foreign keys.
456
00:26:21,807 --> 00:26:23,390
So we'll get to that in just a minute.
457
00:26:23,390 --> 00:26:28,300
But suffice to say for now, we're
going to keep improving this over time.
458
00:26:28,300 --> 00:26:30,580
Let's take one more.
459
00:26:30,580 --> 00:26:33,300
AUDIENCE: Is it required to put spaces--
460
00:26:33,300 --> 00:26:37,422
the four spaces indents, or
that's just for the visual look?
461
00:26:37,422 --> 00:26:38,880
CARTER ZENKE: Yeah, great question.
462
00:26:38,880 --> 00:26:43,260
Is it required to have these four
spaces before each column name.
463
00:26:43,260 --> 00:26:46,720
And in fact, no, it's not, but
it makes the code more readable.
464
00:26:46,720 --> 00:26:49,590
So I could put this all in one line--
465
00:26:49,590 --> 00:26:51,270
I shouldn't, but I could.
466
00:26:51,270 --> 00:26:55,620
And if I have instead this new
line followed by four spaces,
467
00:26:55,620 --> 00:27:00,390
I can make this more readable for
myself and for my colleagues, too.
468
00:27:00,390 --> 00:27:02,060
Good question.
469
00:27:02,060 --> 00:27:07,430
OK, so to our earlier
point, there are things
470
00:27:07,430 --> 00:27:09,680
that are missing from this schema.
471
00:27:09,680 --> 00:27:13,800
Like, we have column names,
but as we saw before,
472
00:27:13,800 --> 00:27:16,550
we should ideally
specify what kind of data
473
00:27:16,550 --> 00:27:20,870
should be able to go into
each of these columns.
474
00:27:20,870 --> 00:27:24,990
And for that, we'll need some
new ideas to talk about here.
475
00:27:24,990 --> 00:27:30,950
So let's focus now on this idea
of data types and storage classes.
476
00:27:30,950 --> 00:27:36,680
Data types and storage classes are
two very similar but distinct IDs,
477
00:27:36,680 --> 00:27:40,230
and they're distinct in a way
we'll talk about in just a minute.
478
00:27:40,230 --> 00:27:45,770
Now SQLite has five storage
classes, five kind of storage--
479
00:27:45,770 --> 00:27:49,520
kind of type, so to speak,
of values that can hold.
480
00:27:49,520 --> 00:27:51,920
So let's talk about the first
one, null, for instance.
481
00:27:51,920 --> 00:27:54,230
Null in this case means nothing.
482
00:27:54,230 --> 00:27:57,170
There's nothing that actually
is inside of this value.
483
00:27:57,170 --> 00:27:59,990
It's kind of a central value
to mean nothing is here.
484
00:27:59,990 --> 00:28:04,700
Integer means a whole number,
like 1, 2, 3, 4, or 5.
485
00:28:04,700 --> 00:28:10,880
Real talks about decimals like floating
points like 1.2 or 2.4 and so on.
486
00:28:10,880 --> 00:28:12,740
Text is used for characters.
487
00:28:12,740 --> 00:28:19,740
And blob, kind of a funny one, blob
stands for Binary Large Object,
488
00:28:19,740 --> 00:28:24,570
and it represents the data exactly
as I give it to this value.
489
00:28:24,570 --> 00:28:30,800
If I tell it to store 101010, it
will store exactly 101010 in binary.
490
00:28:30,800 --> 00:28:34,220
So useful for storing, in this
case like images and video files
491
00:28:34,220 --> 00:28:37,160
and audio files, things
that have some structure
492
00:28:37,160 --> 00:28:40,030
we don't want to mess around with.
493
00:28:40,030 --> 00:28:44,290
Now let's focus on this
idea of a storage class.
494
00:28:44,290 --> 00:28:50,830
These, I'll say, are storage classes
and not data types in SQLite.
495
00:28:50,830 --> 00:28:58,390
Now a storage class like integer can
comprise, can hold several data types.
496
00:28:58,390 --> 00:29:01,480
Notice how there are
seven different data
497
00:29:01,480 --> 00:29:05,380
types that could be stored under
this integer storage class.
498
00:29:05,380 --> 00:29:10,960
We have a 6-byte integer, 2-byte
integer, a 8 and 0 and so on.
499
00:29:10,960 --> 00:29:13,750
It could be any of
these particular types,
500
00:29:13,750 --> 00:29:19,330
but each of these types under the
umbrella of this integer storage class.
501
00:29:19,330 --> 00:29:23,740
And SQLite itself will
take care of making sure
502
00:29:23,740 --> 00:29:25,750
it uses the appropriate data type.
503
00:29:25,750 --> 00:29:30,910
Like if I give a very large number,
like let's say 4 billion or 6 billion
504
00:29:30,910 --> 00:29:35,200
or even bigger than that, it
will probably use a longer--
505
00:29:35,200 --> 00:29:39,010
that is, a bigger byte integer
to store that kind of value.
506
00:29:39,010 --> 00:29:42,610
If I give it a smaller
one, like 1, 2, 3, or 4,
507
00:29:42,610 --> 00:29:47,140
it will probably use a 1-byte
or a 2-byte integer for that.
508
00:29:47,140 --> 00:29:50,560
But SQLite's idea is
that I, as a programmer,
509
00:29:50,560 --> 00:29:54,910
shouldn't have to care if I use an
8-byte or a 1-byte or a 2-byte integer,
510
00:29:54,910 --> 00:29:58,180
I just care that I'm using
integers, whole numbers,
511
00:29:58,180 --> 00:30:03,670
and they give me a storage class to
use any of these up to their choice
512
00:30:03,670 --> 00:30:05,960
here as well.
513
00:30:05,960 --> 00:30:10,930
Now let's look at a few examples of
values in SQLite that we could store.
514
00:30:10,930 --> 00:30:14,590
Well, we have perhaps the
red line as some text.
515
00:30:14,590 --> 00:30:16,690
And because this is
characters, it's quoted,
516
00:30:16,690 --> 00:30:22,330
we could use the text storage class to
represent this particular value here.
517
00:30:22,330 --> 00:30:25,630
We could have maybe an image,
and to the earlier point,
518
00:30:25,630 --> 00:30:29,350
we could say, well, this image
might be best represented
519
00:30:29,350 --> 00:30:33,490
using a blob, a binary large
object, to keep all of these pixels
520
00:30:33,490 --> 00:30:36,790
exactly as they are in this image.
521
00:30:36,790 --> 00:30:40,330
But we do get some choice,
some interesting design
522
00:30:40,330 --> 00:30:43,720
challenges when we look
at the idea of fares.
523
00:30:43,720 --> 00:30:50,230
So let's say to our point earlier,
fares are $0.10 back in the 1950s or so.
524
00:30:50,230 --> 00:30:55,960
Well, $0.10 we could store as an
integer, which seems just fine.
525
00:30:55,960 --> 00:30:58,480
But this could get confused.
526
00:30:58,480 --> 00:31:00,790
I'm talking about dollars here or cents?
527
00:31:00,790 --> 00:31:03,205
Maybe it would be better,
let's say, if I did this.
528
00:31:03,205 --> 00:31:07,610
A dollar sign, 0.10.
529
00:31:07,610 --> 00:31:09,680
And what might that be stored as?
530
00:31:09,680 --> 00:31:11,630
Well, probably text, right?
531
00:31:11,630 --> 00:31:14,840
I could say this dollar
sign isn't really a number,
532
00:31:14,840 --> 00:31:18,140
but now I have to include it, so I'll
say this will be quoted, essentially,
533
00:31:18,140 --> 00:31:22,358
as dollar sign 0.10.
534
00:31:22,358 --> 00:31:23,900
Now there's some downsides here, too.
535
00:31:23,900 --> 00:31:25,550
Like let's say I have--
536
00:31:25,550 --> 00:31:26,840
I want to add these up.
537
00:31:26,840 --> 00:31:28,580
Well, I can't add up text.
538
00:31:28,580 --> 00:31:33,140
Like, what does it mean to say dollar
sign 0.10 plus dollar sign 0.20.
539
00:31:33,140 --> 00:31:35,630
I can't do math with text.
540
00:31:35,630 --> 00:31:41,090
So maybe it'll be better after all if
I used a real or a decimal like this,
541
00:31:41,090 --> 00:31:43,970
0.10.
542
00:31:43,970 --> 00:31:46,880
But I mean even here, you'll
run into some problems.
543
00:31:46,880 --> 00:31:50,780
If you are familiar with how the
number is represented in binary,
544
00:31:50,780 --> 00:31:54,380
you might know that decimal
values or floating point values
545
00:31:54,380 --> 00:31:57,020
can't be perfectly,
precisely represented.
546
00:31:57,020 --> 00:32:04,850
And if I talk about 0.10, the
computer might store 0.10000056789.
547
00:32:04,850 --> 00:32:09,450
It could get very wonky out to the many,
many decimal digits down below here.
548
00:32:09,450 --> 00:32:13,460
So trade offs and challenges overall.
549
00:32:13,460 --> 00:32:15,080
Let's look at these three, though.
550
00:32:15,080 --> 00:32:17,480
I have the first one
to store as an integer.
551
00:32:17,480 --> 00:32:23,000
I'm trying to store fares here,
second one as text, and the third one
552
00:32:23,000 --> 00:32:27,320
as a floating point or
a real in this case.
553
00:32:27,320 --> 00:32:30,350
Let me ask for some
opinions here, which one
554
00:32:30,350 --> 00:32:36,297
would you use and why, for trying
represent fares in this case?
555
00:32:36,297 --> 00:32:37,130
AUDIENCE: Thank you.
556
00:32:37,130 --> 00:32:41,700
I prefer using integers
because of course,
557
00:32:41,700 --> 00:32:44,690
I need to get the
calculation very accurately.
558
00:32:44,690 --> 00:32:45,830
That's my point of view.
559
00:32:45,830 --> 00:32:50,030
Well, sometimes I can
use float but, you know,
560
00:32:50,030 --> 00:32:52,310
like you said before,
it can get very wonky
561
00:32:52,310 --> 00:32:59,815
if I really need that kind of precision,
I don't really recommend using floats.
562
00:32:59,815 --> 00:33:01,190
CARTER ZENKE: Yeah, a good point.
563
00:33:01,190 --> 00:33:04,820
So if I go back to some slides here,
you might argue for the integer
564
00:33:04,820 --> 00:33:07,940
because you know you can
precisely represent integers.
565
00:33:07,940 --> 00:33:12,290
And let's say I want to add up
fares over a lot, a lot of riders.
566
00:33:12,290 --> 00:33:15,020
This might be useful for me because
I know that each number will
567
00:33:15,020 --> 00:33:16,820
be perfectly, precisely represented.
568
00:33:16,820 --> 00:33:20,720
I can do lots of big kind of
math with this number here.
569
00:33:20,720 --> 00:33:24,260
To your point, this decimal
might kind of, as you said,
570
00:33:24,260 --> 00:33:26,840
get wonky later on towards
the later decimal points.
571
00:33:26,840 --> 00:33:30,980
I might get some unexpected
results if we add up these overall.
572
00:33:30,980 --> 00:33:36,380
Let me ask, though, are there any
proponents of this floating point
573
00:33:36,380 --> 00:33:40,120
value or a real value?
574
00:33:40,120 --> 00:33:42,930
AUDIENCE: So I think the--
575
00:33:42,930 --> 00:33:48,810
I think the float is the
number of, for example,
576
00:33:48,810 --> 00:33:56,190
for each pair, like the
answers like truncation
577
00:33:56,190 --> 00:33:58,875
probably suggest by the comments there.
578
00:33:58,875 --> 00:34:00,250
CARTER ZENKE: Yeah, a good point.
579
00:34:00,250 --> 00:34:02,083
So if you talk about
using this float value,
580
00:34:02,083 --> 00:34:06,491
I mean, one thing we could say for
it is that this decimal could be--
581
00:34:06,491 --> 00:34:09,449
it's more accurate to say, like,
where you're working with dollars now,
582
00:34:09,449 --> 00:34:13,259
and we could have maybe $0.10,
which is only 0.1 of a dollar.
583
00:34:13,259 --> 00:34:15,030
I totally hear that point as well.
584
00:34:15,030 --> 00:34:17,280
And the point we're making
here is that they're really
585
00:34:17,280 --> 00:34:20,580
just trade offs among
these data storage classes
586
00:34:20,580 --> 00:34:23,699
to use, whether you're using
integers or real values,
587
00:34:23,699 --> 00:34:27,730
it just depends on your use case
and what you're designing for.
588
00:34:27,730 --> 00:34:30,900
So be sure to read up on trade
offs among these data types,
589
00:34:30,900 --> 00:34:35,420
and determine for yourself
which one should you best use.
590
00:34:35,420 --> 00:34:36,409
OK.
591
00:34:36,409 --> 00:34:41,090
So we have now these storage
classes to store values,
592
00:34:41,090 --> 00:34:47,210
and it turns out that columns can be
made to store certain storage classes
593
00:34:47,210 --> 00:34:49,460
or prioritize certain classes.
594
00:34:49,460 --> 00:34:53,750
And the key distinction here
is that columns in SQLite
595
00:34:53,750 --> 00:34:57,440
don't always store one particular type.
596
00:34:57,440 --> 00:35:01,250
Instead, they have type
affinities, meaning
597
00:35:01,250 --> 00:35:07,280
that they'll try to convert some value
you insert into a given cell or given
598
00:35:07,280 --> 00:35:11,660
row to the type they
have the affinity for.
599
00:35:11,660 --> 00:35:16,490
Now there are, let's say five
type affinities in SQLite--
600
00:35:16,490 --> 00:35:17,270
text.
601
00:35:17,270 --> 00:35:19,610
Columns can be of the
type affinity text,
602
00:35:19,610 --> 00:35:22,520
meaning they store just
characters at the end of the day.
603
00:35:22,520 --> 00:35:27,950
There's also numeric, which stores
either integer values or real values,
604
00:35:27,950 --> 00:35:31,010
depending on which one
seems best to convert to.
605
00:35:31,010 --> 00:35:33,980
You have integer type affinity, which
means it can store whole numbers.
606
00:35:33,980 --> 00:35:36,990
Real, to store floating
points or decimal values.
607
00:35:36,990 --> 00:35:39,120
And we have blob here
again, our old friend
608
00:35:39,120 --> 00:35:42,030
to store binary exactly as we get it.
609
00:35:42,030 --> 00:35:44,750
So whereas before we were
talking about storage classes--
610
00:35:44,750 --> 00:35:47,090
those are associated
with individual values--
611
00:35:47,090 --> 00:35:51,210
type affinities are now associated
with individual columns.
612
00:35:51,210 --> 00:35:55,460
So let's see some example of
how this might work in SQLite.
613
00:35:55,460 --> 00:35:59,000
Let's say I have this
table of fares, and we've
614
00:35:59,000 --> 00:36:02,720
decided to store fares as integers.
615
00:36:02,720 --> 00:36:09,470
Well, if I said that this column called
amount has the affinity for the text
616
00:36:09,470 --> 00:36:14,300
storage class, what would happen
is if I insert this integer, 10,
617
00:36:14,300 --> 00:36:16,490
it would look a bit like this later on.
618
00:36:16,490 --> 00:36:20,250
It would be converted to text,
it would be quoted in a sense,
619
00:36:20,250 --> 00:36:24,240
to represent-- it's now been
converted to some set of characters.
620
00:36:24,240 --> 00:36:27,050
Let's say I insert this value 25.
621
00:36:27,050 --> 00:36:31,250
Well, 25 has a storage class
right now of an integer.
622
00:36:31,250 --> 00:36:33,560
It is a whole number.
623
00:36:33,560 --> 00:36:38,480
But if I insert this into a
column that has the text affinity,
624
00:36:38,480 --> 00:36:44,250
it will be converted into, in this
case, text at the end of the day.
625
00:36:44,250 --> 00:36:45,920
Let's do the opposite.
626
00:36:45,920 --> 00:36:51,270
Let's say I have my fare as a
string, some text in this case.
627
00:36:51,270 --> 00:36:54,690
I want to insert it into
this column called amount,
628
00:36:54,690 --> 00:36:58,820
but now amount has the
integer type affinity.
629
00:36:58,820 --> 00:37:03,830
Well, if I insert 10, quote
unquote, into the column amount,
630
00:37:03,830 --> 00:37:10,850
I'll get back not 10, the text, but 10
the integer, because again, amount--
631
00:37:10,850 --> 00:37:16,310
this column here-- has an affinity
for the integer storage class.
632
00:37:16,310 --> 00:37:17,210
Let's try this.
633
00:37:17,210 --> 00:37:21,650
25, some text again I'll
insert it into this table.
634
00:37:21,650 --> 00:37:26,690
Now I'll have 25 as an integer.
635
00:37:26,690 --> 00:37:31,300
So this is how SQLite allows
us to give certain columns
636
00:37:31,300 --> 00:37:33,610
an affinity for certain
types, that they'll
637
00:37:33,610 --> 00:37:38,860
try to store values of that type, so
long as we insert values it could be
638
00:37:38,860 --> 00:37:41,900
feasibly converted to that type here.
639
00:37:41,900 --> 00:37:45,820
So let's go back to our schema
and try to improve it now,
640
00:37:45,820 --> 00:37:50,860
to use not just column names,
but also type affinities to store
641
00:37:50,860 --> 00:37:53,140
certain data of a certain type.
642
00:37:53,140 --> 00:37:58,000
Go back to my computer here, and
let's improve this once more.
643
00:37:58,000 --> 00:38:00,910
So I'll go over to my table.
644
00:38:00,910 --> 00:38:05,890
And now, I probably want
to improve the design here.
645
00:38:05,890 --> 00:38:08,680
And often, if I want to
improve this, I might just
646
00:38:08,680 --> 00:38:10,910
need to erase what I've already done.
647
00:38:10,910 --> 00:38:16,210
So let me introduce this new keyword,
this new statement called Drop Table.
648
00:38:16,210 --> 00:38:19,460
To drop a table means to delete
it, to remove it, effectively.
649
00:38:19,460 --> 00:38:23,080
So let me try doing this for
riders, stations, and visits.
650
00:38:23,080 --> 00:38:28,840
I'll type drop table
writers, semicolon, Enter.
651
00:38:28,840 --> 00:38:34,240
Nothing seems to happen, but if I
type dot schema, well, riders is gone.
652
00:38:34,240 --> 00:38:40,960
I'll try drop table stations, then
semicolon, hit Enter, and type,
653
00:38:40,960 --> 00:38:42,760
let's say dot schema again.
654
00:38:42,760 --> 00:38:44,320
No more stations.
655
00:38:44,320 --> 00:38:49,030
I'll try drop table
visits, semicolon, Enter,
656
00:38:49,030 --> 00:38:52,510
and then dot schema, and our table--
657
00:38:52,510 --> 00:38:54,160
our database is really gone.
658
00:38:54,160 --> 00:38:57,280
There are no more tables inside of it.
659
00:38:57,280 --> 00:39:02,370
So let me propose that instead
of working inside of the SQLite
660
00:39:02,370 --> 00:39:05,730
prompt, like typing out again, and
again, and again, create table riders,
661
00:39:05,730 --> 00:39:07,650
create table stations,
create table visits,
662
00:39:07,650 --> 00:39:10,650
let me be more efficient
about this and create myself
663
00:39:10,650 --> 00:39:14,790
a schema file that I could
reuse throughout this lesson
664
00:39:14,790 --> 00:39:18,870
and also later on while I'm
working on this database on my own.
665
00:39:18,870 --> 00:39:23,850
To do that, let me quit my
SQLite prompt here, and let me
666
00:39:23,850 --> 00:39:27,300
type something like code schema dot sql.
667
00:39:27,300 --> 00:39:31,620
I'm just creating this
file called schema.sql.
668
00:39:31,620 --> 00:39:36,060
Now a dot SQL file allows
me to type in SQL keywords
669
00:39:36,060 --> 00:39:38,790
and have them be syntax
highlighted, so I know
670
00:39:38,790 --> 00:39:42,040
what's going on inside of this file.
671
00:39:42,040 --> 00:39:44,130
So let's just try this once more.
672
00:39:44,130 --> 00:39:49,170
I'll type Create table
riders, and inside I'll
673
00:39:49,170 --> 00:39:53,820
say it has the ID column
of what type affinity?
674
00:39:53,820 --> 00:39:58,270
Well, IDs are whole numbers, so
perhaps integer in this case.
675
00:39:58,270 --> 00:40:02,160
I could say, ID has the
integer type affinity.
676
00:40:02,160 --> 00:40:05,010
Now let me say the
riders also have a name,
677
00:40:05,010 --> 00:40:07,710
and how could names be best represented?
678
00:40:07,710 --> 00:40:09,210
Maybe text, right?
679
00:40:09,210 --> 00:40:10,120
Characters here.
680
00:40:10,120 --> 00:40:12,750
So I'll say name and text.
681
00:40:12,750 --> 00:40:14,850
Now I'll include a
semicolon to say, this
682
00:40:14,850 --> 00:40:17,820
is the end of my create table statement.
683
00:40:17,820 --> 00:40:21,630
And before, remember how
I had to kind of air out
684
00:40:21,630 --> 00:40:25,560
or I had to like, backspace and
so on to improve the design?
685
00:40:25,560 --> 00:40:28,890
Here I can literally just point
and click and edit this file
686
00:40:28,890 --> 00:40:30,630
to improve my schema.
687
00:40:30,630 --> 00:40:35,130
And I'll later on apply
this in my database using
688
00:40:35,130 --> 00:40:37,418
a command that we'll see a bit later.
689
00:40:37,418 --> 00:40:38,460
So let's keep going here.
690
00:40:38,460 --> 00:40:44,040
I'll say create table stations,
and inside of the stations table,
691
00:40:44,040 --> 00:40:48,330
I'll make sure it has an
ID column of type integer,
692
00:40:48,330 --> 00:40:51,960
a name column that
probably stores some text,
693
00:40:51,960 --> 00:40:57,270
and a line column that also
stores some text, where name
694
00:40:57,270 --> 00:41:00,130
is the name of the station
like Kendell MIT, Harvard,
695
00:41:00,130 --> 00:41:05,640
and line is blue line or green line,
what line it's part of in our subway.
696
00:41:05,640 --> 00:41:07,290
Let me try now visits.
697
00:41:07,290 --> 00:41:08,850
I'll say, create table visits.
698
00:41:08,850 --> 00:41:13,620
And then I'll do rider ID,
which has what type affinity?
699
00:41:13,620 --> 00:41:15,030
Probably integer.
700
00:41:15,030 --> 00:41:19,680
And then I'll do station ID, which
has this same type of affinity.
701
00:41:19,680 --> 00:41:23,070
It's relying on integers,
whole numbers here for IDs.
702
00:41:23,070 --> 00:41:25,450
So my colon to finish this statement.
703
00:41:25,450 --> 00:41:29,040
Now this is my schema as a whole.
704
00:41:29,040 --> 00:41:31,830
I have riders, stations, and visits.
705
00:41:31,830 --> 00:41:36,760
But now I want to apply
this schema to my database.
706
00:41:36,760 --> 00:41:37,780
So what could I do?
707
00:41:37,780 --> 00:41:39,330
I could reopen it, let's say.
708
00:41:39,330 --> 00:41:43,510
I'll do sqlite3 mbta.db in my terminal.
709
00:41:43,510 --> 00:41:47,370
And now I want to read
in this schema.sql file,
710
00:41:47,370 --> 00:41:51,340
just run the commands that
are inside of this file.
711
00:41:51,340 --> 00:41:52,660
So what could I do?
712
00:41:52,660 --> 00:41:59,040
I could say dot read schema.sql,
where dot read is a command that
713
00:41:59,040 --> 00:42:01,860
says take whatever file you
give me, like schema.sql
714
00:42:01,860 --> 00:42:08,070
and read it into this database, running
any SQL keywords you come across there.
715
00:42:08,070 --> 00:42:09,450
I'll hit Enter.
716
00:42:09,450 --> 00:42:11,640
And now nothing seems to happen.
717
00:42:11,640 --> 00:42:17,658
But if I type dot schema, I
now see my schema improved.
718
00:42:17,658 --> 00:42:19,950
And this is helpful for me,
because what I could do now
719
00:42:19,950 --> 00:42:23,580
is I could just edit my schema.sql
file and rerun it and rerun
720
00:42:23,580 --> 00:42:29,770
it to make sure I now have these
tables being improved over time.
721
00:42:29,770 --> 00:42:30,730
OK.
722
00:42:30,730 --> 00:42:37,720
So this, then, is our new
representation of our database.
723
00:42:37,720 --> 00:42:42,280
We have riders, of course,
their own entity, and stations.
724
00:42:42,280 --> 00:42:47,170
They have an ID and a name, and
stations have an ID, a name, and a line.
725
00:42:47,170 --> 00:42:52,810
We've also now included these type
affinities-- integer, text, integer,
726
00:42:52,810 --> 00:42:56,320
text, to tell SQL what
kinds of storage classes
727
00:42:56,320 --> 00:43:00,490
could be put inside of
each of these columns.
728
00:43:00,490 --> 00:43:03,580
Now before we keep
improving this, let me ask,
729
00:43:03,580 --> 00:43:10,780
what questions do we have on these
storage classes and type affinities?
730
00:43:10,780 --> 00:43:12,400
AUDIENCE: That you would use--
731
00:43:12,400 --> 00:43:17,870
when you were creating the
table, the table was not in line.
732
00:43:17,870 --> 00:43:21,340
So when we search for
the authors of books,
733
00:43:21,340 --> 00:43:25,982
so it comes with a perfect table, so how
can we make a perfect table in the SQL?
734
00:43:25,982 --> 00:43:27,940
CARTER ZENKE: Yeah, do
you mind clarifying what
735
00:43:27,940 --> 00:43:30,160
you mean by like, the perfect table?
736
00:43:30,160 --> 00:43:33,760
AUDIENCE: So I mean that it
was arranged in something
737
00:43:33,760 --> 00:43:36,443
like in boxes, block boxes.
738
00:43:36,443 --> 00:43:37,610
CARTER ZENKE: Good question.
739
00:43:37,610 --> 00:43:41,770
So before we able to see
the results of our queries
740
00:43:41,770 --> 00:43:46,660
inside some boxes in our terminal,
and that is actually a mode of SQLite,
741
00:43:46,660 --> 00:43:51,370
I think type like dot mode table to
see your results in that version.
742
00:43:51,370 --> 00:43:54,937
Here though, we have no
data inside of our tables,
743
00:43:54,937 --> 00:43:56,770
so I can't really select
anything from them.
744
00:43:56,770 --> 00:44:01,300
Like, if I go to my
terminal here and I try
745
00:44:01,300 --> 00:44:04,690
to select some data
from this riders table,
746
00:44:04,690 --> 00:44:10,510
let me say select star from rider's,
semicolon, I won't get anything back.
747
00:44:10,510 --> 00:44:15,040
Next week, though, we'll see how to
insert, and add, and update, and delete
748
00:44:15,040 --> 00:44:17,990
data inside of these
tables, at which point,
749
00:44:17,990 --> 00:44:21,460
you could write, select star
from riders and see some data
750
00:44:21,460 --> 00:44:23,560
you've inserted yourself.
751
00:44:23,560 --> 00:44:25,120
Great question.
752
00:44:25,120 --> 00:44:27,580
Let's take one more here.
753
00:44:27,580 --> 00:44:28,360
AUDIENCE: Yes.
754
00:44:28,360 --> 00:44:33,130
I would like to know if you have
a column of the type boolean.
755
00:44:33,130 --> 00:44:36,730
CARTER ZENKE: Yeah, do we have a
Boolean type affinity, let's say.
756
00:44:36,730 --> 00:44:39,850
So here, we don't, at least in SQLite.
757
00:44:39,850 --> 00:44:45,610
Some other DBMS', Database Management
Systems might have bool or Boolean,
758
00:44:45,610 --> 00:44:47,110
true or false, right?
759
00:44:47,110 --> 00:44:48,700
Let me show you this instead.
760
00:44:48,700 --> 00:44:52,840
If I go to my terminal, I can
see-- if I type dot schema,
761
00:44:52,840 --> 00:44:56,320
I haven't used Boolean, there's
no need for me in this case,
762
00:44:56,320 --> 00:44:58,150
but I have used integer.
763
00:44:58,150 --> 00:45:02,350
And integer for SQLite can
kind of serve the same purpose.
764
00:45:02,350 --> 00:45:06,550
I could have the integer 0
or the integer 1 to be true
765
00:45:06,550 --> 00:45:09,880
or to be false or true, respectively,
0 being false and true being 1,
766
00:45:09,880 --> 00:45:11,140
I believe in this case.
767
00:45:11,140 --> 00:45:13,970
But good question.
768
00:45:13,970 --> 00:45:14,750
OK.
769
00:45:14,750 --> 00:45:19,190
So to the earlier question,
like, we've improved our tables.
770
00:45:19,190 --> 00:45:21,950
We now have type
affinities for our columns.
771
00:45:21,950 --> 00:45:26,490
But we don't yet have this
ID we talked about last week,
772
00:45:26,490 --> 00:45:29,450
which was primary keys and foreign keys.
773
00:45:29,450 --> 00:45:34,280
This idea of trying to uniquely
represent each item in our own table
774
00:45:34,280 --> 00:45:36,770
using primary keys and
trying to reference
775
00:45:36,770 --> 00:45:40,800
that primary key from some
other table using foreign keys.
776
00:45:40,800 --> 00:45:43,820
So let's try to work on that now.
777
00:45:43,820 --> 00:45:48,980
And for this, we'll need this
new ID called a table constraint.
778
00:45:48,980 --> 00:45:54,500
In SQLite, you can apply what's called
a constraint to your entire table.
779
00:45:54,500 --> 00:45:59,300
A constraint means that some
values have to be a certain way.
780
00:45:59,300 --> 00:46:04,340
Like let's say for a primary
key, primary keys must be unique.
781
00:46:04,340 --> 00:46:07,730
They can't repeat, and they
must be, at least in our case,
782
00:46:07,730 --> 00:46:12,990
is going to be integers, to be able
to quickly add on to them over time.
783
00:46:12,990 --> 00:46:16,280
Similarly, for foreign
keys, well, a constraint
784
00:46:16,280 --> 00:46:19,220
is that if you have a
foreign key, you better
785
00:46:19,220 --> 00:46:22,850
find that value in some
other table, otherwise
786
00:46:22,850 --> 00:46:26,250
you violated this constraint
to having a foreign key.
787
00:46:26,250 --> 00:46:29,940
So we have two kinds of table
constraints, among others,
788
00:46:29,940 --> 00:46:33,500
but two of these are
primary key and foreign key.
789
00:46:33,500 --> 00:46:36,800
And we can apply these to
our table by applying them
790
00:46:36,800 --> 00:46:42,030
underneath the columns we tend to
say will be inside of our table.
791
00:46:42,030 --> 00:46:43,850
Let's try these two here now.
792
00:46:43,850 --> 00:46:47,840
So we come back to my terminal here so
we implement our very own primary key
793
00:46:47,840 --> 00:46:49,730
and foreign key constraints.
794
00:46:49,730 --> 00:46:52,920
We'll go back to my SQLite
terminal and clear my screen,
795
00:46:52,920 --> 00:46:55,670
and let's then pull
up our schema.sql file
796
00:46:55,670 --> 00:46:58,130
so we can keep modifying our schema.
797
00:46:58,130 --> 00:47:01,610
I can now see I have the riders
table, stations, and visits,
798
00:47:01,610 --> 00:47:06,080
and I have some columns that could
be primary keys or foreign keys,
799
00:47:06,080 --> 00:47:08,850
but I need to declare them as such.
800
00:47:08,850 --> 00:47:12,120
So here in the riders table,
what was our primary key?
801
00:47:12,120 --> 00:47:13,520
Well, it was ID.
802
00:47:13,520 --> 00:47:16,520
Every rider should have their
own, unique ID that should not
803
00:47:16,520 --> 00:47:19,460
be duplicated across any two riders.
804
00:47:19,460 --> 00:47:24,410
So to ensure that constraint is applied,
I could follow this up with a comma
805
00:47:24,410 --> 00:47:28,770
and then say, primary
key ID, just like this.
806
00:47:28,770 --> 00:47:32,690
Now ID is the primary key
of this rider's table.
807
00:47:32,690 --> 00:47:36,450
I can go down to stations and
ask, what was my primary key?
808
00:47:36,450 --> 00:47:40,520
Well, similarly, it was ID,
this ID column on line 8.
809
00:47:40,520 --> 00:47:45,590
So I'll type a comma followed
up with primary key ID,
810
00:47:45,590 --> 00:47:49,170
and now that ID column is a primary key.
811
00:47:49,170 --> 00:47:52,580
It has that constraint
applied in stations.
812
00:47:52,580 --> 00:47:56,900
But now if I get down to visits,
we'll have a few more options.
813
00:47:56,900 --> 00:48:01,700
Visits here actually doesn't have
its own ID column that I created.
814
00:48:01,700 --> 00:48:05,750
I instead have a rider ID
and a station ID column.
815
00:48:05,750 --> 00:48:07,110
So a few options here.
816
00:48:07,110 --> 00:48:10,730
One option is actually to
make a joint primary key.
817
00:48:10,730 --> 00:48:17,630
I could have a primary key composed of
two columns, both rider ID and station
818
00:48:17,630 --> 00:48:18,170
ID.
819
00:48:18,170 --> 00:48:20,780
If I applied that
constraint, that would mean
820
00:48:20,780 --> 00:48:24,110
that if I were to ever insert
a row that had the same rider
821
00:48:24,110 --> 00:48:26,990
ID and the same station
ID as another row,
822
00:48:26,990 --> 00:48:29,810
I would trigger a constraint violation.
823
00:48:29,810 --> 00:48:33,080
Every row that has a
rider ID and a station ID
824
00:48:33,080 --> 00:48:36,860
has to be unique in their
combination of those two values.
825
00:48:36,860 --> 00:48:39,770
To write that kind of scenario,
I could follow this up
826
00:48:39,770 --> 00:48:43,190
and I could say similarly,
the primary key of this table
827
00:48:43,190 --> 00:48:48,410
is not just rider ID like
this, but it's also station ID.
828
00:48:48,410 --> 00:48:52,080
Now this is a joint
primary key constraint.
829
00:48:52,080 --> 00:48:54,110
But if we think about
this logically, I mean,
830
00:48:54,110 --> 00:48:57,620
it kind of stands to reason that
somebody would visit a station more
831
00:48:57,620 --> 00:49:01,850
than once, and I don't want to make
sure that every combination of rider
832
00:49:01,850 --> 00:49:03,420
and station ID should be unique.
833
00:49:03,420 --> 00:49:06,600
I want people to able to visit
a station more than once.
834
00:49:06,600 --> 00:49:09,900
So maybe not the best design for this
table, but I could certainly use it.
835
00:49:09,900 --> 00:49:12,530
In other cases or other contexts.
836
00:49:12,530 --> 00:49:15,410
One other option would
be to do it before
837
00:49:15,410 --> 00:49:19,670
and to have maybe the ID
column here of type integer,
838
00:49:19,670 --> 00:49:23,670
and then down below, make that
our primary key, a bit like this.
839
00:49:23,670 --> 00:49:27,140
And now visits has its own ID column.
840
00:49:27,140 --> 00:49:33,395
But actually, SQLite by default will
give me its very own primary key,
841
00:49:33,395 --> 00:49:35,810
one called row ID.
842
00:49:35,810 --> 00:49:39,170
It's implicit, I can't see it,
but I actually could query it.
843
00:49:39,170 --> 00:49:41,810
I could query it for
row ID, all one word,
844
00:49:41,810 --> 00:49:46,070
and get back a unique primary key for
this table SQLite has automatically
845
00:49:46,070 --> 00:49:47,950
created for me.
846
00:49:47,950 --> 00:49:50,880
Now we have seen the
primary key options,
847
00:49:50,880 --> 00:49:52,420
what are our foreign key options?
848
00:49:52,420 --> 00:49:55,180
Well, it seems like
rider ID and station ID
849
00:49:55,180 --> 00:49:59,320
are the foreign keys of this table
where rider ID references the ID
850
00:49:59,320 --> 00:50:05,020
column in rider's, and station ID
references the ID column in stations.
851
00:50:05,020 --> 00:50:08,350
So to codify that, to make
that a reality in our schema,
852
00:50:08,350 --> 00:50:11,540
I could follow this up
with the foreign key.
853
00:50:11,540 --> 00:50:14,260
The foreign key of
this table is rider ID,
854
00:50:14,260 --> 00:50:19,690
and it references the rider's
table and the ID column
855
00:50:19,690 --> 00:50:22,840
inside of it using this syntax here.
856
00:50:22,840 --> 00:50:24,530
Now I could keep going.
857
00:50:24,530 --> 00:50:26,840
I could say I have more
than one foreign key.
858
00:50:26,840 --> 00:50:29,500
I also have a foreign key--
859
00:50:29,500 --> 00:50:32,020
foreign key called station ID.
860
00:50:32,020 --> 00:50:34,150
And that references,
like we said before,
861
00:50:34,150 --> 00:50:37,670
the stations table and the
ID column inside of it.
862
00:50:37,670 --> 00:50:40,060
So now here I have my completed schema.
863
00:50:40,060 --> 00:50:43,780
I have a primary key for the tables
I've declared an explicit column
864
00:50:43,780 --> 00:50:47,800
for a primary key, and now I
also have foreign key constraints
865
00:50:47,800 --> 00:50:51,710
for those columns that
should be foreign keys.
866
00:50:51,710 --> 00:50:55,570
So now let me ask, what questions
do we have on the schema
867
00:50:55,570 --> 00:50:59,508
or on primary keys and foreign keys?
868
00:50:59,508 --> 00:51:00,300
AUDIENCE: So, yeah.
869
00:51:00,300 --> 00:51:03,570
I just noticed that
whenever before we hadn't
870
00:51:03,570 --> 00:51:06,210
added the affinities and
the keys, we were not
871
00:51:06,210 --> 00:51:09,340
applying commas after each column name.
872
00:51:09,340 --> 00:51:11,295
So what is the difference there?
873
00:51:11,295 --> 00:51:12,670
CARTER ZENKE: Yeah, a good catch.
874
00:51:12,670 --> 00:51:15,128
Let me kind of show you what
this looks like in my terminal
875
00:51:15,128 --> 00:51:17,310
so you can see it live.
876
00:51:17,310 --> 00:51:20,430
You, I think, had noticed
that before we had,
877
00:51:20,430 --> 00:51:25,110
let's say, this primary key
ID constraint in riders,
878
00:51:25,110 --> 00:51:26,740
we had done something like this.
879
00:51:26,740 --> 00:51:28,290
Let me just copy paste that.
880
00:51:28,290 --> 00:51:30,450
And we had removed this last column--
881
00:51:30,450 --> 00:51:32,190
or comma from the name column.
882
00:51:32,190 --> 00:51:33,090
Is that right?
883
00:51:33,090 --> 00:51:38,290
And if that's the case, well, it's
just convention, just style here.
884
00:51:38,290 --> 00:51:41,100
So if I want to keep
adding some constraint
885
00:51:41,100 --> 00:51:45,000
or like a new line to my table,
I should include a comma.
886
00:51:45,000 --> 00:51:51,720
Here, this name column was the last
portion of my table I had specified.
887
00:51:51,720 --> 00:51:55,770
I have this column called name
that has type affinity text, right?
888
00:51:55,770 --> 00:51:58,590
But now, if I add this
new constraint, we'll
889
00:51:58,590 --> 00:52:03,360
have to follow it up with-- follow it
up after a comma from this new column
890
00:52:03,360 --> 00:52:04,290
here.
891
00:52:04,290 --> 00:52:10,230
Notice now, this constraint,
primary key ID, is the last--
892
00:52:10,230 --> 00:52:12,810
let's say attribute of
my table I specified.
893
00:52:12,810 --> 00:52:15,870
I no longer need to include
a comma at the end of it.
894
00:52:15,870 --> 00:52:20,430
So whatever is the last portion,
I should not have a comma after,
895
00:52:20,430 --> 00:52:23,460
but everything else, I should.
896
00:52:23,460 --> 00:52:25,800
Let's take one more question here, too.
897
00:52:25,800 --> 00:52:29,813
AUDIENCE: Would it be OK for the visits
table to have an ID column as well?
898
00:52:29,813 --> 00:52:31,230
CARTER ZENKE: Yeah, good question.
899
00:52:31,230 --> 00:52:35,490
Would it be OK for the visits
table to have an ID column as well.
900
00:52:35,490 --> 00:52:36,450
It certainly would be.
901
00:52:36,450 --> 00:52:39,870
We could define our very own
primary key for this table, too.
902
00:52:39,870 --> 00:52:42,210
So let me go back and show
you how that could work.
903
00:52:42,210 --> 00:52:44,880
I'll go to my visits table here.
904
00:52:44,880 --> 00:52:49,440
And I could try to add my own
primary key to this table.
905
00:52:49,440 --> 00:52:54,870
I could say ID, make a new column
here, make it a type affinity integer,
906
00:52:54,870 --> 00:52:55,900
like this.
907
00:52:55,900 --> 00:52:56,700
Let me scroll up.
908
00:52:56,700 --> 00:53:00,090
And now let me add some new constraint.
909
00:53:00,090 --> 00:53:03,450
I could say, because I've
made my very own primary key,
910
00:53:03,450 --> 00:53:06,310
I'll say primary key ID.
911
00:53:06,310 --> 00:53:12,510
Now, this table has a primary
key that I've created called ID.
912
00:53:12,510 --> 00:53:15,310
And this will be in place
of SQLite, so that it
913
00:53:15,310 --> 00:53:21,730
would have made called row ID
itself but hidden from my own view.
914
00:53:21,730 --> 00:53:24,450
OK, let's keep going then.
915
00:53:24,450 --> 00:53:27,570
We've seen table constraints,
we've seen type affinities,
916
00:53:27,570 --> 00:53:30,540
but we could probably do
more to improve the design
917
00:53:30,540 --> 00:53:32,740
of this table or this database.
918
00:53:32,740 --> 00:53:35,850
So let's introduce one
more kind of constraint,
919
00:53:35,850 --> 00:53:38,740
this one called a column constraint.
920
00:53:38,740 --> 00:53:42,900
So whereas a table constraint
applies to the table as a whole,
921
00:53:42,900 --> 00:53:47,160
a column constraint applies to
a particular column, let's say.
922
00:53:47,160 --> 00:53:51,930
Maybe I want to column to have
certain data inside of it.
923
00:53:51,930 --> 00:53:54,720
Maybe I want to make sure it
doesn't have null values, and so on.
924
00:53:54,720 --> 00:53:57,690
I could do that with column constraints.
925
00:53:57,690 --> 00:54:00,510
There are four in SQLite--
926
00:54:00,510 --> 00:54:07,440
check, default, not null, and unique,
and each one does something different.
927
00:54:07,440 --> 00:54:11,340
Check allows me to make my very
own check, like check to be sure
928
00:54:11,340 --> 00:54:14,250
that this amount is greater than 0.
929
00:54:14,250 --> 00:54:17,250
Or I could use default.
Default means if I
930
00:54:17,250 --> 00:54:20,310
don't supply a value
when I add a new row,
931
00:54:20,310 --> 00:54:23,640
it'll just use a default value instead.
932
00:54:23,640 --> 00:54:29,730
Not null means I can't insert null
or empty values into this column.
933
00:54:29,730 --> 00:54:31,680
And in fact, it's required.
934
00:54:31,680 --> 00:54:37,770
Unique means, I want to make sure
that every row in this column
935
00:54:37,770 --> 00:54:39,390
is a unique value.
936
00:54:39,390 --> 00:54:42,370
It doesn't appear twice in my data set.
937
00:54:42,370 --> 00:54:46,440
So let's try applying a few
of these to our schema here.
938
00:54:46,440 --> 00:54:48,270
Go back to my terminal.
939
00:54:48,270 --> 00:54:50,920
And now let me check out this.
940
00:54:50,920 --> 00:54:55,020
Well, I could try applying
the not null constraint
941
00:54:55,020 --> 00:54:59,190
when I know I want a column
to be required, effectively.
942
00:54:59,190 --> 00:55:01,530
Now where could I best apply that?
943
00:55:01,530 --> 00:55:05,790
Maybe I could apply that to
the name column in stations,
944
00:55:05,790 --> 00:55:07,870
like stations must have a name.
945
00:55:07,870 --> 00:55:11,140
So I'll say, the name
column cannot be null.
946
00:55:11,140 --> 00:55:14,310
It cannot be empty in here.
947
00:55:14,310 --> 00:55:17,280
Line also should probably be not null.
948
00:55:17,280 --> 00:55:19,710
A station must be part of some line.
949
00:55:19,710 --> 00:55:21,780
I can't have an empty value for line.
950
00:55:21,780 --> 00:55:26,240
So I'll say, too, this
should be not null.
951
00:55:26,240 --> 00:55:28,790
Now I could apply this up at name.
952
00:55:28,790 --> 00:55:31,610
I could say, riders
must have a name, too.
953
00:55:31,610 --> 00:55:32,310
Let me try that.
954
00:55:32,310 --> 00:55:34,220
I'll say text not null.
955
00:55:34,220 --> 00:55:35,630
Or I could leave it optional.
956
00:55:35,630 --> 00:55:39,020
I could say, maybe text
just on its own and let
957
00:55:39,020 --> 00:55:42,680
riders choose to supply a name or not.
958
00:55:42,680 --> 00:55:48,650
Now the question here is, should I apply
not null to my primary key columns,
959
00:55:48,650 --> 00:55:53,300
like ID not null or ID not null here?
960
00:55:53,300 --> 00:55:56,420
You might think that you
should, for thoroughness sake.
961
00:55:56,420 --> 00:56:01,940
Well it turns out that when you apply
the primary key table constraint down
962
00:56:01,940 --> 00:56:06,440
below here, this already ensures that
there are several constraints applied
963
00:56:06,440 --> 00:56:12,800
to this particular column called ID,
among them being that ID cannot be
964
00:56:12,800 --> 00:56:13,350
null.
965
00:56:13,350 --> 00:56:17,480
So no need to duplicate this and say
that this ID cannot be null when I
966
00:56:17,480 --> 00:56:23,240
already have it specified down
below that ID is a primary key.
967
00:56:23,240 --> 00:56:25,520
Let me check others here.
968
00:56:25,520 --> 00:56:29,800
You might also think, could I do
it for rider ID and station ID?
969
00:56:29,800 --> 00:56:32,470
Should I include not null here?
970
00:56:32,470 --> 00:56:36,490
Rider ID, not null,
station ID, not null.
971
00:56:36,490 --> 00:56:39,250
Well, that would be a good
thought, but again, we're
972
00:56:39,250 --> 00:56:43,540
taken care of by our table constraint
using our foreign key here.
973
00:56:43,540 --> 00:56:48,520
Again, this constraint will
say, if rider ID doesn't already
974
00:56:48,520 --> 00:56:53,380
exist in the ID column of rider's,
I can't insert that value.
975
00:56:53,380 --> 00:56:58,090
And we could probably presume that
if rider's ID is a primary key,
976
00:56:58,090 --> 00:57:01,130
well, null will not be
part of this column,
977
00:57:01,130 --> 00:57:06,430
and therefore, I already can't insert
null for rider ID or station ID.
978
00:57:06,430 --> 00:57:08,860
This would be in this case, redundant.
979
00:57:08,860 --> 00:57:12,910
So not null is good when you
have columns that are neither
980
00:57:12,910 --> 00:57:15,910
primary keys nor foreign
keys and you want
981
00:57:15,910 --> 00:57:20,010
to make sure that they have-- they
always have a value, that they
982
00:57:20,010 --> 00:57:22,170
are never null, effectively.
983
00:57:22,170 --> 00:57:25,010
984
00:57:25,010 --> 00:57:27,320
So that is not null.
985
00:57:27,320 --> 00:57:28,730
So we could keep going here.
986
00:57:28,730 --> 00:57:34,190
We also had one called unique
that makes sure every value,
987
00:57:34,190 --> 00:57:37,100
every row in this column is unique.
988
00:57:37,100 --> 00:57:38,820
Where could we apply this?
989
00:57:38,820 --> 00:57:42,140
I could try to apply it, let's
say, to the name of a station.
990
00:57:42,140 --> 00:57:44,820
Like, station should have unique names.
991
00:57:44,820 --> 00:57:47,600
So I'll say not null and unique.
992
00:57:47,600 --> 00:57:50,220
Now this column has two constraints.
993
00:57:50,220 --> 00:57:51,680
The first, not null.
994
00:57:51,680 --> 00:57:53,720
It should always have a value.
995
00:57:53,720 --> 00:57:55,190
The second, unique.
996
00:57:55,190 --> 00:57:58,220
The value shouldn't repeat
throughout this column.
997
00:57:58,220 --> 00:58:01,100
Line I might leave
without this constraint.
998
00:58:01,100 --> 00:58:04,370
I could imagine two stations being
on the same line, like both on blue,
999
00:58:04,370 --> 00:58:07,170
I'll allow that in this case.
1000
00:58:07,170 --> 00:58:12,560
Now again, we could try to apply unique
to our primary keys or our foreign keys
1001
00:58:12,560 --> 00:58:15,920
as I just did here,
but it's already taken
1002
00:58:15,920 --> 00:58:19,520
care of for us using this
primary key constraint.
1003
00:58:19,520 --> 00:58:24,050
A primary key, again, is always
going to be unique and never null.
1004
00:58:24,050 --> 00:58:29,720
So we'll take advantage of that already
using our primary key and foreign key
1005
00:58:29,720 --> 00:58:32,490
constraints here.
1006
00:58:32,490 --> 00:58:33,270
OK.
1007
00:58:33,270 --> 00:58:36,180
So we've seen unique and not null.
1008
00:58:36,180 --> 00:58:39,510
And I might argue, we're at the
point where this schema seems
1009
00:58:39,510 --> 00:58:44,400
to be fairly optimized, at least using
our column constraints, our table
1010
00:58:44,400 --> 00:58:47,560
constraints, our type
affinities and so on.
1011
00:58:47,560 --> 00:58:55,440
So let's ask then, what questions do
we have on not null and unique, if any?
1012
00:58:55,440 --> 00:58:58,770
AUDIENCE: So basically, to recap
if I understood correctly, it's
1013
00:58:58,770 --> 00:59:01,620
not precisely about not
null and unique but about
1014
00:59:01,620 --> 00:59:06,060
the concept of the key,
labeling key immediately
1015
00:59:06,060 --> 00:59:10,135
gives the attribute of not null,
unique, and to be referenced, right?
1016
00:59:10,135 --> 00:59:11,260
CARTER ZENKE: That is true.
1017
00:59:11,260 --> 00:59:14,640
So when you use a primary key
or a foreign key constraint,
1018
00:59:14,640 --> 00:59:17,880
there are other constraints that
go along with that constraint.
1019
00:59:17,880 --> 00:59:23,410
A primary key, for instance, must not
be null, it must be unique, and so on.
1020
00:59:23,410 --> 00:59:25,290
So it would be redundant
to apply that again,
1021
00:59:25,290 --> 00:59:29,010
to say that this primary key
should be unique or not null.
1022
00:59:29,010 --> 00:59:31,160
Good clarification there.
1023
00:59:31,160 --> 00:59:32,000
OK.
1024
00:59:32,000 --> 00:59:36,590
So I think we're at the point where
this schema is pretty well set for us,
1025
00:59:36,590 --> 00:59:42,380
and we're going to need to think about
how we use this in the real world.
1026
00:59:42,380 --> 00:59:48,630
Like, if this is our schema here,
we have riders and stations, well,
1027
00:59:48,630 --> 00:59:49,940
what could we do?
1028
00:59:49,940 --> 00:59:52,340
I mean, riders tend to--
1029
00:59:52,340 --> 00:59:54,710
I mean, they could
register for the subway.
1030
00:59:54,710 --> 00:59:56,900
And riders, well, they
do visit stations.
1031
00:59:56,900 --> 00:59:59,360
But I think if we applied
this to the real world,
1032
00:59:59,360 --> 01:00:02,630
we'd see that this isn't
quite how it's actually done.
1033
01:00:02,630 --> 01:00:05,510
Like, riders don't really
register that often.
1034
01:00:05,510 --> 01:00:08,570
If a rider who's new to the city comes
in, they want to ride the subway,
1035
01:00:08,570 --> 01:00:10,760
they should be able to ride, too.
1036
01:00:10,760 --> 01:00:15,110
So it turns out at least here in
Boston, the MBTA doesn't really
1037
01:00:15,110 --> 01:00:20,360
track riders, per se, but they do
track what we call CharlieCards.
1038
01:00:20,360 --> 01:00:23,780
If you want to ride a subway,
you get a CharlieCard.
1039
01:00:23,780 --> 01:00:26,330
A CharlieCard allows you
access to the subway,
1040
01:00:26,330 --> 01:00:28,910
keeps track of your fare,
your balance and so on,
1041
01:00:28,910 --> 01:00:31,730
and allows you to swipe
in to certain stations.
1042
01:00:31,730 --> 01:00:33,800
So when we come back
from a break here, we'll
1043
01:00:33,800 --> 01:00:37,550
see how we can actually implement
these CharlieCards instead of riders
1044
01:00:37,550 --> 01:00:40,460
to make our system more efficient
and more in line with what
1045
01:00:40,460 --> 01:00:42,140
happens in the real world.
1046
01:00:42,140 --> 01:00:45,220
We'll see you all in just a few.
1047
01:00:45,220 --> 01:00:46,940
And we're back.
1048
01:00:46,940 --> 01:00:49,750
So we saw last time, this
challenge of designing
1049
01:00:49,750 --> 01:00:53,080
a system for the Massachusetts
Bay Transportation Authority
1050
01:00:53,080 --> 01:00:55,540
to represent riders
and stations-- people
1051
01:00:55,540 --> 01:00:58,480
who get on a subway to go around Boston.
1052
01:00:58,480 --> 01:01:03,460
But as we also saw, we
learned that the MBTA doesn't
1053
01:01:03,460 --> 01:01:09,100
keep track of riders themselves, they
keep track of CharlieCards, this card
1054
01:01:09,100 --> 01:01:11,650
that a rider might
carry in their pocket.
1055
01:01:11,650 --> 01:01:17,230
And they can scan this card at a station
to enter or even to exit in some cases,
1056
01:01:17,230 --> 01:01:21,650
to make sure that their fare is
counted as they enter that station.
1057
01:01:21,650 --> 01:01:27,520
So let's think now how to improve our
schema to represent not just riders,
1058
01:01:27,520 --> 01:01:31,540
like me, But CharlieCards
themselves people might carry around
1059
01:01:31,540 --> 01:01:34,120
when they enter the subway station.
1060
01:01:34,120 --> 01:01:37,120
Well, we saw before we
had riders and cards.
1061
01:01:37,120 --> 01:01:43,030
But our goal now is to remove riders
from the picture, focus only on cards.
1062
01:01:43,030 --> 01:01:47,470
Well cards, as we've seen, might
make us swipe at a station.
1063
01:01:47,470 --> 01:01:51,790
If I enter Harvard station,
I might swipe my CharlieCard
1064
01:01:51,790 --> 01:01:54,380
to enter that station.
1065
01:01:54,380 --> 01:01:59,270
And we could see that a card would
have maybe many swipes involved.
1066
01:01:59,270 --> 01:02:04,520
Like if I swipe at Harvard, I might also
swipe it MIT or swipe it Park Street,
1067
01:02:04,520 --> 01:02:05,810
and so on.
1068
01:02:05,810 --> 01:02:10,410
We could see that a swipe can only
happen at a single station at a time,
1069
01:02:10,410 --> 01:02:10,910
though.
1070
01:02:10,910 --> 01:02:14,990
Like if I swipe at Park Street, I'm
only swiping at Park Street here,
1071
01:02:14,990 --> 01:02:20,050
and similarly, a swipe might
only involve one, single card.
1072
01:02:20,050 --> 01:02:23,040
So if we think about these
entities and how they relate,
1073
01:02:23,040 --> 01:02:27,810
we could also think about what kinds of
columns you have inside of each entity.
1074
01:02:27,810 --> 01:02:30,730
In this case, I would argue we
have something a bit like this.
1075
01:02:30,730 --> 01:02:36,150
We could say that a card has a ID
for its primary key, in this case.
1076
01:02:36,150 --> 01:02:42,360
Similarly, this card makes a swipe,
and this swipe has, itself, an ID,
1077
01:02:42,360 --> 01:02:46,260
as well as a type, some time
that that swipe happened,
1078
01:02:46,260 --> 01:02:49,480
and an amount or a
transaction that is involved.
1079
01:02:49,480 --> 01:02:54,090
So for example, let's say I swipe
in at the Harvard Square Station.
1080
01:02:54,090 --> 01:02:59,880
That type of swipe might be to enter
the station at some certain time.
1081
01:02:59,880 --> 01:03:03,510
Now associated with that
swipe is also some amount
1082
01:03:03,510 --> 01:03:07,320
in dollars that happened to
be subtracted from my card.
1083
01:03:07,320 --> 01:03:10,620
Like let's say the fare
nowadays is $2.40, well,
1084
01:03:10,620 --> 01:03:15,830
that amount is subtracted
from my card from this swipe.
1085
01:03:15,830 --> 01:03:20,270
Now, of course, I do all of this at
a station, which has our same columns
1086
01:03:20,270 --> 01:03:23,630
from before, ID, name, and line.
1087
01:03:23,630 --> 01:03:28,130
So a similar idea now, but we're
replacing riders with cards
1088
01:03:28,130 --> 01:03:31,190
and adding more information
to these visits.
1089
01:03:31,190 --> 01:03:34,940
There are instead swipes that could
be maybe entering the station,
1090
01:03:34,940 --> 01:03:39,240
exiting the station or just adding
funds to my balance, for instance.
1091
01:03:39,240 --> 01:03:41,690
So let's see how we could
improve our schema now
1092
01:03:41,690 --> 01:03:47,720
using some new SQL keywords to alter
our tables and add some new ones, too.
1093
01:03:47,720 --> 01:03:49,790
I'll go back to my computer.
1094
01:03:49,790 --> 01:03:54,560
And let's see how we could try to
alter the table we've already created.
1095
01:03:54,560 --> 01:03:58,910
We already have a riders table, a
visits table and a stations table,
1096
01:03:58,910 --> 01:04:02,450
but we could learn some new statements,
some new queries to actually
1097
01:04:02,450 --> 01:04:06,030
update and alter these tables as well.
1098
01:04:06,030 --> 01:04:08,100
The first one as we
saw a little bit before
1099
01:04:08,100 --> 01:04:11,270
is this one called drop table,
arguably the most dramatic thing
1100
01:04:11,270 --> 01:04:14,100
you can do to table, is just
drop it, delete it like this.
1101
01:04:14,100 --> 01:04:18,080
So let's try just deleting the
riders table from our database.
1102
01:04:18,080 --> 01:04:21,140
I'll go back to my--
1103
01:04:21,140 --> 01:04:24,050
go back to my database here.
1104
01:04:24,050 --> 01:04:29,370
I'll type sqlite mbta.db to
pull up this database again.
1105
01:04:29,370 --> 01:04:34,160
And now if I type dot schema, I'll see
I have a riders, stations, and visits
1106
01:04:34,160 --> 01:04:34,790
table.
1107
01:04:34,790 --> 01:04:38,510
But no longer-- I don't want
to have the riders table.
1108
01:04:38,510 --> 01:04:39,990
I want to remove that.
1109
01:04:39,990 --> 01:04:45,200
So I'll say, drop table
riders, semicolon.
1110
01:04:45,200 --> 01:04:46,880
Now I'll hit enter.
1111
01:04:46,880 --> 01:04:48,650
No more riders table.
1112
01:04:48,650 --> 01:04:52,830
If I type dot schema,
that's gone for my database.
1113
01:04:52,830 --> 01:04:54,990
Well, what could I do now?
1114
01:04:54,990 --> 01:04:58,740
I've dropped the table, but
I'd still need to maybe update
1115
01:04:58,740 --> 01:05:00,660
visits to instead be swipes.
1116
01:05:00,660 --> 01:05:02,850
I could probably leave stations as is.
1117
01:05:02,850 --> 01:05:06,360
But I want to update a
table or alter its schema,
1118
01:05:06,360 --> 01:05:09,370
I can use some new commands as well.
1119
01:05:09,370 --> 01:05:12,960
I'll use this one called alter table.
1120
01:05:12,960 --> 01:05:16,300
Alter table looks a bit like this.
1121
01:05:16,300 --> 01:05:22,320
I can use the alter table statement
here and give it some table name.
1122
01:05:22,320 --> 01:05:27,990
Like let's say alter table visits,
and then inside this visits table,
1123
01:05:27,990 --> 01:05:30,370
I get to ask, what do I want to do?
1124
01:05:30,370 --> 01:05:33,030
Do I want to, let say, rename the table?
1125
01:05:33,030 --> 01:05:34,240
I could do that.
1126
01:05:34,240 --> 01:05:37,890
I could also decide to add
a column, rename a column,
1127
01:05:37,890 --> 01:05:40,440
or drop a column altogether.
1128
01:05:40,440 --> 01:05:43,530
Let's try first looking at rename to.
1129
01:05:43,530 --> 01:05:47,790
I want to rename this visits
table to a swipes table,
1130
01:05:47,790 --> 01:05:51,510
representing not just a visit to
the station, but a swipe of a card.
1131
01:05:51,510 --> 01:05:53,340
So let's try this one here.
1132
01:05:53,340 --> 01:05:57,240
I'll go back to my computer
and I'll go back to SQLite.
1133
01:05:57,240 --> 01:06:01,740
And I'll say, I no longer want
visits to be called visits.
1134
01:06:01,740 --> 01:06:06,330
I ideally want visits to
actually be called swipes.
1135
01:06:06,330 --> 01:06:07,590
So let me try this.
1136
01:06:07,590 --> 01:06:13,500
I'll say alter table,
as we saw before, visits
1137
01:06:13,500 --> 01:06:18,180
rename to swipes, like this, semicolon.
1138
01:06:18,180 --> 01:06:19,770
Now I'll hit enter.
1139
01:06:19,770 --> 01:06:24,212
And now if I type dot
schema, again, I see--
1140
01:06:24,212 --> 01:06:25,170
oops, kind of strange--
1141
01:06:25,170 --> 01:06:28,500
I'll hit dot schema
again, I'll see swipes,
1142
01:06:28,500 --> 01:06:32,730
no longer called visits,
but now called swipes.
1143
01:06:32,730 --> 01:06:36,390
What we saw-- I'd ideally like
to add a type of the swipe,
1144
01:06:36,390 --> 01:06:40,200
maybe I'm entering the
station, maybe I'm exiting.
1145
01:06:40,200 --> 01:06:43,770
Maybe I'm just adding funds
or depositing some funds.
1146
01:06:43,770 --> 01:06:47,940
So let me try adding, let's
say, a new column to swipes.
1147
01:06:47,940 --> 01:06:51,630
I'll say alter table, add column--
1148
01:06:51,630 --> 01:06:57,910
alter table, swipes, I'm going to
name that table, swipes, like this.
1149
01:06:57,910 --> 01:07:00,840
And then let's add a column--
1150
01:07:00,840 --> 01:07:03,510
add a column called type.
1151
01:07:03,510 --> 01:07:06,600
And this will have the
type affinity text.
1152
01:07:06,600 --> 01:07:08,580
I'll hit semicolon, enter.
1153
01:07:08,580 --> 01:07:12,060
And now if I type dot schema, I'll see--
1154
01:07:12,060 --> 01:07:16,820
well, I have a column called a ttpe.
1155
01:07:16,820 --> 01:07:18,270
So clearly, I made a typo here.
1156
01:07:18,270 --> 01:07:20,570
I had rider ID, station ID.
1157
01:07:20,570 --> 01:07:26,120
This new column down below called ttpe,
I kind of want to fix this, right?
1158
01:07:26,120 --> 01:07:28,430
I don't want ttpe, I want type.
1159
01:07:28,430 --> 01:07:31,760
So probably good chance
to use my rename column.
1160
01:07:31,760 --> 01:07:34,250
I'll come back here and I'll try that.
1161
01:07:34,250 --> 01:07:37,610
I'll instead do alter table--
1162
01:07:37,610 --> 01:07:43,160
alter table, swipes, and I
will rename a column now.
1163
01:07:43,160 --> 01:07:51,170
I'll say rename the column, ttpe
to type, spelled correctly now.
1164
01:07:51,170 --> 01:07:55,220
Hit semicolon, clear my
terminal, type dot schema.
1165
01:07:55,220 --> 01:08:01,070
And now I see over here that type
has been added to my table of swipes.
1166
01:08:01,070 --> 01:08:06,170
I see rider ID, station ID, and
now a new column called type.
1167
01:08:06,170 --> 01:08:11,630
So through alter table, we go ahead
and add new columns, rename them even,
1168
01:08:11,630 --> 01:08:15,650
or if I wanted to just
drop the column altogether.
1169
01:08:15,650 --> 01:08:19,160
Let's say I add this column
type and I change my mind,
1170
01:08:19,160 --> 01:08:20,580
I don't want it here anymore.
1171
01:08:20,580 --> 01:08:24,260
I go back to my computer and I could
try dropping a particular column.
1172
01:08:24,260 --> 01:08:28,850
Let me try in this case,
alter table, swipes.
1173
01:08:28,850 --> 01:08:33,859
And now let me drop
column type, semicolon.
1174
01:08:33,859 --> 01:08:34,640
Hit enter.
1175
01:08:34,640 --> 01:08:40,529
And now if I type dot schema,
I'm back to where I began.
1176
01:08:40,529 --> 01:08:45,330
So these are the commands we can use
to alter our tables, to improve them
1177
01:08:45,330 --> 01:08:48,450
if you make a mistake during
our first create table command,
1178
01:08:48,450 --> 01:08:52,930
or if you want to add more
functionality later down the line.
1179
01:08:52,930 --> 01:08:58,050
So ideally, I could keep using alter
table, add table, create table, drop
1180
01:08:58,050 --> 01:08:58,930
table, and so on.
1181
01:08:58,930 --> 01:09:01,290
But what I want to do here
is just start from scratch.
1182
01:09:01,290 --> 01:09:04,029
Like I have stations
and swipes and so on.
1183
01:09:04,029 --> 01:09:07,899
Why don't I just go back to my
schema file and use that instead?
1184
01:09:07,899 --> 01:09:11,910
So what I'll do is I'll drop
table for stations and--
1185
01:09:11,910 --> 01:09:12,779
oops, semicolon.
1186
01:09:12,779 --> 01:09:17,490
I will drop table for
swipes now, semicolon.
1187
01:09:17,490 --> 01:09:22,439
I'll type dot schema, and
now I'll see nothing in here.
1188
01:09:22,439 --> 01:09:26,760
I'll quit and I'll type code schema.sql.
1189
01:09:26,760 --> 01:09:29,970
Let me just start from
scratch using this schema.sql.
1190
01:09:29,970 --> 01:09:35,790
So we no longer want to have
riders, we only want to have cards.
1191
01:09:35,790 --> 01:09:38,670
So I could just rename this table here.
1192
01:09:38,670 --> 01:09:39,930
I'll call it cards.
1193
01:09:39,930 --> 01:09:41,609
Create table, cards.
1194
01:09:41,609 --> 01:09:47,520
Now cards don't have a name, they
only have some unique ID in this case.
1195
01:09:47,520 --> 01:09:50,460
Now I'll leave stations just as it is.
1196
01:09:50,460 --> 01:09:51,870
This seems perfectly fine to me.
1197
01:09:51,870 --> 01:09:55,800
Stations have an ID, a name and
a line with these constraints
1198
01:09:55,800 --> 01:09:58,690
applied to the name and line columns.
1199
01:09:58,690 --> 01:10:00,720
But now the visits table.
1200
01:10:00,720 --> 01:10:04,500
Well, the visits table is no longer
a visit to the station per se,
1201
01:10:04,500 --> 01:10:07,570
it's more a swipe of the
card at that station.
1202
01:10:07,570 --> 01:10:10,950
So let's now say visits becomes swipes.
1203
01:10:10,950 --> 01:10:15,370
And among these new columns
to add are the following.
1204
01:10:15,370 --> 01:10:18,690
I want to have not just
an ID for each swipe,
1205
01:10:18,690 --> 01:10:22,230
not an ID for rider ID
and station ID and so on,
1206
01:10:22,230 --> 01:10:26,310
but also I want to have a type of swipe.
1207
01:10:26,310 --> 01:10:30,870
Am I entering the station,
exiting, or depositing some funds?
1208
01:10:30,870 --> 01:10:34,980
So I'll say the type of this
is going to be a new column,
1209
01:10:34,980 --> 01:10:40,200
and the data type, this whole
will be text, enter, exit,
1210
01:10:40,200 --> 01:10:42,900
or deposit for funds.
1211
01:10:42,900 --> 01:10:45,270
Now let me try another column, two.
1212
01:10:45,270 --> 01:10:47,700
I'll include a date time.
1213
01:10:47,700 --> 01:10:49,260
A date time is like a timestamp.
1214
01:10:49,260 --> 01:10:51,810
What time did this
swipe actually happen?
1215
01:10:51,810 --> 01:10:54,570
And I'll make this type numeric.
1216
01:10:54,570 --> 01:11:00,650
Numeric can store all kinds of dates
and times for me in this table.
1217
01:11:00,650 --> 01:11:02,950
Now let me add one final column.
1218
01:11:02,950 --> 01:11:04,990
This one will be an amount.
1219
01:11:04,990 --> 01:11:09,910
I'll also use numeric for
this kind of column here.
1220
01:11:09,910 --> 01:11:13,150
And I'll say that this
column called amount
1221
01:11:13,150 --> 01:11:18,500
can store, in this case integers or
real numbers, like floats and so on.
1222
01:11:18,500 --> 01:11:24,310
I'll probably decide on that when I
actually add some data to this table.
1223
01:11:24,310 --> 01:11:28,650
So here we've updated our schema to
represent that diagram we saw before.
1224
01:11:28,650 --> 01:11:31,380
I have cards, I have
stations, and I have
1225
01:11:31,380 --> 01:11:36,240
swipes that have some type associated
with them, enter, exit, deposit,
1226
01:11:36,240 --> 01:11:39,750
some date, time, and some
amount that was charged to me
1227
01:11:39,750 --> 01:11:41,340
while I made this swipe.
1228
01:11:41,340 --> 01:11:44,070
Either I added some funds, in
which case amount is positive,
1229
01:11:44,070 --> 01:11:50,000
or I subtracted some funds, entering
exiting, in this case from the station.
1230
01:11:50,000 --> 01:11:50,990
All right.
1231
01:11:50,990 --> 01:11:53,390
So I have these tables now.
1232
01:11:53,390 --> 01:11:56,750
And now I want to probably
apply some of those same column
1233
01:11:56,750 --> 01:11:58,490
constraints we saw before.
1234
01:11:58,490 --> 01:12:02,840
Like here, it's fine, but I also want
to make sure I'm not adding some data
1235
01:12:02,840 --> 01:12:05,280
that I don't want to add to this table.
1236
01:12:05,280 --> 01:12:09,230
So I could go back to my old
friends, these column constraints.
1237
01:12:09,230 --> 01:12:14,450
And we saw before, we had default
and not null, unique and check.
1238
01:12:14,450 --> 01:12:20,400
We've used not null and unique,
but we haven't check or default.
1239
01:12:20,400 --> 01:12:22,970
So let's start using more
than just not null and unique
1240
01:12:22,970 --> 01:12:27,720
and also focus on check and default,
what they can do for us here.
1241
01:12:27,720 --> 01:12:29,670
I'll go back to my schema.
1242
01:12:29,670 --> 01:12:32,970
And let me just make sure that
I'm making all the columns that I
1243
01:12:32,970 --> 01:12:35,610
want to be required actually required.
1244
01:12:35,610 --> 01:12:39,990
I'll go into my swipes table
and I'll say that type--
1245
01:12:39,990 --> 01:12:41,700
I mean, this should be required.
1246
01:12:41,700 --> 01:12:46,380
I should know whether I entered,
exited, or deposited some funds.
1247
01:12:46,380 --> 01:12:48,750
So I'll say text not null.
1248
01:12:48,750 --> 01:12:52,590
Similarly, for the timestamp,
the time this swipe happened,
1249
01:12:52,590 --> 01:12:54,970
I want that to be not null as well.
1250
01:12:54,970 --> 01:12:57,660
I must know what time I swiped the card.
1251
01:12:57,660 --> 01:13:00,270
And also, it makes sense
for there to always be
1252
01:13:00,270 --> 01:13:02,820
some amount associated with this swipe.
1253
01:13:02,820 --> 01:13:07,450
Either I added some funds to my balance
or I removed some funds overall.
1254
01:13:07,450 --> 01:13:11,260
So I'll make this not null as well.
1255
01:13:11,260 --> 01:13:14,910
Well, let's see what we
could do here with default.
1256
01:13:14,910 --> 01:13:18,900
Default gives me some default
value, some initial value
1257
01:13:18,900 --> 01:13:25,320
to add to this column if I don't
specify what kind of value to add,
1258
01:13:25,320 --> 01:13:27,840
what be good for date time.
1259
01:13:27,840 --> 01:13:30,390
Date time here is, again, the timestamp.
1260
01:13:30,390 --> 01:13:35,080
The time I swiped this card to
enter, let's say, Harvard station.
1261
01:13:35,080 --> 01:13:40,210
Well, if I want this to
always have the current time,
1262
01:13:40,210 --> 01:13:43,690
I could use default. I
could say the default
1263
01:13:43,690 --> 01:13:48,220
value for this column is
this special value here,
1264
01:13:48,220 --> 01:13:52,160
CURRENT TIMESTAMP, in all caps.
1265
01:13:52,160 --> 01:13:57,470
CURRENT TIMESTAMP will show me the
year, the month, the day, the hour,
1266
01:13:57,470 --> 01:14:02,600
the minute, the second, all in one
value, and insert that into my table.
1267
01:14:02,600 --> 01:14:08,330
So as soon as I add new row, if I don't
supply a time for this datetime column,
1268
01:14:08,330 --> 01:14:15,860
I'll instead get back the current time
exactly as it is represented in SQLite.
1269
01:14:15,860 --> 01:14:17,760
Now what can I do further than this?
1270
01:14:17,760 --> 01:14:21,750
I could also try to add
a-- my very own check.
1271
01:14:21,750 --> 01:14:26,780
Maybe I want to make sure that the
amounts here are never equal to zero.
1272
01:14:26,780 --> 01:14:31,220
Like, nobody should ever be able to
make a $0 transaction or $0 swipe.
1273
01:14:31,220 --> 01:14:33,860
They're always being charged
some money or they're always
1274
01:14:33,860 --> 01:14:36,050
depositing some money, in this case.
1275
01:14:36,050 --> 01:14:41,540
So I could say, amount here has
my very own check on this column.
1276
01:14:41,540 --> 01:14:45,980
And inside check, I can actually write
my very own expression to check for.
1277
01:14:45,980 --> 01:14:51,170
I could say, for example,
amount is not equal to zero.
1278
01:14:51,170 --> 01:14:54,500
Using those same operators
we saw back in week zero,
1279
01:14:54,500 --> 01:15:01,810
this will ensure that any value inside
amount will not be equal to zero.
1280
01:15:01,810 --> 01:15:03,760
Let's try also for type.
1281
01:15:03,760 --> 01:15:06,730
I mean, type can only have a few values.
1282
01:15:06,730 --> 01:15:11,170
We saw enter, exit, deposit some funds.
1283
01:15:11,170 --> 01:15:16,060
I could make sure that only those values
are included inside my type column.
1284
01:15:16,060 --> 01:15:20,500
I could say check that type
is in some list of values.
1285
01:15:20,500 --> 01:15:23,560
Going back to week one
here, we talk about in.
1286
01:15:23,560 --> 01:15:31,460
I could say, maybe the type is
in enter exit or is in deposit.
1287
01:15:31,460 --> 01:15:34,570
So now when I have this
table called swipes,
1288
01:15:34,570 --> 01:15:38,740
I'm representing what I'm actually
doing when I go to Harvard station.
1289
01:15:38,740 --> 01:15:42,633
I have here a visit for
myself, my very own ID
1290
01:15:42,633 --> 01:15:44,050
here, which will update in second.
1291
01:15:44,050 --> 01:15:47,290
I have a station ID where
I'm actually going to visit.
1292
01:15:47,290 --> 01:15:52,270
I have a type that I'm going to
use to enter, exit, or deposit
1293
01:15:52,270 --> 01:15:54,100
some funds at this station.
1294
01:15:54,100 --> 01:15:56,710
I'm doing it at a certain
time, and I have an amount
1295
01:15:56,710 --> 01:16:00,200
associated with this transaction.
1296
01:16:00,200 --> 01:16:02,500
Now, there's one thing
to fix here, which
1297
01:16:02,500 --> 01:16:07,040
is that we're still talking about
riders inside our swipes table.
1298
01:16:07,040 --> 01:16:08,360
So let's fix that here, too.
1299
01:16:08,360 --> 01:16:12,460
I'll go back to my computer,
and let's try fixing this.
1300
01:16:12,460 --> 01:16:19,120
I have a rider ID inside of my swipes
table, but no longer do I have riders,
1301
01:16:19,120 --> 01:16:20,680
I have cards.
1302
01:16:20,680 --> 01:16:25,780
So let me say that this is now a card
ID, and down below in my foreign key,
1303
01:16:25,780 --> 01:16:32,650
I'll say that this card ID column
references the ID column in cards.
1304
01:16:32,650 --> 01:16:35,140
And I think this should
represent everything
1305
01:16:35,140 --> 01:16:39,800
I want to represent about
swipes at the station.
1306
01:16:39,800 --> 01:16:44,690
So let me ask now what questions
we have about this new schema
1307
01:16:44,690 --> 01:16:47,270
and the constraints we've applied.
1308
01:16:47,270 --> 01:16:50,210
AUDIENCE: I wonder, how
could you delete or drop
1309
01:16:50,210 --> 01:16:56,510
the table of riders when
you use ID as a foreign key?
1310
01:16:56,510 --> 01:17:00,128
I try to do that, but I got an error.
1311
01:17:00,128 --> 01:17:03,170
CARTER ZENKE: Yeah, so you're getting
into some more advanced stuff here.
1312
01:17:03,170 --> 01:17:06,500
And suffice to say for now,
my foreign key constraints
1313
01:17:06,500 --> 01:17:09,830
aren't actually being checked
right now, but yours might be.
1314
01:17:09,830 --> 01:17:14,090
If you try to drop a table that
actually has some data that's
1315
01:17:14,090 --> 01:17:17,900
referenced by a foreign key, SQLite
will warn you, perhaps to tell you,
1316
01:17:17,900 --> 01:17:21,840
you can't do that because this ID is
referenced from this table over there.
1317
01:17:21,840 --> 01:17:26,780
So in that case, best to delete
the value that has that foreign key
1318
01:17:26,780 --> 01:17:30,470
and then proceed with dropping
that table altogether.
1319
01:17:30,470 --> 01:17:31,910
Good question there.
1320
01:17:31,910 --> 01:17:34,630
Yeah, let's take one more.
1321
01:17:34,630 --> 01:17:43,570
AUDIENCE: How might this syntax send
the other SQL languages, same as SQLite
1322
01:17:43,570 --> 01:17:46,580
languages, how much is syntax same?
1323
01:17:46,580 --> 01:17:48,080
CARTER ZENKE: Yeah, a good question.
1324
01:17:48,080 --> 01:17:52,000
So here we're using the SQLite
database management system.
1325
01:17:52,000 --> 01:17:55,060
It is similar to but
distinct in some ways
1326
01:17:55,060 --> 01:17:57,820
from others like MySQL or PostgreSQL.
1327
01:17:57,820 --> 01:18:00,220
I would say that most of
what you're doing here
1328
01:18:00,220 --> 01:18:04,822
could also be ported to MySQL and
PostgreSQL with a few exceptions
1329
01:18:04,822 --> 01:18:07,030
that you might need to treat
on a case by case basis.
1330
01:18:07,030 --> 01:18:10,220
In fact, the developers
of SQLite built things
1331
01:18:10,220 --> 01:18:14,800
so that it would be easy to port their
database schemas to another schema
1332
01:18:14,800 --> 01:18:18,140
like MySQL or PostgreSQL as well.
1333
01:18:18,140 --> 01:18:21,500
But good question there.
1334
01:18:21,500 --> 01:18:23,960
All right, let's take
one more question here.
1335
01:18:23,960 --> 01:18:30,260
AUDIENCE: Imagine if we put in the
ID, we don't put any data type,
1336
01:18:30,260 --> 01:18:32,090
it's going to give us--
1337
01:18:32,090 --> 01:18:37,032
in this schema, it's going
to give us a text or--
1338
01:18:37,032 --> 01:18:38,490
CARTER ZENKE: Yeah, great question.
1339
01:18:38,490 --> 01:18:42,980
So you're asking if I didn't tell
SQLite what kind of type affinity
1340
01:18:42,980 --> 01:18:46,220
a column had, what type of
affinity would it actually have?
1341
01:18:46,220 --> 01:18:47,360
A great question.
1342
01:18:47,360 --> 01:18:52,460
In this case, by default, SQLite
gives the numeric type affinity,
1343
01:18:52,460 --> 01:18:56,250
where numeric can store
integers or real values.
1344
01:18:56,250 --> 01:18:59,180
But if you give it like a string
of text, like let's say red line,
1345
01:18:59,180 --> 01:19:04,980
it will store that for you, too,
kind of non-intuitively, but it will.
1346
01:19:04,980 --> 01:19:08,310
But the default type affinity
is numeric in this case,
1347
01:19:08,310 --> 01:19:10,340
if you don't otherwise specify.
1348
01:19:10,340 --> 01:19:11,900
Good question.
1349
01:19:11,900 --> 01:19:17,410
OK, let's come back then and let's focus
on wrapping up on a final few pieces
1350
01:19:17,410 --> 01:19:17,910
here.
1351
01:19:17,910 --> 01:19:20,810
So we've seen some
table constraints, which
1352
01:19:20,810 --> 01:19:23,360
we apply primary keys and foreign keys.
1353
01:19:23,360 --> 01:19:26,360
We saw column constraints,
where we could make sure
1354
01:19:26,360 --> 01:19:28,970
that certain values were
given to us through not null.
1355
01:19:28,970 --> 01:19:31,730
We could also make sure that the--
1356
01:19:31,730 --> 01:19:33,925
let's say the value is
in some list of values,
1357
01:19:33,925 --> 01:19:37,460
they're using check or making
sure that it's not some value,
1358
01:19:37,460 --> 01:19:39,320
also using check as well.
1359
01:19:39,320 --> 01:19:44,930
Default allows us to specify a given
value for every new insertion of rows
1360
01:19:44,930 --> 01:19:45,560
here.
1361
01:19:45,560 --> 01:19:49,550
And this is actually
pretty important to have.
1362
01:19:49,550 --> 01:19:53,750
Not just schemas that have
column names and let's
1363
01:19:53,750 --> 01:19:57,320
say type affinities as
well, but also constraints,
1364
01:19:57,320 --> 01:20:00,210
make sure the data we
insert works well for us.
1365
01:20:00,210 --> 01:20:02,960
And there's actually a
story behind this person
1366
01:20:02,960 --> 01:20:07,820
who's on the CharlieCard, this
person who's on the subway.
1367
01:20:07,820 --> 01:20:11,420
His name is Charlie, and he's
perhaps the most famous subway
1368
01:20:11,420 --> 01:20:13,880
rider in all of Boston.
1369
01:20:13,880 --> 01:20:17,900
Back in the 19-- let's
say 1900s, the band
1370
01:20:17,900 --> 01:20:21,290
called The Kingston Trio wrote a
song about this man named Charlie.
1371
01:20:21,290 --> 01:20:26,030
Charlie, supposedly, got on at the
Kendall Square station where MIT is,
1372
01:20:26,030 --> 01:20:27,950
and he made for Jamaica Plain.
1373
01:20:27,950 --> 01:20:30,320
But once he got to Jamaica
Plane, the conductor
1374
01:20:30,320 --> 01:20:34,040
asked him for one more nickel, and
well, he didn't have that nickel.
1375
01:20:34,040 --> 01:20:36,860
So he got stuck on the train
for years, and years, and years.
1376
01:20:36,860 --> 01:20:38,750
He couldn't get off of the subway.
1377
01:20:38,750 --> 01:20:42,532
So keep in mind, Charlie, when you're
writing your own database constraints,
1378
01:20:42,532 --> 01:20:44,240
making sure that if
you get on the train,
1379
01:20:44,240 --> 01:20:46,610
you're able to get off of
it at some point, right?
1380
01:20:46,610 --> 01:20:49,080
Don't end up like Charlie in this case.
1381
01:20:49,080 --> 01:20:53,720
So with this in mind, you're able
to design your very own database
1382
01:20:53,720 --> 01:20:56,840
schemas that keep not just
certain columns involved,
1383
01:20:56,840 --> 01:20:59,210
but also type affinities
for those columns, types--
1384
01:20:59,210 --> 01:21:01,790
those data types-- types
the columns can store.
1385
01:21:01,790 --> 01:21:04,940
You're also able to apply constraints
to those columns to make sure
1386
01:21:04,940 --> 01:21:07,130
that the data you're
inserting is that you actually
1387
01:21:07,130 --> 01:21:09,380
want to have in that column.
1388
01:21:09,380 --> 01:21:13,100
Next time, we'll focus on actually
adding data to our columns,
1389
01:21:13,100 --> 01:21:16,260
actually write data to a
database file to insert,
1390
01:21:16,260 --> 01:21:18,720
update, and delete that data altogether.
1391
01:21:18,720 --> 01:21:22,480
So with that in mind,
we'll see you next time.
1392
01:21:22,480 --> 01:21:24,000117327
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.