1
00:00:00,480 --> 00:00:00,780
All right.
2
00:00:00,780 --> 00:00:05,640
Time to talk about something really, really important called normalization.
3
00:00:05,850 --> 00:00:10,860
Now, this is a tricky one to wrap your head around at first, so I encourage you to rewatch this video
4
00:00:11,160 --> 00:00:14,000
as many times as it takes until it starts to really stick.
5
00:00:14,220 --> 00:00:21,410
So by definition normalization is the process of organizing the tables and columns in a relational database
6
00:00:21,870 --> 00:00:25,520
to reduce redundancy and preserve data integrity.
7
00:00:25,620 --> 00:00:30,290
So that's a lot of fancy words, and it's kind of tough to understand what that really means.
8
00:00:30,510 --> 00:00:33,900
But basically it's used to do three different things.
9
00:00:33,900 --> 00:00:40,230
Number one, eliminate redundant data, which helps to decrease table sizes and, more importantly, reduce
10
00:00:40,230 --> 00:00:43,500
processing time and improve efficiency.
11
00:00:43,500 --> 00:00:49,140
Number two, it helps us minimize errors and anomalies when we make data modifications.
12
00:00:49,140 --> 00:00:53,240
So that's if we were to insert, update, or delete records in our database.
13
00:00:53,460 --> 00:01:00,030
And number three it helps simplify queries and structure the database in a way that enables meaningful
14
00:01:00,090 --> 00:01:01,770
useful analysis.
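The second point, anomalies during data modifications, can be illustrated with a minimal sketch. The rows, field names, and brand values below are made up for illustration and are not taken from the course data. When a product attribute is repeated on every transaction row, an update that misses one row leaves the table contradicting itself; when the attribute is stored once, a single update is enough.

```python
# Hypothetical sample data: the brand is repeated on every transaction row.
flat_rows = [
    {"product_id": 1, "quantity": 2, "brand": "BrandA"},
    {"product_id": 1, "quantity": 1, "brand": "BrandA"},
]

# An update that touches only one row leaves two conflicting brands
# recorded for the same product ID (an update anomaly).
flat_rows[0]["brand"] = "BrandA Inc."
brands_in_flat_table = {r["brand"] for r in flat_rows if r["product_id"] == 1}

# Normalized alternative: the brand is stored once, keyed by product ID.
product_lookup = {1: {"brand": "BrandA"}}
transactions = [
    {"product_id": 1, "quantity": 2},
    {"product_id": 1, "quantity": 1},
]

# One update, and every transaction sees the same brand.
product_lookup[1]["brand"] = "BrandA Inc."
brands_via_lookup = {product_lookup[t["product_id"]]["brand"] for t in transactions}

print(brands_in_flat_table)
print(brands_via_lookup)
```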
15
00:01:01,770 --> 00:01:04,620
So that still feels kind of overcomplicated.
16
00:01:04,680 --> 00:01:05,780
If you ask me.
17
00:01:05,790 --> 00:01:10,910
So my tip to remember what normalization is all about is to think of it this way.
18
00:01:10,980 --> 00:01:18,060
In a properly normalized database, every table should serve a distinct and specific purpose.
19
00:01:18,180 --> 00:01:23,880
So you might have one table that only gives you information about products, and you might have another that only
20
00:01:23,880 --> 00:01:27,090
gives you information about dates like a calendar table.
21
00:01:27,330 --> 00:01:33,610
You might have one that's only daily transactional records and another that's only about customers.
22
00:01:33,660 --> 00:01:38,940
Now, this should sound pretty familiar, because these are the exact types of tables that we're using here
23
00:01:39,000 --> 00:01:41,070
in this AdventureWorks demo.
24
00:01:41,070 --> 00:01:47,070
So let me take a stab at visualizing why normalization is such an important concept. Consider a table
25
00:01:47,070 --> 00:01:48,150
like this.
26
00:01:48,150 --> 00:01:55,110
You've got transaction quantities here in the third column broken down by product ID and by date as
27
00:01:55,110 --> 00:02:02,430
well as all of this extra information about each product ID: the brand, the name, the SKU, and the weight.
28
00:02:02,430 --> 00:02:08,100
And you can see, just from this small sample, that we have multiple transactions or multiple quantity
29
00:02:08,100 --> 00:02:13,440
values per day and multiple quantity values per product ID.
30
00:02:13,530 --> 00:02:16,120
So this table is not normalized.
31
00:02:16,350 --> 00:02:19,020
It doesn't serve a single unique purpose.
32
00:02:19,020 --> 00:02:25,230
It's actually serving at least two purposes: one, providing the transaction quantity by date and product
33
00:02:25,230 --> 00:02:30,560
ID, and two, providing additional attributes about those products.
34
00:02:30,690 --> 00:02:32,910
Those are two different purposes.
35
00:02:32,970 --> 00:02:37,060
So what you end up with here are all of these duplicate rows
36
00:02:37,080 --> 00:02:41,250
in any case where the same product ID appears more than once.
37
00:02:41,310 --> 00:02:46,420
So you see duplicate brand names, product names, SKUs, and product weights.
38
00:02:46,760 --> 00:02:49,730
And you might be wondering: OK, that's not that big a deal.
39
00:02:49,740 --> 00:02:54,330
I'm still getting the information that I need. In fact, I have it all in one place in a single table,
40
00:02:54,330 --> 00:02:54,990
which is great.
41
00:02:54,990 --> 00:02:56,960
So I don't see the downside here.
42
00:02:57,270 --> 00:03:03,480
Well, imagine if we were dealing with 100 different products, and each of those products on average sold
43
00:03:03,490 --> 00:03:05,160
10,000 times a day.
44
00:03:05,310 --> 00:03:11,360
Now all of a sudden you're talking about a million duplicate rows for every single date in the data
45
00:03:11,360 --> 00:03:12,220
set.
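The arithmetic behind that figure can be checked with a short sketch. The product count and daily sales figure come from the example above; the variable names are only illustrative, and the "redundant copies" calculation assumes a single copy of each product's attributes would be enough.

```python
# Figures from the example: 100 products, each sold 10,000 times a day on average.
products = 100
sales_per_product_per_day = 10_000

# Every sale is one row in the un-normalized table.
rows_per_day = products * sales_per_product_per_day

# Each row repeats the product attributes (brand, name, SKU, weight),
# but only one copy per product is actually needed.
redundant_attribute_copies_per_day = rows_per_day - products

print(rows_per_day)                        # 1000000
print(redundant_attribute_copies_per_day)  # 999900
```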
46
00:03:12,300 --> 00:03:18,450
So you can see that with larger, more complex models, minor inefficiencies like this can become major,
47
00:03:18,450 --> 00:03:19,690
major problems.
48
00:03:19,770 --> 00:03:21,710
As you scale up in size.
49
00:03:21,780 --> 00:03:28,830
So the way to avoid issues like this is to strip those product attribute columns out of this table and
50
00:03:28,830 --> 00:03:32,180
create a relationship with a single product lookup table.
51
00:03:32,330 --> 00:03:38,580
And if that product lookup contains a unique list of product IDs with those associated attributes, then
52
00:03:38,580 --> 00:03:41,370
we can access the exact same information here
53
00:03:41,550 --> 00:03:44,860
while eliminating every one of those duplicate rows.
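As a rough sketch of that split, the example below uses Python's built-in `sqlite3` module with an in-memory database. The table names (`product_lookup`, `transactions`), column names, and sample rows are assumptions made for illustration; they are not the actual AdventureWorks tables. The product attributes are stored once per product ID, and a join on that ID brings them back alongside each transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Lookup table: one row per product ID, holding the product attributes.
cur.execute("""
    CREATE TABLE product_lookup (
        product_id INTEGER PRIMARY KEY,
        brand TEXT, name TEXT, sku TEXT, weight REAL
    )
""")
# Data table: only the date, the product ID, and the quantity.
cur.execute("""
    CREATE TABLE transactions (
        date TEXT,
        product_id INTEGER REFERENCES product_lookup(product_id),
        quantity INTEGER
    )
""")

# Hypothetical sample rows.
cur.executemany("INSERT INTO product_lookup VALUES (?, ?, ?, ?, ?)", [
    (1, "BrandA", "Helmet", "HL-001", 0.4),
    (2, "BrandB", "Bottle", "BT-002", 0.1),
])
cur.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    ("2024-01-01", 1, 2),
    ("2024-01-01", 1, 1),
    ("2024-01-01", 2, 5),
    ("2024-01-02", 1, 3),
])

# The relationship on product_id recovers the same wide view on demand,
# without storing the attributes on every transaction row.
rows = cur.execute("""
    SELECT t.date, t.product_id, t.quantity, p.brand, p.name, p.sku, p.weight
    FROM transactions AS t
    JOIN product_lookup AS p ON p.product_id = t.product_id
    ORDER BY t.rowid
""").fetchall()

lookup_count = cur.execute("SELECT COUNT(*) FROM product_lookup").fetchone()[0]

for row in rows:
    print(row)
```

Here four transactions reference two products, so each product's attributes are stored once rather than once per sale.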
54
00:03:44,880 --> 00:03:50,070
So again, this concept may still feel a little bit ambiguous, but trust me, we're going to get our hands
55
00:03:50,070 --> 00:03:50,730
dirty.
56
00:03:50,730 --> 00:03:55,140
We're going to do a ton of demos, walk through a bunch of samples, and this is going to start to feel
57
00:03:55,140 --> 00:03:58,970
much more natural as we continue through this section of the course.
58
00:03:58,980 --> 00:04:04,500
So next up, we're going to talk about data tables and lookup tables as our first step towards building
59
00:04:04,800 --> 00:04:06,510
a properly normalized model.