1
00:00:00,620 --> 00:00:06,370
Next up, we've got text-specific tools, looking at the Transform tab in the Query Editor.
2
00:00:06,490 --> 00:00:12,900
You'll notice that Power BI groups different sets of tools together based on their purpose or function.
3
00:00:12,910 --> 00:00:18,390
So in this case, we'll find all of our text-specific tools grouped together at the end of the ribbon.
4
00:00:18,730 --> 00:00:22,970
And within this group we have some really interesting, powerful options.
5
00:00:23,050 --> 00:00:29,680
For one, we can split up a column based on a specific character or delimiter, or based on a number of
6
00:00:29,680 --> 00:00:30,830
characters.
7
00:00:31,000 --> 00:00:37,780
We can format any of our text columns using basic formatting options like lowercase, uppercase, or proper
8
00:00:37,780 --> 00:00:40,800
case, which is capitalizing the first letter of each word.
9
00:00:40,930 --> 00:00:47,440
Or we can use tools like Trim, which eliminates leading and trailing spaces, or Clean, which does the same
10
00:00:47,440 --> 00:00:51,070
thing and also eliminates non-printable characters.
11
00:00:51,070 --> 00:00:56,200
Now you might be thinking that those trim and clean options really aren't that helpful until you run
12
00:00:56,200 --> 00:01:00,510
into the case where you have one trailing space in your data set.
13
00:01:00,700 --> 00:01:05,550
And trust me, this will drive you crazy the first time you experience it, until you figure it out.
14
00:01:05,710 --> 00:01:11,950
And the problem is, as human beings we are incapable of seeing a trailing space. It's completely invisible
15
00:01:11,950 --> 00:01:16,030
to us, and it looks exactly the same as another data point
16
00:01:16,030 --> 00:01:23,020
without that trailing space. But to Excel, or in this case to Power BI, it looks like a completely different
17
00:01:23,080 --> 00:01:24,300
and unique value.
18
00:01:24,310 --> 00:01:31,000
So in cases like that, Trim or Clean can be great tools just to standardize and help avoid issues like
19
00:01:31,000 --> 00:01:35,440
that, especially if you're working with really messy or unstructured text data.
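The trailing-space problem described above can be sketched outside Power BI. This is a minimal Python illustration, not Power BI's own implementation: the helper names `trim` and `clean` and the sample values are assumptions made up for the example, and `clean` here only drops control characters.

```python
import unicodedata


def trim(text: str) -> str:
    """Remove leading and trailing whitespace (hypothetical stand-in for Trim)."""
    return text.strip()


def clean(text: str) -> str:
    """Drop control (non-printable) characters (hypothetical stand-in for Clean)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


# Made-up sample values: they look alike on screen but are different strings.
values = ["Road Bike", "Road Bike ", "Road\x07 Bike"]

distinct_before = set(values)                        # three "unique" values
distinct_after = {trim(clean(v)) for v in values}    # one value after standardizing

print(len(distinct_before), len(distinct_after))
```

Running both helpers before comparing or grouping is what collapses the look-alike values into a single one.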
20
00:01:35,440 --> 00:01:42,040
We also have some great extract tools. We can extract a certain subset of characters from a string based
21
00:01:42,040 --> 00:01:43,390
on a specified length.
22
00:01:43,390 --> 00:01:49,630
We can extract the first or last number of characters, or a range. But where it gets really interesting
23
00:01:49,900 --> 00:01:51,370
is using delimiters.
24
00:01:51,580 --> 00:01:58,420
So we can tell Power BI that we want to return all of the characters before a specific delimiter or
25
00:01:58,420 --> 00:02:03,660
symbol or character or after it or between two distinct delimiters.
26
00:02:03,670 --> 00:02:09,010
There are also some advanced options as well that allow you to specify whether you search from the left
27
00:02:09,010 --> 00:02:10,630
side of the string or the right.
28
00:02:10,750 --> 00:02:15,930
And if you want to skip a certain number of instances of the delimiter before returning text.
29
00:02:16,060 --> 00:02:19,790
So some really great flexibility there with those extract tools.
30
00:02:19,990 --> 00:02:26,290
So you may have noticed that some of the tools here are greyed out or inactive, like Merge Columns or Parse.
31
00:02:26,500 --> 00:02:32,440
And that brings up a really important point which is that this toolbar is completely dynamic based on
32
00:02:32,440 --> 00:02:33,600
what you've selected.
33
00:02:33,730 --> 00:02:39,400
So if you've only selected a single column the merge columns option is irrelevant so you can't even
34
00:02:39,400 --> 00:02:40,190
click it.
35
00:02:40,190 --> 00:02:45,250
You'd have to select multiple columns in order to activate that option. And then, taking that even
36
00:02:45,250 --> 00:02:52,030
further, if you've selected a column that's numerical instead of text, this entire group of tools might
37
00:02:52,030 --> 00:02:55,150
be replaced by number-specific tools instead.
38
00:02:55,330 --> 00:03:00,760
So just remember that the entire ribbon the entire toolbar that you're seeing here will dynamically
39
00:03:00,760 --> 00:03:03,350
change based on your selections.
40
00:03:03,370 --> 00:03:09,700
Now one more very important point before we shift gears into Power BI. Any time you see this yellow
41
00:03:09,700 --> 00:03:12,230
box that says "Hey, this is important,"
42
00:03:12,310 --> 00:03:12,970
You guessed it.
43
00:03:13,000 --> 00:03:14,970
I'm about to talk about something important.
44
00:03:15,100 --> 00:03:17,960
So that means focus and pay attention.
45
00:03:17,980 --> 00:03:23,170
Stop playing with the dog or checking Facebook because this is something that is going to come up time
46
00:03:23,170 --> 00:03:24,500
and time again in the course.
47
00:03:24,730 --> 00:03:28,870
And it's really really important for you to fully grasp and understand.
48
00:03:28,870 --> 00:03:34,510
So what I want to talk about now is the difference between Transform and Add Column, because this is
49
00:03:34,510 --> 00:03:38,100
something that confused me for longer than I care to admit.
50
00:03:38,260 --> 00:03:40,960
And what I was noticing is that the same tools,
51
00:03:41,140 --> 00:03:46,620
in fact the same identical sets of tools, kept popping up in different places.
52
00:03:46,780 --> 00:03:52,660
And for the longest time I just thought this was really confusing and redundant until I realized that
53
00:03:52,660 --> 00:03:57,890
the outcome is completely different depending on where you select that tool.
54
00:03:58,060 --> 00:04:04,120
So when you select a tool from within the Transform tab, you're essentially modifying or overwriting
55
00:04:04,300 --> 00:04:05,950
the column that you've selected.
56
00:04:05,950 --> 00:04:11,600
But when you choose a tool from the Add Column tab, you're adding a brand new column to your table.
57
00:04:11,890 --> 00:04:18,250
So that may sound really obvious when I say it now but I guarantee as you're learning this tool and
58
00:04:18,250 --> 00:04:24,010
as you're playing around with the Query Editor, you will at some point in time select a tool from the
59
00:04:24,010 --> 00:04:28,480
wrong tab. And that's OK, because obviously nothing's set in stone.
60
00:04:28,510 --> 00:04:33,340
It's as simple as just deleting that last applied step and you're back where you started.
61
00:04:33,520 --> 00:04:38,410
But it is something to keep in mind and hopefully that will help you at least understand where you're
62
00:04:38,410 --> 00:04:41,800
going wrong and why these tools appear in multiple places.
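The Transform versus Add Column distinction can be sketched with plain Python rows. This is only an illustration under stated assumptions: `transform_column`, `add_column`, and the sample row are hypothetical names and data, not anything Power BI exposes.

```python
def transform_column(rows, column, func):
    """Overwrite the selected column in every row (Transform tab behavior)."""
    return [{**row, column: func(row[column])} for row in rows]


def add_column(rows, column, func, new_name):
    """Keep the selected column and append a new one (Add Column tab behavior)."""
    return [{**row, new_name: func(row[column])} for row in rows]


# Made-up sample row with an all-uppercase prefix.
customers = [{"Prefix": "MR.", "FirstName": "JON"}]

# Same tool (proper case), two different outcomes.
overwritten = transform_column(customers, "Prefix", str.title)
appended = add_column(customers, "Prefix", str.title, "Capitalize Each Word")

print(overwritten)  # Prefix itself is now "Mr."
print(appended)     # Prefix is still "MR.", plus a new "Capitalize Each Word" column
```

Both helpers return new lists and leave `customers` untouched, which loosely mirrors how removing an applied step puts you back where you started.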
63
00:04:41,950 --> 00:04:48,280
So let's open up Power BI and get our hands dirty with some of these text tools.
64
00:04:48,310 --> 00:04:51,220
So here I am in my Adventure Works report.
65
00:04:51,340 --> 00:04:53,090
Go ahead and open it up as well.
66
00:04:53,300 --> 00:05:00,120
I'm going to head back into the Get Data option and click Text/CSV again. This time, instead of products,
67
00:05:00,140 --> 00:05:06,990
let's go into Adventure Works customers and double-click. Here we've got our preview.
68
00:05:07,150 --> 00:05:12,870
All of these default settings look good, so I can go ahead and click Edit to launch the Query Editor.
69
00:05:14,690 --> 00:05:15,020
All right.
70
00:05:15,020 --> 00:05:20,630
So first things first: here's our AW Product Lookup query that we created in the last lecture, and now
71
00:05:20,630 --> 00:05:23,090
we see a second one for Adventure Works
72
00:05:23,090 --> 00:05:24,290
customers.
73
00:05:24,290 --> 00:05:29,040
So let's go through those two steps that we do every time we get in here to the query editor.
74
00:05:29,130 --> 00:05:36,670
Let's start with the table name, and for the sake of consistency let's go with AW underscore, whoops,
75
00:05:36,740 --> 00:05:44,150
Customer Lookup. The last file that we loaded was called Product Lookup, and it was called Product Lookup
76
00:05:44,150 --> 00:05:51,260
for a reason, because it had all sorts of information about individual products, where each row represented
77
00:05:51,290 --> 00:05:56,960
a unique product key and each column represented some attribute about that product.
78
00:05:57,260 --> 00:06:01,170
Now similarly here we're looking at customer data instead.
79
00:06:01,280 --> 00:06:06,950
So I can press Enter to lock in my table name, and as I look in the preview here, you can see I've
80
00:06:06,950 --> 00:06:11,880
got a customer key and then all sorts of attributes about my customers.
81
00:06:12,170 --> 00:06:14,630
So let's go ahead and finish our quick QA.
82
00:06:14,750 --> 00:06:17,050
We've got column headers that look correct.
83
00:06:17,240 --> 00:06:19,420
Let's look at the data values very quickly.
84
00:06:19,460 --> 00:06:25,440
Our key is a whole number, which is good. We've got names, which are strings or text fields, which is good.
85
00:06:25,520 --> 00:06:31,210
We've got a birth date column, which is a date, represented by this little calendar icon in the top left.
86
00:06:31,490 --> 00:06:39,060
And then as we scroll through: marital status, gender, email, all text.
87
00:06:39,270 --> 00:06:45,680
This one's interesting: it actually recognized the annual income field as a currency or fixed decimal.
88
00:06:45,720 --> 00:06:47,540
That's fine, we can leave it as is.
89
00:06:48,120 --> 00:06:53,830
And then we've got total children, education level, occupation, and a homeowner
90
00:06:53,880 --> 00:06:54,910
yes/no flag.
91
00:06:55,020 --> 00:06:55,760
All text.
92
00:06:55,800 --> 00:06:57,940
So check those boxes.
93
00:06:57,960 --> 00:06:59,130
Everything looks good.
94
00:06:59,280 --> 00:07:02,850
And now we can proceed to our table transformations.
95
00:07:02,850 --> 00:07:04,480
So first things first.
96
00:07:04,710 --> 00:07:10,470
Scrolling over, I've got my eye on these name columns, because they're all uppercase and that looks
97
00:07:10,470 --> 00:07:10,980
silly to me.
98
00:07:10,980 --> 00:07:13,870
So I'd like to make those proper case instead.
99
00:07:13,890 --> 00:07:21,210
So let's go ahead and select the prefix column, go into Add Column, and find our text tools here.
100
00:07:21,510 --> 00:07:27,860
And what we want to do is go into the Format tools and click Capitalize Each Word, which is proper case.
101
00:07:28,140 --> 00:07:29,640
When we do that,
102
00:07:30,010 --> 00:07:37,410
it creates a new column here at the end of our table, which is essentially a duplicate version of
103
00:07:37,410 --> 00:07:39,240
our prefix column,
104
00:07:39,240 --> 00:07:45,920
but with the correct capitalization. And you can see that the header name has been defaulted to Capitalize
105
00:07:45,930 --> 00:07:48,770
Each Word, which is just totally ridiculous.
106
00:07:48,840 --> 00:07:55,350
And as you probably have guessed by now the reason this happened was because we accessed that format
107
00:07:55,370 --> 00:08:02,340
tool through the Add Column tab, when what we really wanted to do was simply overwrite the existing column
108
00:08:02,730 --> 00:08:04,060
and transform it.
109
00:08:04,230 --> 00:08:09,430
So we should have accessed the formatting tools through the transform tab instead.
110
00:08:09,630 --> 00:08:13,560
And so that's what I'm talking about: it's very easy to make these kinds of mistakes.
111
00:08:13,560 --> 00:08:18,960
So you really just have to be vigilant about knowing which tab you're in each time you create a calculated
112
00:08:19,060 --> 00:08:20,670
column. But no worries.
113
00:08:20,670 --> 00:08:23,080
The good news is that it's a very easy fix.
114
00:08:23,160 --> 00:08:25,340
You can't press Ctrl+Z to undo here.
115
00:08:25,350 --> 00:08:31,050
The equivalent is just going into your applied steps and pressing the X to remove that last step that
116
00:08:31,050 --> 00:08:32,110
we just created.
117
00:08:32,340 --> 00:08:40,400
So now I'll scroll back, select the prefix column again, make sure that I'm in my Transform tab, and go through
118
00:08:40,400 --> 00:08:44,040
that same process: Format, Capitalize Each Word.
119
00:08:44,330 --> 00:08:48,390
And there you go it's overwritten the values into the format that I'd prefer.
120
00:08:48,620 --> 00:08:54,170
Now to do the same thing to the first and last name columns I could do them individually or I can select
121
00:08:54,170 --> 00:09:01,130
one, hold the Shift key, and click the other. That will grab both at the same time, and let's do the two for
122
00:09:01,130 --> 00:09:03,770
one special: Capitalize Each Word.
123
00:09:03,770 --> 00:09:07,760
So now I have a nice clean prefix first name and last name.
124
00:09:08,030 --> 00:09:14,360
So that's helpful but I'd really like to have the customer's full name accessible in a single field
125
00:09:14,390 --> 00:09:15,410
as well.
126
00:09:15,410 --> 00:09:22,010
So what that means is that I'd like to merge all three of these columns together and separate them with
127
00:09:22,010 --> 00:09:26,470
a space to create a brand new column that we'll name Full Name.
128
00:09:26,690 --> 00:09:33,950
So what I've done is select prefix, hold Shift, and select last name to grab all three. And instead of clicking
129
00:09:33,950 --> 00:09:38,070
Merge here (warning signs: we're still in the Transform tab),
130
00:09:38,120 --> 00:09:44,250
I need to make sure to go into add column instead and access the merge tool from here.
131
00:09:44,390 --> 00:09:49,650
So when I click merge columns it gives me a little dialog box with some options.
132
00:09:49,850 --> 00:09:55,880
In this case I do want a separator in between the values. I want that to be a space, and let's name our
133
00:09:55,880 --> 00:09:58,910
new column Full Name.
134
00:09:59,300 --> 00:10:00,570
So press OK.
135
00:10:01,390 --> 00:10:05,890
And you can see it's added a new step, and if I scroll to the right,
136
00:10:05,890 --> 00:10:11,000
there we go, we've got our new Full Name column. Looks good; we've got spaces between the names.
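The merge step just shown (three columns joined with a space into a new column) can be sketched in Python as follows. The helper `merge_columns`, the column names, and the sample row are assumptions for illustration only.

```python
def merge_columns(row, columns, separator, new_name):
    """Join the chosen columns with a separator and store the result in a new column."""
    merged = separator.join(str(row[name]) for name in columns)
    return {**row, new_name: merged}


# Made-up sample row, already in proper case.
customer = {"Prefix": "Mr.", "FirstName": "Jon", "LastName": "Yang"}

with_full_name = merge_columns(
    customer,
    columns=["Prefix", "FirstName", "LastName"],
    separator=" ",
    new_name="Full Name",
)

print(with_full_name["Full Name"])  # Mr. Jon Yang
```

Because the original three columns are kept, this corresponds to merging from the Add Column tab rather than from Transform.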
137
00:10:11,260 --> 00:10:13,470
And now one thing to note, as kind of a side note:
138
00:10:13,590 --> 00:10:19,870
it added this step, but it kind of gave it a default name that says Inserted Merged Column.
139
00:10:19,870 --> 00:10:25,960
Now as I click through it's pretty easy to see exactly what column was merged and what was added.
140
00:10:25,960 --> 00:10:33,040
But if we want to make this even more clear, I can always right-click and rename that step, and instead
141
00:10:33,040 --> 00:10:38,430
of just saying Inserted Merged Column, I can actually type Full Name.
142
00:10:38,530 --> 00:10:40,810
That just makes it a little bit more explicit
143
00:10:40,900 --> 00:10:41,650
what's going on:
144
00:10:41,650 --> 00:10:44,410
I inserted a new column and it's called Full Name.
145
00:10:44,440 --> 00:10:44,800
All right.
146
00:10:44,800 --> 00:10:48,190
So next up I want to make a couple more changes.
147
00:10:48,400 --> 00:10:55,090
I want to really focus on this email address column here and the first thing that I want to do is extract
148
00:10:55,240 --> 00:10:57,580
the username from this email.
149
00:10:57,670 --> 00:11:03,670
But the problem is I can't just say "give me the first five characters", because that would work for
150
00:11:03,670 --> 00:11:06,380
jon24 but not eugene10.
151
00:11:06,550 --> 00:11:12,700
So it's got to be a dynamic number of characters from the left side of this email address based on the
152
00:11:12,700 --> 00:11:14,940
location of that at symbol.
153
00:11:14,950 --> 00:11:19,660
So this is where we're going to use that extract based on delimiter option.
154
00:11:19,780 --> 00:11:22,000
So go ahead and select e-mail address.
155
00:11:22,210 --> 00:11:25,810
I want to add a new column for username.
156
00:11:25,810 --> 00:11:28,380
I'm going to use the extract tools here.
157
00:11:28,420 --> 00:11:35,560
So again, not just the default length, not an explicit number of characters, but I want the text before
158
00:11:35,590 --> 00:11:38,420
a delimiter. So here you go.
159
00:11:38,440 --> 00:11:45,120
All I need to do is say the delimiter that you're looking for is that at sign. If we drill into the advanced
160
00:11:45,120 --> 00:11:45,970
options,
161
00:11:46,020 --> 00:11:50,100
this is where you can say, you know, search from the left or search from the right.
162
00:11:50,220 --> 00:11:53,630
You can skip a certain number of instances of the delimiter.
163
00:11:53,730 --> 00:11:55,320
In this case we don't need any of that.
164
00:11:55,410 --> 00:11:59,010
All we want is the text before the at sign.
165
00:11:59,010 --> 00:12:06,930
So press OK, and it adds in the new column. As you can see, it didn't let me name the column at the same time, but
166
00:12:06,930 --> 00:12:08,840
it did give me the values that I want.
167
00:12:09,030 --> 00:12:15,090
So what I can do here is just double click the column header and I can give it a name let's call it
168
00:12:15,260 --> 00:12:16,700
user name.
169
00:12:16,710 --> 00:12:21,480
Now let's do one more similar example but with one more complication.
170
00:12:21,600 --> 00:12:27,960
So going back to our email address what if this time instead of the username we want to return the domain
171
00:12:27,960 --> 00:12:35,230
name, and the domain is anything that falls between the at symbol and the .com at the end of the string.
172
00:12:35,430 --> 00:12:42,960
To do that I can use the same extract tools but this time I want the text between two delimiters so
173
00:12:42,960 --> 00:12:49,980
I can go into this option here and I can say my start delimiter is the at sign and my end delimiter
174
00:12:50,370 --> 00:12:52,830
is either the period, or I can type
175
00:12:52,830 --> 00:12:55,510
.com. Both will do the same thing.
176
00:12:55,640 --> 00:12:56,250
And when I press
177
00:12:56,280 --> 00:12:57,700
OK,
178
00:12:57,720 --> 00:12:59,550
same exact idea.
179
00:12:59,550 --> 00:13:04,760
It's given me a new column with the string or the characters between those two delimiters.
180
00:13:04,770 --> 00:13:10,030
I can change the column title to something like Domain and press Enter.
181
00:13:10,290 --> 00:13:11,010
And there you have it.
182
00:13:11,010 --> 00:13:15,700
So a very, very easy way to do some pretty complex stuff.
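The two extract steps above (text before a delimiter, and text between two delimiters) can be sketched in Python. The helpers `text_before` and `text_between` are hypothetical, the sample addresses are assumed for illustration, and the behavior when a delimiter is missing is a choice made for this sketch rather than a statement about what Power BI does.

```python
def text_before(text: str, delimiter: str) -> str:
    """Return everything before the first occurrence of the delimiter."""
    head, found, _ = text.partition(delimiter)
    # Sketch choice: return the whole text if the delimiter never appears.
    return head if found else text


def text_between(text: str, start: str, end: str) -> str:
    """Return the text after the first start delimiter and before the next end delimiter."""
    _, found, rest = text.partition(start)
    if not found:
        return ""  # Sketch choice: nothing to return without a start delimiter.
    middle, _, _ = rest.partition(end)
    return middle


# Assumed sample addresses; usernames differ in length, so a fixed width would not work.
emails = ["jon24@adventure-works.com", "eugene10@adventure-works.com"]

usernames = [text_before(e, "@") for e in emails]
domains = [text_between(e, "@", ".com") for e in emails]

print(usernames)  # ['jon24', 'eugene10']
print(domains)    # ['adventure-works', 'adventure-works']
```

Using "." instead of ".com" as the end delimiter gives the same domain for these addresses, since the search for the end delimiter only starts after the at sign.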
183
00:13:15,990 --> 00:13:19,170
So one last adjustment to make to this domain column here.
184
00:13:19,440 --> 00:13:26,430
I want to get rid of the dash and replace it with a space. So to do that, I'm going to transform the existing
185
00:13:26,430 --> 00:13:32,450
column, not add a new one, and you'll see this other grouping of tools that says Any Column.
186
00:13:32,610 --> 00:13:36,190
That means they're not text, number, or date specific.
187
00:13:36,210 --> 00:13:39,160
They can be applied in a number of different circumstances.
188
00:13:39,240 --> 00:13:43,240
In this case I'll find the replace tool right here in the set.
189
00:13:43,320 --> 00:13:45,220
So it's as simple as it sounds.
190
00:13:45,420 --> 00:13:50,270
I can find the dash and I can replace it with a space. Press
191
00:13:50,280 --> 00:13:51,370
OK.
192
00:13:51,660 --> 00:13:59,330
Now I've got adventure, space, works. And why don't we go ahead and format that, capitalized in proper case,
193
00:13:59,660 --> 00:14:01,720
and we are all set.
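The last two steps (replace the dash with a space, then apply proper case) amount to overwriting a value in place. A minimal Python sketch, with `replace_then_proper_case` as a hypothetical helper name and the sample value assumed:

```python
def replace_then_proper_case(value: str) -> str:
    """Swap dashes for spaces, then capitalize the first letter of each word."""
    return value.replace("-", " ").title()


domain = replace_then_proper_case("adventure-works")
print(domain)  # Adventure Works
```

Python's `str.title` is only an approximation of proper case. For example, it also capitalizes the letter after an apostrophe.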
194
00:14:01,730 --> 00:14:02,780
So there you go.
195
00:14:02,780 --> 00:14:04,680
We used the formatting options.
196
00:14:04,790 --> 00:14:07,580
We added columns we transformed columns.
197
00:14:07,580 --> 00:14:10,620
We used the extract tools based on delimiters.
198
00:14:10,670 --> 00:14:12,320
We merged columns.
199
00:14:12,440 --> 00:14:18,620
Some really good examples of how these different text functions can be used to enhance and clean and
200
00:14:18,620 --> 00:14:20,640
transform a data table.
201
00:14:20,870 --> 00:14:24,140
So I think we're all good with Customer Lookup.
202
00:14:24,140 --> 00:14:30,920
The only thing left is to go back to our Home tab and click Close & Apply to load that data into our
203
00:14:30,920 --> 00:14:39,420
file. And as soon as that loads up, we'll see that it's going to appear as another object right here in
204
00:14:39,420 --> 00:14:45,540
our Relationships tab, right next to our friend Product Lookup. We'll also see it available in our field
205
00:14:45,540 --> 00:14:49,950
list accessible in both the data view and the report view.
206
00:14:49,950 --> 00:14:53,040
So there you have it: Query Editor text tools.