Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,000 --> 00:00:01,000
Instructor: Up next, I want to talk about
2
00:00:01,000 --> 00:00:04,000
grouping and aggregating data.
3
00:00:04,000 --> 00:00:06,000
So within the transform tab of the Query Editor
4
00:00:06,000 --> 00:00:09,000
you'll see this Group By option on the left
5
00:00:09,000 --> 00:00:11,000
and this allows you to aggregate
6
00:00:11,000 --> 00:00:14,000
or roll up your data at different levels of granularity.
7
00:00:14,000 --> 00:00:16,000
So one of the common examples of this
8
00:00:16,000 --> 00:00:18,000
would be doing something like transforming
9
00:00:18,000 --> 00:00:21,000
daily transactions into weekly or monthly
10
00:00:21,000 --> 00:00:25,000
or rolling up transaction level data by store, by product,
11
00:00:25,000 --> 00:00:27,000
brand or region, et cetera.
12
00:00:27,000 --> 00:00:31,000
So it's taking a really deep, really detailed table
13
00:00:31,000 --> 00:00:34,000
and rolling it up into a higher level summary.
14
00:00:34,000 --> 00:00:37,000
So let's take a quick look at an example.
15
00:00:37,000 --> 00:00:41,000
Here we've got order quantities by order date, product key,
16
00:00:41,000 --> 00:00:42,000
and customer key.
17
00:00:42,000 --> 00:00:44,000
And the important thing to note here
18
00:00:44,000 --> 00:00:48,000
is that we have multiple orders for a given product key.
19
00:00:48,000 --> 00:00:51,000
In this case the product number 214
20
00:00:51,000 --> 00:00:54,000
has been ordered multiple times on multiple days
21
00:00:54,000 --> 00:00:57,000
by multiple different customers.
22
00:00:57,000 --> 00:00:58,000
And if you wanted to transform this table
23
00:00:58,000 --> 00:01:01,000
into a summary of orders
24
00:01:01,000 --> 00:01:04,000
or order quantities rolled up by unique product keys,
25
00:01:04,000 --> 00:01:08,000
the Group By option is a great way to do that.
26
00:01:08,000 --> 00:01:09,000
So when you click the Group By button
27
00:01:09,000 --> 00:01:12,000
you'll see this dialogue box that looks something like this
28
00:01:12,000 --> 00:01:15,000
that basically allows you to tell Power BI,
29
00:01:15,000 --> 00:01:19,000
hey, I'd like to group this table by unique product keys.
30
00:01:19,000 --> 00:01:22,000
And the operation that I'd like to evaluate
31
00:01:22,000 --> 00:01:24,000
for those product keys is the sum
32
00:01:24,000 --> 00:01:26,000
of the order quantity column.
33
00:01:26,000 --> 00:01:30,000
In other words, you're taking all of these duplicate rows
34
00:01:30,000 --> 00:01:33,000
with the same product key and multiple order quantities
35
00:01:33,000 --> 00:01:36,000
and you're compressing them down to one.
36
00:01:36,000 --> 00:01:38,000
When you do that compression or aggregation
37
00:01:38,000 --> 00:01:41,000
how do we want to treat those order quantity values?
38
00:01:41,000 --> 00:01:44,000
Now in this case, we're taking a simple sum
39
00:01:44,000 --> 00:01:47,000
and we're gonna end up with a table like this,
40
00:01:47,000 --> 00:01:50,000
two columns, one containing unique distinct product keys,
41
00:01:50,000 --> 00:01:53,000
and the second containing the total quantity
42
00:01:53,000 --> 00:01:55,000
or the sum of order quantity values
43
00:01:55,000 --> 00:01:57,000
associated with each product key.
44
00:01:57,000 --> 00:02:01,000
So to recap, what we've essentially done is transform
45
00:02:01,000 --> 00:02:04,000
a daily transaction level table into a summary
46
00:02:04,000 --> 00:02:07,000
of total quantity rolled up by product keys.
47
00:02:07,000 --> 00:02:09,000
And the important thing to note here
48
00:02:09,000 --> 00:02:10,000
as you may have noticed,
49
00:02:10,000 --> 00:02:13,000
any fields that we aren't specifically calling out
50
00:02:13,000 --> 00:02:15,000
in our Group By settings are lost.
51
00:02:15,000 --> 00:02:17,000
They're not preserved in that final table
52
00:02:17,000 --> 00:02:21,000
because we've eliminated that level of granularity.
53
00:02:21,000 --> 00:02:24,000
Now, one more example, using the advanced option
54
00:02:24,000 --> 00:02:26,000
and let's consider that same exact table
55
00:02:26,000 --> 00:02:27,000
that we started with.
56
00:02:27,000 --> 00:02:30,000
We're using the same exact Group By option
57
00:02:30,000 --> 00:02:33,000
but this time we're looking at the advanced version.
58
00:02:33,000 --> 00:02:35,000
And really the only difference here between the basic
59
00:02:35,000 --> 00:02:39,000
and advanced is that advanced allows you to specify
60
00:02:39,000 --> 00:02:43,000
multiple columns or additional columns to group by.
61
00:02:43,000 --> 00:02:46,000
So this time, instead of just grouping by product key
62
00:02:46,000 --> 00:02:49,000
we're grouping by product key and customer key.
63
00:02:49,000 --> 00:02:52,000
And again, we're evaluating that same operation.
64
00:02:52,000 --> 00:02:55,000
The sum of the order quantity values, just like before.
65
00:02:56,000 --> 00:03:00,000
This time, what we end up with is a three column table
66
00:03:00,000 --> 00:03:03,000
with product key, customer key and total quantity.
67
00:03:03,000 --> 00:03:06,000
So to recap, we've again transformed
68
00:03:06,000 --> 00:03:08,000
that daily transaction level table
69
00:03:08,000 --> 00:03:11,000
now into a summary of total quantity
70
00:03:11,000 --> 00:03:14,000
aggregated by both product key and customer key.
71
00:03:14,000 --> 00:03:17,000
In other words, we're now looking at quantities
72
00:03:17,000 --> 00:03:20,000
by each unique combination of those two fields.
73
00:03:20,000 --> 00:03:22,000
And the best comparison that I can make to this,
74
00:03:22,000 --> 00:03:24,000
it's really just like creating a pivot table
75
00:03:24,000 --> 00:03:26,000
and pulling in the sum of order quantity
76
00:03:26,000 --> 00:03:28,000
into your values and your product key
77
00:03:28,000 --> 00:03:31,000
and customer fields into your row labels.
78
00:03:32,000 --> 00:03:34,000
All right, so that's grouping and aggregating data
79
00:03:34,000 --> 00:03:35,000
in a nutshell.
80
00:03:35,000 --> 00:03:36,000
Let's jump over to Power BI
81
00:03:36,000 --> 00:03:40,000
and we'll actually work through an example or two.
82
00:03:40,000 --> 00:03:42,000
All right, so back in my Adventure Works report.
83
00:03:42,000 --> 00:03:43,000
And instead of adding new data
84
00:03:43,000 --> 00:03:46,000
we're gonna jump right back to the Query Editor
85
00:03:46,000 --> 00:03:49,000
and we're gonna edit the sales data table.
86
00:03:50,000 --> 00:03:54,000
All right, so if we select our sales data 2022 table
87
00:03:54,000 --> 00:03:56,000
which contains daily sales records
88
00:03:56,000 --> 00:04:00,000
broken down by product and customer keys, right?
89
00:04:00,000 --> 00:04:01,000
Territory keys, right?
90
00:04:01,000 --> 00:04:04,000
This is a great candidate for using Group By
91
00:04:04,000 --> 00:04:06,000
and aggregation tools.
92
00:04:06,000 --> 00:04:08,000
So just like in our demo
93
00:04:08,000 --> 00:04:10,000
we're gonna select the product key here, all right?
94
00:04:10,000 --> 00:04:13,000
And let's say that we want to group this entire table
95
00:04:13,000 --> 00:04:15,000
which contains multiple product keys
96
00:04:15,000 --> 00:04:18,000
and multiple instances of product keys, right?
97
00:04:18,000 --> 00:04:21,000
And we can see that if I sort ascending, right
98
00:04:21,000 --> 00:04:24,000
we've got all of these different product keys
99
00:04:24,000 --> 00:04:27,000
that are repeating for different orders,
100
00:04:27,000 --> 00:04:29,000
and we could scroll down here
101
00:04:29,000 --> 00:04:31,000
might take a while if we scroll, but trust me, right,
102
00:04:31,000 --> 00:04:34,000
there's a lot of different product keys
103
00:04:34,000 --> 00:04:36,000
contained within this table.
104
00:04:36,000 --> 00:04:39,000
So the idea here is that we want to take this table,
105
00:04:39,000 --> 00:04:42,000
collapse these rows with multiple product keys
106
00:04:42,000 --> 00:04:44,000
into essentially a summary table.
107
00:04:44,000 --> 00:04:46,000
And just like the example on the slide
108
00:04:46,000 --> 00:04:49,000
we want to evaluate the sum of order quantity
109
00:04:49,000 --> 00:04:52,000
for each one of these product keys.
110
00:04:52,000 --> 00:04:54,000
So from here, what we wanna do is head up
111
00:04:54,000 --> 00:04:57,000
to the Transform tab, and then we're gonna click Group By.
112
00:04:57,000 --> 00:05:01,000
And we're gonna start with our basic option here.
113
00:05:01,000 --> 00:05:03,000
So we want to group by product key
114
00:05:03,000 --> 00:05:05,000
and for our new column name
115
00:05:05,000 --> 00:05:08,000
let's call this something like Total Quantity
116
00:05:10,000 --> 00:05:12,000
and the operation that we want to perform here
117
00:05:12,000 --> 00:05:13,000
we have a bunch of different options
118
00:05:13,000 --> 00:05:16,000
for some different statistics tools
119
00:05:16,000 --> 00:05:17,000
or aggregation functions.
120
00:05:17,000 --> 00:05:21,000
So sum, average, median min, max, et cetera, right?
121
00:05:21,000 --> 00:05:26,000
And we want to find the sum of our order quantity column.
122
00:05:26,000 --> 00:05:27,000
So we'll press OK,
123
00:05:28,000 --> 00:05:30,000
and just like our demo
124
00:05:30,000 --> 00:05:33,000
this collapsed our data into a two column table
125
00:05:33,000 --> 00:05:36,000
that's now aggregated by the quantity field
126
00:05:36,000 --> 00:05:39,000
with the associated product keys, right?
127
00:05:39,000 --> 00:05:42,000
And we have a unique list of product keys here
128
00:05:42,000 --> 00:05:45,000
which we can see again if we sort this column, right?
129
00:05:45,000 --> 00:05:48,000
So instead of having multiple repeated values
130
00:05:48,000 --> 00:05:50,000
and rows for product 214.
131
00:05:50,000 --> 00:05:54,000
We have a single row for 214, 215, 220,
132
00:05:54,000 --> 00:05:55,000
and so on.
133
00:05:55,000 --> 00:05:57,000
Now let's do one more example here
134
00:05:57,000 --> 00:06:01,000
and I'm actually gonna remove these applied steps here.
135
00:06:02,000 --> 00:06:04,000
So we'll select our product key column again.
136
00:06:05,000 --> 00:06:08,000
Come back up, we'll group by,
137
00:06:08,000 --> 00:06:11,000
and this time I wanna select the advanced button, right?
138
00:06:11,000 --> 00:06:14,000
So what this again allows us to do is add a grouping here
139
00:06:14,000 --> 00:06:17,000
and you can add any other sort of fields
140
00:06:17,000 --> 00:06:20,000
that you want to pull in from the table, right,
141
00:06:20,000 --> 00:06:24,000
by date, order number, customer key, territory key,
142
00:06:24,000 --> 00:06:25,000
you know, so on.
143
00:06:25,000 --> 00:06:28,000
And here I'm going to select customer key
144
00:06:28,000 --> 00:06:30,000
because I wanna see the total sales
145
00:06:30,000 --> 00:06:34,000
for every combination of product and customer.
146
00:06:34,000 --> 00:06:36,000
And again, we can call this total quantity.
147
00:06:38,000 --> 00:06:41,000
The operation we'll do a sum, and we're gonna base that
148
00:06:41,000 --> 00:06:44,000
on the order quantity column.
149
00:06:44,000 --> 00:06:45,000
We click OK.
150
00:06:46,000 --> 00:06:48,000
And now our result changes.
151
00:06:48,000 --> 00:06:51,000
We now have a three column table
152
00:06:51,000 --> 00:06:56,000
containing the unique combinations of product key,
153
00:06:56,000 --> 00:07:00,000
and we'll sort this, right of product key, customer key,
154
00:07:00,000 --> 00:07:02,000
and our total quantity, right?
155
00:07:02,000 --> 00:07:05,000
So now you can see that we have multiple instances
156
00:07:05,000 --> 00:07:09,000
of product key because we've added another dimension here,
157
00:07:09,000 --> 00:07:10,000
our customer key.
158
00:07:10,000 --> 00:07:12,000
So there you have it.
159
00:07:12,000 --> 00:07:13,000
And once again, let's go back
160
00:07:13,000 --> 00:07:16,000
and we'll remove these applied steps
161
00:07:16,000 --> 00:07:18,000
for grouping and sorting.
162
00:07:18,000 --> 00:07:19,000
And at this point,
163
00:07:19,000 --> 00:07:22,000
we actually don't need to save any of this
164
00:07:22,000 --> 00:07:24,000
because we don't wanna load any of these changes
165
00:07:24,000 --> 00:07:25,000
into the data model.
166
00:07:25,000 --> 00:07:28,000
We really haven't made any sort of significant changes.
167
00:07:28,000 --> 00:07:30,000
So no real action is needed here.
168
00:07:30,000 --> 00:07:33,000
All right, that's gonna wrap up your crash course
169
00:07:33,000 --> 00:07:35,000
on grouping and aggregating data.
13580
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.