0
1
00:00:07,680 --> 00:00:10,240
Till now we have discussed the aggregate functions
1
2
00:00:10,496 --> 00:00:14,336
Which include the sum, count, average, min and max functions
2
3
00:00:15,104 --> 00:00:17,920
These functions are mostly used along with the group by clause
3
4
00:00:19,456 --> 00:00:23,552
Group by clause is used to group the results by one or more columns
4
5
00:00:23,808 --> 00:00:25,344
It is a very important clause
5
6
00:00:25,856 --> 00:00:31,232
I want you to pay attention, as this will be used many times in your professional career
6
7
00:00:32,256 --> 00:00:34,304
To use group by, this is the syntax
7
8
00:00:35,328 --> 00:00:36,608
We write select
8
9
00:00:37,120 --> 00:00:38,144
Give the column names
9
10
00:00:38,912 --> 00:00:40,960
We can use the aggregate functions also
10
11
00:00:41,216 --> 00:00:42,496
From the table name
11
12
00:00:42,752 --> 00:00:45,824
And then write group by a particular column name
12
13
00:00:46,336 --> 00:00:49,408
So what this means is, for example, if you want to find out
13
14
00:00:50,176 --> 00:00:54,016
The number of customers who are living in a particular city or in a particular state
14
15
00:00:54,528 --> 00:00:59,904
You will then be grouping the customers by their city name or the state
15
16
00:01:00,160 --> 00:01:03,232
So if you are grouping them by city you will write group by
16
17
00:01:03,488 --> 00:01:04,768
City
17
18
00:01:05,280 --> 00:01:11,424
If you are grouping them by state you will write group by state and you will select the respective
18
19
00:01:11,680 --> 00:01:15,776
Column with respective aggregate functions. Let us see some examples
19
20
00:01:18,592 --> 00:01:19,872
So in the first example
20
21
00:01:20,128 --> 00:01:21,920
Suppose you want to find out
21
22
00:01:22,176 --> 00:01:25,504
How many customers live in which region of the country
22
23
00:01:27,040 --> 00:01:28,064
To do that
23
24
00:01:28,576 --> 00:01:30,112
We'll select the region
24
25
00:01:30,624 --> 00:01:32,928
And the count of customer ID
25
26
00:01:34,208 --> 00:01:36,512
And this we will do from the table customer
26
27
00:01:36,768 --> 00:01:39,840
And we will group this by the region column
27
28
00:01:40,096 --> 00:01:43,168
So if you can imagine this will group the regions
28
29
00:01:43,680 --> 00:01:45,728
We will get the names of the regions
29
30
00:01:47,008 --> 00:01:51,360
And alongside that, it will give us the count of customer ID in that region
30
31
00:01:52,128 --> 00:01:53,664
Let us go and write this query
31
32
00:01:55,712 --> 00:01:59,296
If you scroll this customer table
32
33
00:02:00,320 --> 00:02:04,416
The last column is the region column in which we have different regions
33
34
00:02:04,672 --> 00:02:05,440
Let us
34
35
00:02:05,696 --> 00:02:10,816
Find out the number of customers in each region using the group by clause
35
36
00:02:11,328 --> 00:02:11,840
So
36
37
00:02:12,096 --> 00:02:14,656
We will select
37
38
00:02:15,168 --> 00:02:15,936
The region
38
39
00:02:16,192 --> 00:02:21,568
This will give us the region names, then we will write the count
39
40
00:02:22,592 --> 00:02:25,152
Of customer IDs
40
41
00:02:26,432 --> 00:02:32,576
This will give us the number of customers, I'll give an alias of
41
42
00:02:32,832 --> 00:02:34,880
Customer count
42
43
00:02:36,928 --> 00:02:43,072
From the table name, which is customer, and I will group it by
43
44
00:02:43,328 --> 00:02:47,680
The region column
44
45
00:02:49,728 --> 00:02:51,264
Run this query
45
46
00:02:53,824 --> 00:02:57,408
It ran successfully and we have the result, we have four regions
46
47
00:02:57,920 --> 00:03:00,480
And we have the count of customers in each region
47
48
00:03:01,248 --> 00:03:06,368
We have 134 customers in the South region and 255 customers in the West region
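The query dictated above can be sketched as follows. This is only a minimal sketch: it uses Python's built-in sqlite3 module rather than the database shown in the lecture, the table name customer and the columns customer_id and region follow the narration, and the sample rows are invented purely for illustration (they do not reproduce the 134 and 255 counts). The order by region line is an addition so that the output order is predictable.

```python
import sqlite3

# In-memory database with a tiny, hypothetical customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("C1", "South"), ("C2", "West"), ("C3", "West"), ("C4", "East")],
)

# One output row per region, with the number of customer IDs in that region.
# ORDER BY region is not part of the lecture's query; it only fixes the row order.
query = """
    SELECT region, COUNT(customer_id) AS customer_count
    FROM customer
    GROUP BY region
    ORDER BY region
"""
region_counts = conn.execute(query).fetchall()
print(region_counts)  # [('East', 1), ('South', 1), ('West', 2)]
```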
48
49
00:03:08,928 --> 00:03:13,024
So this was using group by along with the count aggregate function
49
50
00:03:13,280 --> 00:03:14,304
Now let us look at
50
51
00:03:14,816 --> 00:03:16,608
How to use group by along with the
51
52
00:03:16,864 --> 00:03:18,144
Sum aggregate function
52
53
00:03:18,656 --> 00:03:23,008
Suppose we want to find out the quantity of each product sold
53
54
00:03:24,032 --> 00:03:25,312
From the sales table
54
55
00:03:26,336 --> 00:03:27,360
How do we do that
55
56
00:03:28,640 --> 00:03:31,712
To find out, we will write select product ID
56
57
00:03:31,968 --> 00:03:34,016
And then we will have the aggregate function sum
57
58
00:03:34,272 --> 00:03:37,344
Which is summing the quantity of that product ID sold
58
59
00:03:38,880 --> 00:03:40,160
From the sales table
59
60
00:03:40,416 --> 00:03:42,208
Grouped by their product ID
60
61
00:03:42,720 --> 00:03:44,768
And ordered in the descending order
61
62
00:03:46,304 --> 00:03:47,840
One thing to notice
62
63
00:03:48,096 --> 00:03:52,704
When I am ordering it in descending order, I am using the alias
63
64
00:03:52,960 --> 00:03:57,824
Once an alias is defined, it can be reused in the order by clause of the query
64
65
00:03:58,336 --> 00:04:04,480
Now let us write this query to find out the quantity of each product sold. Select
65
66
00:04:05,248 --> 00:04:07,296
Product ID
66
67
00:04:09,856 --> 00:04:15,744
Comma, sum of quantity
67
68
00:04:16,256 --> 00:04:22,399
Give it an alias as quantity sold
68
69
00:04:23,679 --> 00:04:25,215
From
69
70
00:04:25,471 --> 00:04:27,775
Sales table
70
71
00:04:28,287 --> 00:04:31,103
Grouped by
71
72
00:04:31,871 --> 00:04:35,967
group by product ID
72
73
00:04:38,783 --> 00:04:41,855
And ordered by
73
74
00:04:42,367 --> 00:04:44,927
Quantity sold
74
75
00:04:45,695 --> 00:04:51,839
In descending order so we will get
75
76
00:04:53,375 --> 00:04:56,447
The maximum sold product at the top
76
77
00:04:57,215 --> 00:05:02,847
So this is the product ID of the maximum sold product, it was sold 75 times
77
78
00:05:03,103 --> 00:05:03,615
Next
78
79
00:05:03,871 --> 00:05:05,919
Product was sold 70 times and so on
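As a rough sketch of the query just written, the snippet below runs it through Python's built-in sqlite3 module, which is an assumption and not the tool used in the lecture. The sales table and its product_id and quantity columns follow the narration; the rows and the resulting totals are hypothetical.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("P1", 2), ("P2", 5), ("P1", 4), ("P3", 1), ("P2", 3)],
)

# Sum the quantity per product and sort by the alias, largest total first.
query = """
    SELECT product_id, SUM(quantity) AS quantity_sold
    FROM sales
    GROUP BY product_id
    ORDER BY quantity_sold DESC
"""
quantity_by_product = conn.execute(query).fetchall()
print(quantity_by_product)  # [('P2', 8), ('P1', 6), ('P3', 1)]
```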
79
80
00:05:08,991 --> 00:05:15,135
In the next example let us see how to use all the aggregate functions that we have learnt before in one single query
80
81
00:05:15,391 --> 00:05:17,439
And along with the group by clause
81
82
00:05:19,743 --> 00:05:23,071
Suppose we are creating a dashboard and trying to find out
82
83
00:05:23,327 --> 00:05:26,655
The top five customers who have spent the most at our store
83
84
00:05:26,911 --> 00:05:30,751
And get an idea of their profile, how much they have spent
84
85
00:05:31,007 --> 00:05:32,287
What is the average spend
85
86
00:05:33,055 --> 00:05:36,383
What is the minimum sales value and what is the maximum sales value
86
87
00:05:37,151 --> 00:05:38,175
To do this
87
88
00:05:38,431 --> 00:05:40,479
We will write select customer ID
88
89
00:05:41,503 --> 00:05:42,783
Minimum sales value
89
90
00:05:43,551 --> 00:05:46,111
And then write the max sales value
90
91
00:05:46,367 --> 00:05:48,671
We'll find out the average sales value
91
92
00:05:48,927 --> 00:05:51,743
And the sum that is the total sales value
92
93
00:05:52,255 --> 00:05:57,631
From the sales table, group it by the customer ID, and we will order them in descending order
93
94
00:05:57,887 --> 00:06:04,031
We will put a limit of 5 so that the top 5 spenders only remain in the resultant set
94
95
00:06:05,823 --> 00:06:08,383
Let us go and run this query
95
96
00:06:10,943 --> 00:06:13,247
Select
96
97
00:06:14,527 --> 00:06:17,855
Customer ID
97
98
00:06:19,135 --> 00:06:21,951
And
98
99
00:06:22,719 --> 00:06:25,535
Find the minimum of the
99
100
00:06:25,791 --> 00:06:27,327
Sales
100
101
00:06:28,351 --> 00:06:30,911
As Minimum sales
101
102
00:06:36,799 --> 00:06:41,151
We'll find the max sales as maximum sale
102
103
00:06:47,039 --> 00:06:49,599
The average sales
103
104
00:06:53,951 --> 00:06:58,559
As Average
104
105
00:06:58,815 --> 00:07:00,863
Underscore sale
105
106
00:07:02,143 --> 00:07:05,983
Then the sum of all sales as the total sales
106
107
00:07:11,871 --> 00:07:14,175
Space
107
108
00:07:15,455 --> 00:07:18,783
Total sales
108
109
00:07:30,303 --> 00:07:36,447
From the sales table, group by
109
110
00:07:36,703 --> 00:07:39,263
Customer ID
110
111
00:07:43,871 --> 00:07:49,247
Ordered by total sales
111
112
00:07:49,759 --> 00:07:53,855
This is the alias given to sum sales
112
113
00:07:56,671 --> 00:07:58,207
And in the descending order
113
114
00:07:59,999 --> 00:08:06,143
And we will limit the number of records to 5
114
115
00:08:06,399 --> 00:08:10,239
So we want the top 5 spenders only
115
116
00:08:13,311 --> 00:08:17,151
You can see when I run this query I get top
116
117
00:08:17,407 --> 00:08:18,175
5 spenders
117
118
00:08:18,431 --> 00:08:20,479
The topper is the one who has spent
118
119
00:08:20,735 --> 00:08:23,039
25,043 units
119
120
00:08:23,807 --> 00:08:27,135
Average spend of this customer is 16069
120
121
00:08:28,159 --> 00:08:30,719
Minimum spend was 3.4 and maximum is
121
122
00:08:31,231 --> 00:08:33,535
22638
122
123
00:08:34,047 --> 00:08:38,911
So you can present this dashboard of the highest spenders of your store
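The dashboard query can be sketched like this, again as a hedged illustration using Python's built-in sqlite3 module instead of the lecture's database. The column names customer_id and sales and the aliases are taken from the narration as best they can be heard; the six customers and their amounts are invented, so the numbers differ from the ones read out above.

```python
import sqlite3

# In-memory database; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 10), ("A", 30), ("B", 100), ("C", 5), ("C", 5),
     ("D", 60), ("E", 20), ("E", 50), ("F", 1)],
)

# All four aggregates per customer, sorted by total spend, top 5 only.
query = """
    SELECT customer_id,
           MIN(sales) AS minimum_sales,
           MAX(sales) AS maximum_sales,
           AVG(sales) AS average_sale,
           SUM(sales) AS total_sales
    FROM sales
    GROUP BY customer_id
    ORDER BY total_sales DESC
    LIMIT 5
"""
top_spenders = conn.execute(query).fetchall()
for row in top_spenders:
    print(row)
```

With six customers in the sample data, the limit of 5 drops the lowest spender (customer F here) from the result.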
123
124
00:08:43,263 --> 00:08:45,311
One thing you should note here is
124
125
00:08:45,823 --> 00:08:48,639
The columns that we are specifying
125
126
00:08:48,895 --> 00:08:49,919
While selecting
126
127
00:08:51,199 --> 00:08:52,479
these should be unique
127
128
00:08:52,991 --> 00:08:54,527
with the
128
129
00:08:54,783 --> 00:08:56,831
Column mentioned in the group by clause
129
130
00:08:58,111 --> 00:08:59,391
What I mean by this is
130
131
00:08:59,903 --> 00:09:01,183
For example
131
132
00:09:01,439 --> 00:09:05,023
When you are selecting the count of customers with the region
132
133
00:09:05,279 --> 00:09:09,375
If I had put region, age
133
134
00:09:10,655 --> 00:09:12,447
And group by region
134
135
00:09:12,703 --> 00:09:13,983
It could not have been
135
136
00:09:14,751 --> 00:09:20,383
Grouped together, since customers have several different ages
136
137
00:09:20,639 --> 00:09:23,455
Within the same region. So let us try to run this query
137
138
00:09:23,711 --> 00:09:24,991
And see the error
138
139
00:09:25,759 --> 00:09:26,527
SQL gives
139
140
00:09:28,319 --> 00:09:34,463
It is saying column customer.age must appear in the group by clause or be used in an aggregate function
140
141
00:09:34,719 --> 00:09:35,231
What this means is
141
142
00:09:37,023 --> 00:09:41,631
What this means is that you can use age inside an aggregate function
142
143
00:09:41,887 --> 00:09:44,191
You can find the average age
143
144
00:09:44,447 --> 00:09:46,239
Or the minimum or maximum age
144
145
00:09:46,751 --> 00:09:48,031
You cannot use it
145
146
00:09:48,287 --> 00:09:49,055
As it is
146
147
00:09:49,311 --> 00:09:52,383
Since it is not unique in one particular region
147
148
00:09:53,407 --> 00:09:55,199
So if you want to find out the average age
148
149
00:09:55,711 --> 00:09:58,015
Probably we should have an aggregate function
149
150
00:09:58,271 --> 00:10:00,831
So we should write average
150
151
00:10:03,135 --> 00:10:07,231
Age also here and run this query
151
152
00:10:17,471 --> 00:10:18,495
Now
152
153
00:10:19,263 --> 00:10:20,031
We have the
153
154
00:10:24,639 --> 00:10:25,151
Now
154
155
00:10:25,407 --> 00:10:27,199
If I expand this column of age
155
156
00:10:27,455 --> 00:10:30,271
The average age in the South is 45
156
157
00:10:30,783 --> 00:10:31,807
Whereas
157
158
00:10:32,319 --> 00:10:35,647
In all the other regions it is nearly 44
158
159
00:10:36,671 --> 00:10:39,487
So remember to use aggregate functions along with group by
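A small sketch of the corrected query, with age wrapped in the average function, is shown below. It assumes Python's built-in sqlite3 module; SQLite is more permissive than the database in the lecture and does not raise the error for a bare age column, so only the corrected form is shown. The age column name follows the error message, and the rows are hypothetical.

```python
import sqlite3

# In-memory database with made-up customers; ages differ within a region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, region TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C1", "South", 40), ("C2", "South", 50),
     ("C3", "West", 30), ("C4", "West", 34), ("C5", "West", 38)],
)

# age is not in GROUP BY, so it is reduced to one value per region with AVG.
# ORDER BY region is added only to make the row order predictable.
query = """
    SELECT region,
           COUNT(customer_id) AS customer_count,
           AVG(age) AS average_age
    FROM customer
    GROUP BY region
    ORDER BY region
"""
age_by_region = conn.execute(query).fetchall()
print(age_by_region)  # [('South', 2, 45.0), ('West', 3, 34.0)]
```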
159
160
00:10:41,279 --> 00:10:42,559
Another important thing to note
160
161
00:10:43,583 --> 00:10:45,887
We have used only one column
161
162
00:10:46,143 --> 00:10:51,007
While specifying group by clause, you can also specify multiple columns
162
163
00:10:51,263 --> 00:10:57,407
So if I specify two different columns, it will create a group for every combination of values that occurs in
163
164
00:10:57,663 --> 00:10:58,431
Both the columns
164
165
00:10:58,943 --> 00:10:59,711
And
165
166
00:10:59,967 --> 00:11:06,111
It will give us a table of all the combinations present in the data. So let us add
166
167
00:11:06,367 --> 00:11:07,647
this region
167
168
00:11:07,903 --> 00:11:10,719
Let us add the state also
168
169
00:11:14,559 --> 00:11:19,167
And we will group it by region and state
169
170
00:11:21,983 --> 00:11:23,519
Let us run this query
170
171
00:11:24,543 --> 00:11:25,567
So you can see
171
172
00:11:25,823 --> 00:11:28,383
The maximum customer count
172
173
00:11:28,639 --> 00:11:29,919
Is in the east region
173
174
00:11:30,175 --> 00:11:31,455
Of the New York State
174
175
00:11:32,735 --> 00:11:35,039
Next is the Texas state
175
176
00:11:35,807 --> 00:11:37,087
In the Central region
176
177
00:11:37,855 --> 00:11:41,439
So, it is creating groups for all the existing combinations of
177
178
00:11:41,695 --> 00:11:43,231
Regions and States
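Grouping by two columns can be sketched as follows, under the same assumptions as before: Python's built-in sqlite3 module stands in for the lecture's database, the region and state column names follow the narration, and the rows are invented. The order by clause is an addition so that the largest group comes first, as in the result described above.

```python
import sqlite3

# In-memory database with hypothetical customers in a few region/state pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, region TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("C1", "East", "New York"), ("C2", "East", "New York"),
     ("C3", "East", "New York"), ("C4", "Central", "Texas"),
     ("C5", "Central", "Texas"), ("C6", "East", "Ohio"),
     ("C7", "West", "California")],
)

# One group per (region, state) pair that actually occurs in the table.
query = """
    SELECT region, state, COUNT(customer_id) AS customer_count
    FROM customer
    GROUP BY region, state
    ORDER BY customer_count DESC, region, state
"""
counts_by_region_state = conn.execute(query).fetchall()
for row in counts_by_region_state:
    print(row)
```

Only pairs present in the data produce a row; a pair such as West and Texas, which no customer has in this sample, does not appear in the output.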
178
179
00:11:45,791 --> 00:11:48,607
In the next lecture, we will discuss the having keyword