1
00:00:00,000 --> 00:00:01,000
Instructor: Last but not least,
2
00:00:01,000 --> 00:00:04,000
we have arguably the most powerful AI visual of all,
3
00:00:04,000 --> 00:00:06,000
Key Influencers.
4
00:00:06,000 --> 00:00:08,000
And the purpose of the key influencer visual
5
00:00:08,000 --> 00:00:09,000
is to help you understand
6
00:00:09,000 --> 00:00:12,000
these statistically significant factors
7
00:00:12,000 --> 00:00:15,000
that drive specific metrics or outcomes.
8
00:00:15,000 --> 00:00:17,000
And this is quite a bit more complex
9
00:00:17,000 --> 00:00:19,000
because now we're not just visualizing
10
00:00:19,000 --> 00:00:22,000
or summarizing information. Under the hood,
11
00:00:22,000 --> 00:00:26,000
Power BI is running linear and logistic regression models
12
00:00:26,000 --> 00:00:28,000
that are actually quantifying the relationships
13
00:00:28,000 --> 00:00:31,000
between the variables in our data set.
14
00:00:31,000 --> 00:00:34,000
And this can be used for a number of different purposes.
15
00:00:34,000 --> 00:00:37,000
We can analyze categorical outcomes, which are discrete,
16
00:00:37,000 --> 00:00:40,000
like an email being marked as spam
17
00:00:40,000 --> 00:00:44,000
or a customer review being marked as positive or negative.
18
00:00:44,000 --> 00:00:46,000
We can analyze continuous outcomes,
19
00:00:46,000 --> 00:00:49,000
like the factors that impact the price of a home,
20
00:00:49,000 --> 00:00:52,000
and we can also identify top segments
21
00:00:52,000 --> 00:00:55,000
based on different combinations of factors.
22
00:00:55,000 --> 00:00:57,000
So in this particular example,
23
00:00:57,000 --> 00:00:59,000
we're identifying factors that are highly correlated
24
00:00:59,000 --> 00:01:00,000
with owning a home,
25
00:01:00,000 --> 00:01:02,000
and here we see that parents
26
00:01:02,000 --> 00:01:07,000
are 1.59 times more likely to be homeowners, all else equal.
27
00:01:08,000 --> 00:01:09,000
And in our top segment view,
28
00:01:09,000 --> 00:01:11,000
we can identify customer segments
29
00:01:11,000 --> 00:01:15,000
where this particular outcome, home ownership, is likely.
30
00:01:15,000 --> 00:01:19,000
So for example, 93% of married customers
31
00:01:19,000 --> 00:01:22,000
with children and a bachelor's degree own a home
32
00:01:22,000 --> 00:01:26,000
compared to only 67.6% overall.
33
00:01:26,000 --> 00:01:29,000
And that could be an incredibly powerful tool
34
00:01:29,000 --> 00:01:30,000
in the real world
35
00:01:30,000 --> 00:01:32,000
for things like customer segmentation
36
00:01:32,000 --> 00:01:36,000
or defining ideal customer profiles, or ICPs.
37
00:01:36,000 --> 00:01:38,000
For instance, we might run a similar analysis
38
00:01:38,000 --> 00:01:41,000
to analyze things like customer churn or retention,
39
00:01:41,000 --> 00:01:44,000
purchase probability, and so on,
40
00:01:44,000 --> 00:01:46,000
and use the results to help us do things
41
00:01:46,000 --> 00:01:49,000
like refine our marketing strategies or brand messaging
42
00:01:49,000 --> 00:01:52,000
in ways that we may have never even considered
43
00:01:52,000 --> 00:01:55,000
without this type of data-driven analysis.
44
00:01:55,000 --> 00:01:57,000
So let's go ahead and open up our AdventureWorks report
45
00:01:57,000 --> 00:01:59,000
and dig a little bit deeper
46
00:01:59,000 --> 00:02:01,000
into how this key influencer visual actually works.
47
00:02:02,000 --> 00:02:04,000
All right, so if you'd like to follow along,
48
00:02:04,000 --> 00:02:07,000
let's go ahead and add one more report page here.
49
00:02:07,000 --> 00:02:10,000
We're gonna call it Key Influencers,
50
00:02:10,000 --> 00:02:12,000
and we'll insert that visual
51
00:02:12,000 --> 00:02:14,000
right here in our AI visuals group,
52
00:02:14,000 --> 00:02:16,000
kind of looks like a lollipop chart.
53
00:02:16,000 --> 00:02:18,000
Let's drag it out.
54
00:02:18,000 --> 00:02:19,000
And our three build options
55
00:02:19,000 --> 00:02:21,000
are 'analyze' and 'explain by,'
56
00:02:21,000 --> 00:02:23,000
just like our decomposition tree.
57
00:02:23,000 --> 00:02:25,000
And a third for 'expand by,'
58
00:02:25,000 --> 00:02:27,000
which we'll talk about in just a bit.
59
00:02:27,000 --> 00:02:29,000
For now, we're gonna focus on these first two.
60
00:02:29,000 --> 00:02:31,000
And let's stick with our simple
61
00:02:31,000 --> 00:02:33,000
kind of categorical analysis,
62
00:02:33,000 --> 00:02:35,000
trying to understand what drives
63
00:02:35,000 --> 00:02:38,000
whether or not a customer owns a home.
64
00:02:38,000 --> 00:02:41,000
So that means for our analyze field,
65
00:02:41,000 --> 00:02:42,000
we can pull in homeowner,
66
00:02:42,000 --> 00:02:46,000
which, remember, is a binary yes-or-no outcome.
67
00:02:46,000 --> 00:02:49,000
That means it's categorical, not continuous.
68
00:02:49,000 --> 00:02:50,000
And for 'explain by,'
69
00:02:50,000 --> 00:02:52,000
this is where we just drop in any fields
70
00:02:52,000 --> 00:02:56,000
that we think might impact the probability of a customer
71
00:02:56,000 --> 00:02:57,000
being a homeowner.
72
00:02:57,000 --> 00:02:58,000
So for instance,
73
00:02:58,000 --> 00:03:01,000
we might think that income has something to do with it,
74
00:03:01,000 --> 00:03:04,000
so we could pull in annual income,
75
00:03:04,000 --> 00:03:09,000
perhaps education is a factor as well, so education level,
76
00:03:09,000 --> 00:03:12,000
whether or not a customer is a parent,
77
00:03:12,000 --> 00:03:16,000
may be related to whether or not they own a home.
78
00:03:16,000 --> 00:03:18,000
Similar story with marital status,
79
00:03:18,000 --> 00:03:20,000
we could pull that field in as well.
80
00:03:20,000 --> 00:03:22,000
And last but not least,
81
00:03:22,000 --> 00:03:25,000
maybe occupation has something to do with it too.
82
00:03:26,000 --> 00:03:30,000
That seems like a pretty good representative set of factors,
83
00:03:30,000 --> 00:03:31,000
but feel free to add other fields
84
00:03:31,000 --> 00:03:35,000
that you think might be correlated with home ownership.
85
00:03:35,000 --> 00:03:37,000
So now that we've built out our fields
86
00:03:37,000 --> 00:03:39,000
in our key influencers visual here,
87
00:03:39,000 --> 00:03:40,000
you see that we can toggle
88
00:03:40,000 --> 00:03:44,000
between the categorical outcomes 'yes' or 'no.'
89
00:03:44,000 --> 00:03:47,000
So in this case, we're interested in understanding
90
00:03:47,000 --> 00:03:49,000
what influences homeowner to be yes,
91
00:03:49,000 --> 00:03:51,000
so we can switch that toggle.
92
00:03:51,000 --> 00:03:52,000
And here we're seeing
93
00:03:52,000 --> 00:03:56,000
the statistically significant drivers of ownership,
94
00:03:56,000 --> 00:04:01,000
marital status is married, is parent is yes,
95
00:04:01,000 --> 00:04:04,000
annual income is between 30,000 and 120,000,
96
00:04:04,000 --> 00:04:06,000
education level is graduate degree,
97
00:04:06,000 --> 00:04:10,000
occupation is management, and so on and so forth.
98
00:04:10,000 --> 00:04:13,000
And we also see the strength of this impact as well.
99
00:04:13,000 --> 00:04:14,000
So in other words,
100
00:04:14,000 --> 00:04:17,000
when a customer is married, all else equal,
101
00:04:17,000 --> 00:04:20,000
the likelihood of them owning a home
102
00:04:20,000 --> 00:04:23,000
increases by 1.62 times.
103
00:04:23,000 --> 00:04:27,000
And if we click any of these factors and expand our visual,
104
00:04:27,000 --> 00:04:28,000
now we see some additional detail
105
00:04:28,000 --> 00:04:31,000
here in the column chart on the right.
106
00:04:31,000 --> 00:04:33,000
And this is showing the percent of homeowners
107
00:04:33,000 --> 00:04:36,000
for each category within a given field,
108
00:04:36,000 --> 00:04:38,000
in this case, marital status.
109
00:04:38,000 --> 00:04:40,000
And what this red line shows
110
00:04:40,000 --> 00:04:43,000
is the average excluding the selected category,
111
00:04:43,000 --> 00:04:46,000
which in this case, is just marital status equals single.
112
00:04:46,000 --> 00:04:51,000
And this tells us that 51.12% of single customers own a home
113
00:04:52,000 --> 00:04:57,000
compared to 81.5% of married customers.
114
00:04:57,000 --> 00:04:59,000
And if we look at another example,
115
00:04:59,000 --> 00:05:01,000
this is the 'is parent' field,
116
00:05:01,000 --> 00:05:05,000
we see that 47.93% of customers who are not parents
117
00:05:05,000 --> 00:05:10,000
own a home, versus 75.18% who are parents.
118
00:05:10,000 --> 00:05:11,000
Let's look at one more example here
119
00:05:11,000 --> 00:05:13,000
with a couple additional options.
120
00:05:13,000 --> 00:05:14,000
In this case,
121
00:05:14,000 --> 00:05:18,000
customers who fall in this 30,000 to 120,000 income range
122
00:05:18,000 --> 00:05:21,000
own homes at a rate of 72%
123
00:05:21,000 --> 00:05:24,000
compared to the average of all the other categories
124
00:05:24,000 --> 00:05:26,000
at 58.31%.
125
00:05:26,000 --> 00:05:30,000
So these strength values, the 1.62, the 1.59, the 1.23,
126
00:05:32,000 --> 00:05:33,000
are essentially derived
127
00:05:33,000 --> 00:05:36,000
by comparing the rate of home ownership
128
00:05:36,000 --> 00:05:37,000
for a specific category
129
00:05:37,000 --> 00:05:40,000
with the average of all the others.
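As a rough check on the percentages quoted in this walkthrough, the ratio of each category's ownership rate to its comparison rate can be computed directly. This is only a minimal sketch, not Power BI's actual method: the visual fits a regression that accounts for the other fields, so the simple ratios only approximate the displayed 1.62, 1.59, and 1.23. The function name `rate_ratio` is hypothetical.

```python
def rate_ratio(category_rate: float, comparison_rate: float) -> float:
    """Ratio of a category's outcome rate to the rate it is compared against."""
    if comparison_rate <= 0:
        raise ValueError("comparison_rate must be positive")
    return category_rate / comparison_rate


# Percentages quoted in the walkthrough: (category rate, comparison rate).
quoted_rates = {
    "married vs. single": (81.5, 51.12),
    "parent vs. non-parent": (75.18, 47.93),
    "income 30,000-120,000 vs. other ranges": (72.0, 58.31),
}

# Simple rate ratios; an approximation of the strength values, not a regression.
ratios = {label: rate_ratio(a, b) for label, (a, b) in quoted_rates.items()}

for label, value in ratios.items():
    print(f"{label}: {value:.2f}x")
```

The raw ratios land close to, but not exactly on, the values shown in the visual, which fits with the "all else equal" adjustment made by the underlying model.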
130
00:05:40,000 --> 00:05:41,000
Now, what's also helpful
131
00:05:41,000 --> 00:05:43,000
is that if you hover over these buttons,
132
00:05:43,000 --> 00:05:45,000
it gives you some additional information,
133
00:05:45,000 --> 00:05:48,000
including how much of the data set is represented
134
00:05:48,000 --> 00:05:51,000
by this particular field or factor.
135
00:05:51,000 --> 00:05:53,000
So in this case, married customers
136
00:05:53,000 --> 00:05:57,000
represent approximately 54.49% of the data.
137
00:05:57,000 --> 00:05:59,000
And we can actually show that too
138
00:05:59,000 --> 00:06:01,000
using some formatting settings.
139
00:06:01,000 --> 00:06:05,000
If we drill into analysis here, we can toggle on counts,
140
00:06:05,000 --> 00:06:07,000
and that's gonna show this kind of subtle ring
141
00:06:07,000 --> 00:06:09,000
around the outside of the bubble
142
00:06:09,000 --> 00:06:12,000
that represents that same percentage.
143
00:06:12,000 --> 00:06:14,000
Now, by default, we're sorting by impact,
144
00:06:14,000 --> 00:06:15,000
which I think makes the most sense,
145
00:06:15,000 --> 00:06:18,000
but you can also sort by count.
146
00:06:18,000 --> 00:06:20,000
So that's a quick summary
147
00:06:20,000 --> 00:06:22,000
of the key influencers part of this visual.
148
00:06:22,000 --> 00:06:23,000
What's really cool
149
00:06:23,000 --> 00:06:26,000
is that you can also toggle over to Top segments
150
00:06:26,000 --> 00:06:28,000
and explore these customer segments
151
00:06:28,000 --> 00:06:30,000
that Power BI has identified.
152
00:06:30,000 --> 00:06:32,000
So this is Segment 1
153
00:06:32,000 --> 00:06:34,000
which is defined by these characteristics,
154
00:06:34,000 --> 00:06:38,000
education level is bachelor's, is parent is yes,
155
00:06:38,000 --> 00:06:40,000
and marital status is married.
156
00:06:40,000 --> 00:06:42,000
And within this entire segment,
157
00:06:42,000 --> 00:06:46,000
91% of those customers are homeowners.
158
00:06:46,000 --> 00:06:49,000
And it's telling us that's 23 percentage points higher
159
00:06:49,000 --> 00:06:52,000
than the average of 67.6%.
160
00:06:52,000 --> 00:06:57,000
It also tells us that there are 2,552 data points or records
161
00:06:57,000 --> 00:06:59,000
that this segment represents,
162
00:06:59,000 --> 00:07:02,000
which is 14.1% of the data as a whole.
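The segment figures quoted here can be cross-checked with a little arithmetic. This is a sketch using only the rounded numbers read off the visual, so the results are approximate; the variable names are illustrative and not part of Power BI.

```python
segment_rate = 91.0    # % of homeowners in Segment 1 (quoted in the walkthrough)
overall_rate = 67.6    # % of homeowners overall (quoted in the walkthrough)
segment_size = 2552    # records in the segment (quoted in the walkthrough)
segment_share = 14.1   # segment as % of all records (quoted in the walkthrough)

# Lift over the overall rate, in percentage points.
lift_points = segment_rate - overall_rate

# Approximate number of homeowners inside the segment.
homeowners_in_segment = round(segment_size * segment_rate / 100)

# Total record count implied by the rounded share; an estimate only.
implied_total = segment_size / (segment_share / 100)

print(f"lift: {lift_points:.1f} percentage points")
print(f"homeowners in segment: about {homeowners_in_segment}")
print(f"implied total records: about {implied_total:.0f}")
```

The lift works out to roughly 23 percentage points, which matches what the visual reports.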
163
00:07:02,000 --> 00:07:06,000
So under the hood, Power BI is running a cluster analysis.
164
00:07:06,000 --> 00:07:08,000
It's combining all of these factors
165
00:07:08,000 --> 00:07:10,000
that we've determined here,
166
00:07:10,000 --> 00:07:11,000
and it's combining them
167
00:07:11,000 --> 00:07:14,000
into specific customer segments or profiles
168
00:07:14,000 --> 00:07:18,000
that together are highly predictive of home ownership.
169
00:07:18,000 --> 00:07:21,000
Just incredibly powerful stuff that we're able to do
170
00:07:21,000 --> 00:07:22,000
with just a few clicks.
171
00:07:22,000 --> 00:07:24,000
Now I wanna show you one more example here.
172
00:07:24,000 --> 00:07:28,000
I'm actually gonna keep this version,
173
00:07:28,000 --> 00:07:30,000
and I'm gonna add another instance
174
00:07:30,000 --> 00:07:32,000
of the key influencer visual.
175
00:07:33,000 --> 00:07:35,000
And this time, I wanna show you a continuous outcome
176
00:07:35,000 --> 00:07:37,000
instead of categorical.
177
00:07:37,000 --> 00:07:38,000
So in this case,
178
00:07:38,000 --> 00:07:42,000
my analyze metric will be product price, right here.
179
00:07:45,000 --> 00:07:48,000
And I want to explain product price
180
00:07:48,000 --> 00:07:53,000
by product cost, like so.
181
00:07:53,000 --> 00:07:56,000
And I recognize this example is a little bit contrived,
182
00:07:56,000 --> 00:07:58,000
but it does help show some of the differences
183
00:07:58,000 --> 00:08:01,000
when you're comparing categorical outcomes
184
00:08:01,000 --> 00:08:03,000
versus continuous outcomes like this.
185
00:08:03,000 --> 00:08:04,000
So now, instead of selecting
186
00:08:04,000 --> 00:08:06,000
different categorical outcomes here,
187
00:08:06,000 --> 00:08:09,000
we're saying what influences product price
188
00:08:09,000 --> 00:08:11,000
to increase or decrease?
189
00:08:11,000 --> 00:08:13,000
And you can see in the Analysis type here,
190
00:08:13,000 --> 00:08:15,000
it's defaulted to Continuous.
191
00:08:15,000 --> 00:08:17,000
If we change that to Categorical,
192
00:08:17,000 --> 00:08:19,000
it wouldn't really make much sense, right?
193
00:08:19,000 --> 00:08:23,000
We'd see all of the different potential product prices,
194
00:08:23,000 --> 00:08:25,000
which, in this case, is just meaningless.
195
00:08:25,000 --> 00:08:27,000
So continuous is the right move here.
196
00:08:27,000 --> 00:08:28,000
And you'll notice now
197
00:08:28,000 --> 00:08:30,000
that the visuals look a little bit different.
198
00:08:30,000 --> 00:08:32,000
Now we see a linear regression.
199
00:08:32,000 --> 00:08:34,000
We notice a very strong correlation
200
00:08:34,000 --> 00:08:37,000
and positive relationship between these fields,
201
00:08:37,000 --> 00:08:39,000
which isn't surprising at all.
202
00:08:39,000 --> 00:08:41,000
Generally, the more a product costs,
203
00:08:41,000 --> 00:08:43,000
the higher the retail price will be.
204
00:08:43,000 --> 00:08:46,000
But in this case, the slope of this line
205
00:08:46,000 --> 00:08:47,000
is how we derive the strength
206
00:08:47,000 --> 00:08:50,000
of these influencers or these factors.
207
00:08:50,000 --> 00:08:51,000
So Power BI is telling us
208
00:08:51,000 --> 00:08:54,000
when the sum of product cost goes up by $516.73,
209
00:08:57,000 --> 00:09:02,000
the average product retail price increases by $865.
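To make the "slope of the line" idea concrete, here is a minimal sketch of an ordinary least-squares fit in plain Python. The helper `fit_line` and the sample cost and price pairs are hypothetical and are not taken from the AdventureWorks data; Power BI's internal implementation may differ. The last line simply divides the two dollar figures quoted above to get the implied price change per dollar of cost.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (cost, price) pairs, NOT taken from AdventureWorks.
costs = [100.0, 200.0, 300.0]
prices = [160.0, 310.0, 460.0]
slope, intercept = fit_line(costs, prices)
print(f"hypothetical fit: price = {slope:.2f} * cost + {intercept:.2f}")

# Slope implied by the figures quoted in the walkthrough.
implied_slope = 865 / 516.73
print(f"implied price increase per $1 of cost: ${implied_slope:.2f}")
```

On the quoted numbers, each additional dollar of cost corresponds to roughly $1.67 of retail price.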
210
00:09:02,000 --> 00:09:03,000
And last thing I wanna show you,
211
00:09:03,000 --> 00:09:04,000
this is where we can play
212
00:09:04,000 --> 00:09:07,000
with this 'expand by' option as well.
213
00:09:07,000 --> 00:09:08,000
So note that we're breaking down the data
214
00:09:08,000 --> 00:09:12,000
at the product name level here in this scatter plot,
215
00:09:12,000 --> 00:09:16,000
but if we expand by a higher-level product field
216
00:09:16,000 --> 00:09:19,000
like subcategory, we're gonna see an error at first,
217
00:09:19,000 --> 00:09:21,000
because it says the field in our Analyze pane,
218
00:09:21,000 --> 00:09:24,000
product price, is not summarized.
219
00:09:24,000 --> 00:09:27,000
So it tells us to either summarize that value
220
00:09:27,000 --> 00:09:29,000
or remove the 'expand by' fields.
221
00:09:29,000 --> 00:09:32,000
So instead of product price, what we can do
222
00:09:32,000 --> 00:09:36,000
is pull in a summarized field that we've calculated,
223
00:09:36,000 --> 00:09:38,000
like average retail price,
224
00:09:38,000 --> 00:09:41,000
and now we're gonna see that same linear regression,
225
00:09:41,000 --> 00:09:44,000
except now each point in our scatter plot
226
00:09:44,000 --> 00:09:47,000
represents a product subcategory,
227
00:09:47,000 --> 00:09:51,000
road bikes, mountain bikes, touring bikes, and so on.
228
00:09:51,000 --> 00:09:54,000
So obviously, a lot to digest here.
229
00:09:54,000 --> 00:09:55,000
You can go much, much deeper
230
00:09:55,000 --> 00:09:57,000
with these key influencer visuals,
231
00:09:57,000 --> 00:09:59,000
but hopefully that quick overview
232
00:09:59,000 --> 00:10:01,000
helps you understand how these are working
233
00:10:01,000 --> 00:10:03,000
and exactly how powerful they are.