1
00:00:01,280 --> 00:00:08,390
Hello, and a very warm welcome to the data scientist course. To many people, data science and
2
00:00:08,390 --> 00:00:14,600
machine learning feel like unknown territory, something too abstract and complex to grasp.
3
00:00:14,600 --> 00:00:15,860
That isn't the case.
4
00:00:15,890 --> 00:00:21,680
I would like to start by giving you a hands-on example of what to expect from this course and
5
00:00:21,680 --> 00:00:25,160
what you will be able to do on your own only a few hours into our training.
6
00:00:26,110 --> 00:00:32,560
There are three key words that will come up a lot during the lecture: data, algorithm, and insight.
7
00:00:33,500 --> 00:00:40,520
Here we have data collected from the customers of a shop: 30 observations in total. Each observation
8
00:00:40,520 --> 00:00:45,030
represents a client who shared their customer satisfaction and brand loyalty.
9
00:00:45,650 --> 00:00:51,680
Let's suppose the owners of the shop hired our consultancy firm to analyze customer behavior.
10
00:00:52,820 --> 00:00:58,490
Dividing the shop's customer base into groups of individuals with similar traits is a great way to reduce complexity
11
00:00:58,490 --> 00:01:03,630
and come up with ideas on how to serve these customer groups better.
12
00:01:03,680 --> 00:01:06,680
And, of course, win their business in the long run.
13
00:01:07,730 --> 00:01:10,880
To do that, we will have to apply machine learning.
14
00:01:11,950 --> 00:01:12,490
Ready?
15
00:01:13,000 --> 00:01:13,670
Here we go.
16
00:01:14,110 --> 00:01:18,250
The data set that we've got is already loaded in the variable data.
17
00:01:18,790 --> 00:01:24,160
A good preliminary step of most analyses is to visualize the data and examine it.
18
00:01:24,640 --> 00:01:27,550
One of the better tools to do that is a scatterplot.
19
00:01:28,710 --> 00:01:31,020
How many groups of points can you see here?
20
00:01:32,040 --> 00:01:38,670
There are two groups standing out. In data science, we would normally call these groups clusters,
21
00:01:38,670 --> 00:01:43,070
so two clusters can be identified instantly, with no machine learning whatsoever.
22
00:01:43,620 --> 00:01:50,400
One represents people with low loyalty and low satisfaction, and the other contains all the rest.
23
00:01:51,650 --> 00:01:57,740
Our preliminary visual examination shows us that there are some insights we can draw for sure, but
24
00:01:57,920 --> 00:02:00,410
let's take a more scientific approach.
25
00:02:01,490 --> 00:02:06,140
Most of the time in data science, you would want to standardize your data.
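Standardizing usually means rescaling each variable to a mean of 0 and a standard deviation of 1 (z-scores), so that a variable measured on a larger scale does not dominate the distance calculations that k-means relies on. A minimal NumPy sketch, assuming the data is a two-column array of (satisfaction, loyalty); the `standardize` function and the sample values are hypothetical, not the course's code or data:

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)  # population standard deviation (ddof=0)
    # Guard against division by zero for a constant column.
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std

# Hypothetical example: columns are (satisfaction, loyalty) for five customers.
# These values are made up for illustration; they are not the course data set.
customers = np.array([
    [4.0, -1.3],
    [6.0, -0.3],
    [5.0, -0.6],
    [7.0, 0.8],
    [9.0, 1.4],
])

scaled = standardize(customers)
print(scaled.mean(axis=0).round(6))  # each column now has a mean of (about) 0
print(scaled.std(axis=0).round(6))   # and a standard deviation of 1
```

This broadly mirrors what scikit-learn's `preprocessing.scale` does with its default arguments: column by column, using the population standard deviation.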
26
00:02:07,760 --> 00:02:14,810
Next, we will perform some unsupervised machine learning, more specifically, cluster analysis. Using
27
00:02:14,810 --> 00:02:19,190
the popular k-means algorithm, we will identify four clusters.
28
00:02:19,670 --> 00:02:24,320
The code, which we will examine in detail later on in the course, looks like the following.
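The course's own code is not shown in this transcript. As a hedged stand-in, here is a from-scratch NumPy sketch of the k-means (Lloyd's) algorithm: repeatedly assign every point to its nearest centroid, then move every centroid to the mean of its points, until the centroids stop moving. The function names, the deterministic farthest-point initialization, and the synthetic 30-point data set are all assumptions for illustration; in practice a library implementation such as scikit-learn's `KMeans(n_clusters=4)` with `fit_predict` is the usual choice (it defaults to k-means++ initialization).

```python
import numpy as np

def _assign(x, centroids):
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(x, k, max_iter=100):
    """Lloyd's k-means. Returns (labels, centroids).

    Assumption for this sketch: a simple deterministic farthest-point
    initialization instead of the k-means++ used by scikit-learn.
    """
    x = np.asarray(x, dtype=float)
    idx = [0]  # start from the first observation
    for _ in range(1, k):
        # Distance from each point to its closest already-chosen center.
        d2 = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))  # pick the point farthest from all chosen centers
    centroids = x[idx].copy()

    for _ in range(max_iter):
        labels = _assign(x, centroids)  # assignment step
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break  # centroids stopped moving: converged
        centroids = new
    return _assign(x, centroids), centroids

# Hypothetical, synthetic stand-in for 30 standardized (satisfaction, loyalty) points:
# four well-separated groups, NOT the course's real data.
rng = np.random.default_rng(42)
centers = np.array([[-1.5, -1.5], [1.5, 1.5], [-1.5, 1.5], [1.5, -1.5]])
true_group = np.repeat(np.arange(4), [8, 8, 7, 7])
data = centers[true_group] + rng.normal(scale=0.2, size=(30, 2))

labels, centroids = kmeans(data, k=4)
print(labels)
print(centroids.round(2))
```

The cluster numbers themselves are arbitrary; only the grouping matters, which is why the clusters still need to be interpreted and named afterwards.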
29
00:02:25,460 --> 00:02:26,780
And we are done.
30
00:02:27,690 --> 00:02:32,820
I can now plot the data using the predicted clusters as the colors of the new scatterplot.
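A hedged matplotlib sketch of that plot, assuming satisfaction on the horizontal axis, loyalty on the vertical axis, and one integer cluster label per customer; the `plot_clusters` helper and the values are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook to see the plot inline
import matplotlib.pyplot as plt
import numpy as np

def plot_clusters(satisfaction, loyalty, clusters):
    """Scatterplot of the customers, colored by their predicted cluster label."""
    fig, ax = plt.subplots()
    # `c` maps each point's cluster label to a color from the colormap.
    points = ax.scatter(satisfaction, loyalty, c=clusters, cmap="rainbow")
    ax.set_xlabel("Satisfaction")
    ax.set_ylabel("Loyalty")
    return fig, ax, points

# Hypothetical values: a few customers and the cluster label assigned to each.
satisfaction = np.array([-1.2, -1.0, 1.1, 0.9, -0.8, 1.3])
loyalty = np.array([-1.1, -0.9, 1.2, -1.0, 1.0, 0.8])
clusters = np.array([0, 0, 1, 2, 3, 1])

fig, ax, points = plot_clusters(satisfaction, loyalty, clusters)
```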
31
00:02:34,220 --> 00:02:41,510
We've got the same scatterplot, but with four clusters. Our customers have been segmented. From here,
32
00:02:41,660 --> 00:02:45,380
we can distinguish four types of customers and actually name them.
33
00:02:46,100 --> 00:02:53,090
The ones with low satisfaction and low loyalty will be called alienated. Those with high satisfaction
34
00:02:53,090 --> 00:02:54,140
and high loyalty
35
00:02:54,260 --> 00:03:01,850
are our fans. Those with low satisfaction and high loyalty are supporters. And the last ones are neutral
36
00:03:01,880 --> 00:03:04,310
or disloyal but have high satisfaction.
37
00:03:04,520 --> 00:03:05,990
These are roamers.
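The naming step can be written as a simple rule on each cluster's center. This sketch assumes standardized variables, so 0 separates "low" from "high" on both axes, and it treats neutral loyalty (exactly 0) as low, as the description of the roamers suggests; the `name_segment` helper and the center coordinates are hypothetical:

```python
def name_segment(satisfaction: float, loyalty: float) -> str:
    """Map a standardized cluster center to one of the four segment names.

    Assumption: 0 is the high/low cut-off on both axes, and neutral
    loyalty (exactly 0) counts as low.
    """
    high_sat = satisfaction > 0
    high_loyal = loyalty > 0
    if high_sat and high_loyal:
        return "fans"
    if high_sat:
        return "roamers"      # high satisfaction, neutral or low loyalty
    if high_loyal:
        return "supporters"   # low satisfaction, high loyalty
    return "alienated"        # low satisfaction, low loyalty

# Hypothetical cluster centers as (satisfaction, loyalty) pairs.
for center in [(-1.4, -1.2), (1.3, 1.1), (-1.0, 1.2), (1.2, -0.1)]:
    print(center, "->", name_segment(*center))
# prints alienated, fans, supporters, roamers (one per line, after each center)
```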
38
00:03:07,060 --> 00:03:13,120
Using just a few lines of code, we've reached a remarkable result: we have segmented our customers into
39
00:03:13,120 --> 00:03:14,500
four different groups.
40
00:03:14,710 --> 00:03:18,610
We've applied an algorithm to our data to reach an insight.
41
00:03:19,750 --> 00:03:27,120
Naturally, we must analyze what we see. Data science is about storytelling and making sense of numbers.
42
00:03:27,760 --> 00:03:30,580
We have four groups, but only one of them is favorable:
43
00:03:30,760 --> 00:03:34,690
the fans. Cluster analysis indicates the problem.
44
00:03:35,170 --> 00:03:38,860
Some customers are dissatisfied, others are disloyal.
45
00:03:39,160 --> 00:03:43,300
However, we must figure out how to solve the problem ourselves.
46
00:03:44,580 --> 00:03:48,130
What are some ideas that a data scientist and management could come up with?
47
00:03:48,660 --> 00:03:54,430
It makes sense to focus our efforts on turning supporters into fans by improving their shopping experience.
48
00:03:55,080 --> 00:04:00,890
Normally, we would have to dig deeper to find the drivers of dissatisfaction for these customers.
49
00:04:01,470 --> 00:04:06,840
Maybe it is long queues or unfriendly staff or perhaps high prices.
50
00:04:07,300 --> 00:04:13,320
Whatever the reason, we must take actionable steps to fix the issue and make our supporters happier.
51
00:04:14,790 --> 00:04:17,680
Simultaneously, we can do something else.
52
00:04:18,120 --> 00:04:22,560
We can turn the roamers into fans by increasing their brand loyalty.
53
00:04:23,570 --> 00:04:29,540
Loyalty cards, gifts, personalized discounts, vouchers and raffles are different strategies used
54
00:04:29,540 --> 00:04:32,150
to make such clients loyal in the long run.
55
00:04:32,750 --> 00:04:33,260
Great.
56
00:04:34,540 --> 00:04:40,780
Please bear in mind that in this exercise, we skipped a few steps along the way: typing code step by
57
00:04:40,780 --> 00:04:46,510
step, creating a program, analyzing a heat map, and finding the optimal number of clusters.
58
00:04:47,080 --> 00:04:51,180
However, these are all topics we will address later on in the course.
59
00:04:51,910 --> 00:04:58,060
So let's begin acquiring the knowledge needed step by step until we are ready to gain insights from
60
00:04:58,060 --> 00:05:05,080
larger data sets with various algorithms so we can turn all types of data into actionable insights.