subtitlecat.com

All language subtitles for 001 A Practical Example What You Will Learn in This Course_en

1
00:00:01,280 --> 00:00:08,390
Hello, and a very warm welcome to the data scientist course. To many people, data science and

2
00:00:08,390 --> 00:00:14,600
machine learning feel like unknown territory, something too abstract and complex. To show you

3
00:00:14,600 --> 00:00:15,860
that isn't the case,

4
00:00:15,890 --> 00:00:21,680
I would like to start by giving you a hands-on example of what to expect from this course and

5
00:00:21,680 --> 00:00:25,160
what you will be able to do on your own only a few hours into our training.

6
00:00:26,110 --> 00:00:32,560
There are three keywords that will come up a lot during the lecture: data, algorithm, and insight.

7
00:00:33,500 --> 00:00:40,520
Here we have data collected from the customers of a shop, 30 observations in total. Each observation

8
00:00:40,520 --> 00:00:45,030
represents a client who shared their customer satisfaction and brand loyalty.

9
00:00:45,650 --> 00:00:51,680
Let's suppose the owners of the shop hired our consultancy firm to analyze customer behavior.

10
00:00:52,820 --> 00:00:58,490
Dividing the shop's customer base into groups of individuals with similar traits is a great way to reduce complexity

11
00:00:58,490 --> 00:01:03,630
and come up with ideas on how to serve these customer groups better.

12
00:01:03,680 --> 00:01:06,680
And, of course, win their business in the long run.

13
00:01:07,730 --> 00:01:10,880
To do that, we will have to apply machine learning.

14
00:01:11,950 --> 00:01:12,490
Ready?

15
00:01:13,000 --> 00:01:13,670
Here we go.

16
00:01:14,110 --> 00:01:18,250
The data set that we've got is already loaded in the variable data.

17
00:01:18,790 --> 00:01:24,160
A good preliminary step in most analyses is to visualize the data and examine it.

18
00:01:24,640 --> 00:01:27,550
One of the better tools for that is a scatterplot.

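
The visualization step described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the lecture's real data set is not reproduced in the subtitles, so the values below are synthetic placeholders, and the function names, the two score ranges, and the output file name are all hypothetical. Plotting relies on the matplotlib library.

```python
import random


def make_placeholder_data(n=30, seed=42):
    """Return n synthetic (satisfaction, loyalty) pairs.

    Placeholder values only, not the lecture's data. Satisfaction is an
    integer from 1 to 10 and loyalty a float from -2 to 2; both ranges
    are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    return [(rng.randint(1, 10), round(rng.uniform(-2.0, 2.0), 2))
            for _ in range(n)]


def plot_customers(data, path="customers.png"):
    """Save a satisfaction-versus-loyalty scatterplot to `path`."""
    # Imported lazily so the data helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    satisfaction = [sat for sat, _ in data]
    loyalty = [loy for _, loy in data]

    fig, ax = plt.subplots()
    ax.scatter(satisfaction, loyalty)
    ax.set_xlabel("Satisfaction")
    ax.set_ylabel("Loyalty")
    fig.savefig(path)
    plt.close(fig)


# One row per customer, 30 observations in total, as in the lecture.
data = make_placeholder_data()
```

Calling `plot_customers(data)` would then write the scatterplot to an image file, which can be inspected for visible groups of points.
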
19
00:01:28,710 --> 00:01:31,020
How many groups of points can you see here?

20
00:01:32,040 --> 00:01:38,670
There are two groups standing out. In data science, we would normally call these groups clusters,

21
00:01:38,670 --> 00:01:43,070
so two clusters can be identified instantly with no machine learning whatsoever.

22
00:01:43,620 --> 00:01:50,400
One represents people with low loyalty and low satisfaction, and the other one contains all the rest.

23
00:01:51,650 --> 00:01:57,740
Our preliminary visual examination shows us that there are some insights we can draw for sure, but

24
00:01:57,920 --> 00:02:00,410
let's take a more scientific approach.

25
00:02:01,490 --> 00:02:06,140
Most of the time in data science, you would want to standardize your data.

26
00:02:07,760 --> 00:02:14,810
Next, we will perform some unsupervised machine learning, more specifically, cluster analysis. Using

27
00:02:14,810 --> 00:02:19,190
the popular k-means algorithm, we will identify four clusters.

28
00:02:19,670 --> 00:02:24,320
The code, which we will examine in detail later on in the course, looks like the following.

29
00:02:25,460 --> 00:02:26,780
And we are done.

30
00:02:27,690 --> 00:02:32,820
I can now plot the data using the predicted clusters as the colors of the new scatterplot.

31
00:02:34,220 --> 00:02:41,510
We've got the same scatterplot, but with four clusters. Our customers have been segmented. From here,

32
00:02:41,660 --> 00:02:45,380
we can distinguish four types of customers and actually name them.

33
00:02:46,100 --> 00:02:53,090
The ones with low satisfaction and low loyalty will be called alienated. Those with high satisfaction

34
00:02:53,090 --> 00:02:54,140
and high loyalty

35
00:02:54,260 --> 00:03:01,850
are fans. Those with low satisfaction and high loyalty are supporters. And the last ones, who are neutral

36
00:03:01,880 --> 00:03:04,310
or disloyal but have high satisfaction:

37
00:03:04,520 --> 00:03:05,990
these are roamers.

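
The standardization and clustering steps described above can be sketched as follows. The code shown on screen in the lecture does not appear in the subtitles, so this is not the instructor's code. It is a plain-Python version of k-means (Lloyd's algorithm) written for illustration, and in practice a library implementation would typically be used instead. The data points, the four blob centers, the seeds, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(points):
    """Rescale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*points))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sigmas))
            for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


# Synthetic placeholder data: 30 (satisfaction, loyalty) pairs scattered
# around four hypothetical centers. These are not the lecture's values.
rng = random.Random(1)
centers = [(2.0, -1.0), (9.0, 1.0), (3.0, 1.0), (8.0, -1.0)]
data = []
for i in range(30):
    cx, cy = centers[i % 4]
    data.append((rng.gauss(cx, 0.8), rng.gauss(cy, 0.3)))

scaled = standardize(data)
labels, centroids = kmeans(scaled, k=4)
```

Each entry of `labels` is the cluster index of the corresponding customer. Passing `labels` as the color argument of a plotting call (for example the `c` parameter of matplotlib's `scatter`) would give a colored scatterplot like the one described above.
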
38
00:03:07,060 --> 00:03:13,120
Using just a few lines of code, we've reached a remarkable result: we have segmented our customers into

39
00:03:13,120 --> 00:03:14,500
four different groups.

40
00:03:14,710 --> 00:03:18,610
We've applied an algorithm to our data to reach an insight.

41
00:03:19,750 --> 00:03:27,120
Naturally, we must analyze what we see. Data science is about storytelling and making sense of numbers.

42
00:03:27,760 --> 00:03:30,580
We have four groups, but only one of them is favorable:

43
00:03:30,760 --> 00:03:34,690
the fans. Cluster analysis indicates the problem.

44
00:03:35,170 --> 00:03:38,860
Some customers are dissatisfied; others are disloyal.

45
00:03:39,160 --> 00:03:43,300
However, we must figure out how to solve the problem ourselves.

46
00:03:44,580 --> 00:03:48,130
What are some ideas a data scientist and management will come up with?

47
00:03:48,660 --> 00:03:54,430
It makes sense to focus our efforts on turning supporters into fans by improving their shopping experience.

48
00:03:55,080 --> 00:04:00,890
Normally, we would have to dig deeper to find the drivers of dissatisfaction for these customers.

49
00:04:01,470 --> 00:04:06,840
Maybe it is long queues, unfriendly staff, or perhaps high prices.

50
00:04:07,300 --> 00:04:13,320
Whatever the reason, we must take actionable steps to fix the issue and make our supporters happier.

51
00:04:14,790 --> 00:04:17,680
Simultaneously, we can do something else.

52
00:04:18,120 --> 00:04:22,560
We can turn the roamers into fans by increasing their brand loyalty.

53
00:04:23,570 --> 00:04:29,540
Loyalty cards, gifts, personalized discounts, vouchers, and raffles are different strategies used

54
00:04:29,540 --> 00:04:32,150
to make such clients loyal in the long run.

55
00:04:32,750 --> 00:04:33,260
Great.

56
00:04:34,540 --> 00:04:40,780
Please bear in mind that in this exercise, we skipped a few steps along the way: typing code step by

57
00:04:40,780 --> 00:04:46,510
step, creating a program, analyzing a heat map, and finding the optimal number of clusters.

58
00:04:47,080 --> 00:04:51,180
However, these are all topics we will address later on in the course.

59
00:04:51,910 --> 00:04:58,060
So let's begin acquiring the knowledge needed, step by step, until we are ready to gain insights from

60
00:04:58,060 --> 00:05:05,080
larger data sets with various algorithms, so we can turn all types of data into actionable insights.