Before we talk about feature engineering, let's talk about what features and labels mean when you're building and training your machine learning model. If you're taking this course, you've been introduced to the concept of machine learning. A machine learning algorithm is an algorithm that is able to learn from data: it's able to look at data, find patterns, and use these patterns for analysis. Machine learning models have the ability to work with a huge base of data and find patterns within this data, patterns that are not easily discoverable using, say, exploratory data analysis or visualizations. Once these patterns have been determined, machine learning algorithms can then be used for prediction. Machine learning models, once they have learned from your data, are capable of making intelligent decisions.

Now, machine learning is a vast field with many applications, but at its core, machine learning problems can be divided into four broad categories. The first is classification: you use a model to classify instances, good or bad, girl or boy, cat or dog. Classification models are used to predict classes or categories. If you want to predict a continuous numeric value, you'll use a regression model; regression analysis is what you'll use to predict, say, the mileage of an automobile, or the price of a stock or a home. If you have a large corpus of data and you want to find logical groupings or patterns that exist in your data, you can apply a clustering model. A clustering model tries to bring together, into a single cluster, those data points which are similar to one another. And finally, if the entities in your data have many characteristics, or features, and you want to find which features are important, or to extract latent features from your data, you'll apply dimensionality reduction. As a student of machine learning, these are the four broad categories of machine learning techniques that you'll encounter first.

Let's understand what machine learning is by taking an example of classification: you want to determine whether whales are fish or mammals. Now, you know that whales are members of the infraorder Cetacea, which indicates that they're mammals. But on the other hand, if you look at the characteristics of a whale, they look like fish. They swim like fish, they move like fish, they live like fish in the sea.
They could be fish. Now, for your classifier, you essentially want to be able to feed the characteristics of a whale into your machine learning model. You have an ML-based classifier which has been trained on a huge corpus of data. Once it has been trained, you'll feed in the characteristics of a whale and hope for a prediction that is correct. If it's a robust machine learning classifier, it will correctly identify the whale as a mammal.
Now the question is, how did we get this ML-based classifier trained on a corpus of data? Well, you start off with a machine learning algorithm, and you feed in a huge corpus of training data to train it. This classification algorithm will look through all of the samples present in the training data and try to extract significant patterns, and this, in turn, will give you an ML-based classifier. A classifier is a fully trained model: the algorithm learns from data to give you a trained model.
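This train-then-predict flow can be sketched in a few lines of plain Python. The animal characteristics, the 0/1 encoding, and the nearest-neighbour rule below are all invented for illustration; they are a toy stand-in for the real algorithms this course covers, not the course's own code.

```python
# Toy training corpus: each entry pairs a feature vector with its known class.
# Hypothetical features: [breathes_air, gives_live_birth, has_gills], 1 = yes, 0 = no.
training_data = [
    ([1, 1, 0], "mammal"),  # e.g. a dolphin
    ([1, 1, 0], "mammal"),  # e.g. a whale
    ([0, 0, 1], "fish"),    # e.g. a tuna
    ([0, 0, 1], "fish"),    # e.g. a cod
]

def predict(features):
    """A 1-nearest-neighbour 'classifier': return the label of the
    closest training example (squared Euclidean distance)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_data, key=lambda pair: distance(pair[0], features))
    return label

# Feed in a whale's characteristics: breathes air, gives live birth, no gills.
print(predict([1, 1, 0]))  # -> mammal
```

Here the "training" step is simply memorising the labelled examples; a real learner would instead extract patterns from a much larger corpus, but the interface, features in, predicted label out, is the same.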
That is your classifier. If you have a good model, you should be able to give it information and have it make predictions. The characteristics of a whale that you feed into your machine learning model, whether for training or for prediction, are referred to as the feature vector. These are the features of your instance. At the other end, the output of your model, the prediction that it makes, whether it's an output category or a continuous value such as a stock price, is referred to as the label.

Now, it's quite possible that you feed into your machine learning model entirely different characteristics of a whale: you tell it that it moves like a fish and it looks like a fish. In such situations, your classifier is likely to indicate that the whale is a fish, which is clearly wrong. What you fed in in your input feature vector are incorrectly specified features, and when the features that you feed into your model are not set up correctly, you're likely to get an incorrect prediction from your model. Your model is only as good as the features that you use for training.

The features that you use to train your model are also referred to as X variables. X variables are the attributes that the machine learning algorithm focuses on: the characteristics of the entities on which you're training your model. The attributes are X variables, and every data point is a list, or vector, of such X variables; this is what is collectively referred to as a feature vector. Thus, the input to an ML algorithm is a feature vector. A feature vector is exactly the same thing as X variables, and I'll use these terms interchangeably throughout this course. The output of your machine learning model, the predictions that it makes, are referred to as Y variables. The attributes that the machine learning algorithm tries to predict are called labels, or Y variables. Once again, I'll use the terms labels and Y variables interchangeably; they mean the same thing.
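As a rough sketch of this terminology, here is how a tiny labelled dataset (with invented records and field names) splits into X variables and Y variables:

```python
# A tiny labelled dataset (values and field names invented for illustration).
# Each record holds an animal's characteristics plus the class we want to predict.
records = [
    {"breathes_air": 1, "has_gills": 0, "label": "mammal"},
    {"breathes_air": 0, "has_gills": 1, "label": "fish"},
]

# X variables: the attributes the algorithm learns from.
# Each data point becomes one feature vector.
X = [[r["breathes_air"], r["has_gills"]] for r in records]

# Y variables: the attribute the algorithm tries to predict (the labels).
y = [r["label"] for r in records]

print(X)  # [[1, 0], [0, 1]]
print(y)  # ['mammal', 'fish']
```

This X/y split is the conventional shape that most machine learning libraries expect for training data.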
Another term that you can use for Y variables is targets. Now, based on the kind of model that you're building, the labels can be of different types. If you have categorical or discrete label values, that is typically the output of a classification algorithm: spam or ham, true or false, A, B, C, or D. These are examples of categorical values, or labels. Labels can also be numeric or continuous values; these are typically the output of regression models, such as the models that you use for price prediction. When you're working with continuous output from your machine learning model, you might call them Y values, or targets, rather than labels, because labels have a very categorical feel.

Now that you understand the difference between features and labels, there is an important point that you need to remember: garbage in, garbage out. If the data that you feed into an ML model is of poor quality, the model itself will be a poor model. Your model is only as good as your data.