Let's dive into a little more detail on each of these components of feature engineering, starting with feature selection. Feature selection involves choosing the best subset from within an existing set of features, or X variables, without substantially transforming or changing the features in any manner when you're building and training a machine learning model.

When would you use feature selection? Let's say you have many X variables present in your data, and not all of these X variables contain information. During exploratory data analysis, you found that most of your features, or X variables, contain little information; they're not relevant to your problem. But there are a few features, or X variables, that are very meaningful and have high predictive power, and you find that these meaningful variables are independent of each other. You would then use feature selection to choose only those meaningful variables to train your model.
Feature selection techniques can be divided into three broad categories. The first of these is filter methods, where you apply a statistical technique to extract the most relevant features. You can use embedded methods, where you build a machine learning model that assigns importance to the different features, and select those features that are the most important. And the third technique that you can apply is wrapper methods. Wrapper methods lie somewhere between filter methods and embedded methods: here you train a number of different candidate models on subsets of features and choose the subset of features that produces the best model.
Filter methods for feature selection are where you apply statistical techniques to select those features that are the most relevant. Feature selection is completely independent of actually building and training the model. With filter methods, you'll use techniques such as chi-square analysis, ANOVA analysis, or mutual information to determine what features are relevant to the target that you're trying to predict.
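As a concrete illustration, here's a minimal sketch of the filter approach using scikit-learn; the iris dataset and the choice of k=2 are just placeholders for this example.

```python
# Filter-method sketch: score each feature against the target with a
# statistical test, independently of any model training.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Chi-square test: keep the 2 features most dependent on the target.
# (For ANOVA, swap in score_func=f_classif.)
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.scores_)   # per-feature chi-square scores
print(X_selected.shape)   # (150, 2)

# Mutual information works the same way and also captures
# non-linear relationships between a feature and the target.
mi_selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_mi = mi_selector.fit_transform(X, y)
```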
52
00:01:59,920 --> 00:02:02,069
feature selection involved training a
53
00:02:02,069 --> 00:02:05,349
machine learning model. Now not all models
54
00:02:05,349 --> 00:02:08,069
assigned importance measures to features
55
00:02:08,069 --> 00:02:09,860
Relevant features can be selected by
56
00:02:09,860 --> 00:02:11,830
training a machine learning algorithm,
57
00:02:11,830 --> 00:02:14,110
which under the hold will sort features by
58
00:02:14,110 --> 00:02:16,699
important and assigning importance or
59
00:02:16,699 --> 00:02:19,300
relevant score. Tow each feature. Examples
60
00:02:19,300 --> 00:02:21,610
of such models are last regression on
61
00:02:21,610 --> 00:02:24,400
decision Trees feature selection using
62
00:02:24,400 --> 00:02:26,639
embedded metadata embedded within the
63
00:02:26,639 --> 00:02:28,599
mortal training fees. That's why they're
64
00:02:28,599 --> 00:02:30,919
called embedded methods when you perform
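Here's a minimal sketch of the embedded approach with scikit-learn, assuming a numeric regression dataset; both lasso regression and decision trees expose the importance scores mentioned above.

```python
# Embedded-method sketch: the model itself assigns importance scores
# to features as a by-product of training.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.feature_selection import SelectFromModel

X, y = load_diabetes(return_X_y=True)

# Lasso drives the coefficients of uninformative features to zero.
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # zero coefficients mark dropped features

# Decision trees expose per-feature importance scores.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
print(tree.feature_importances_)

# SelectFromModel keeps only features whose importance clears a threshold.
X_selected = SelectFromModel(lasso, prefit=True).transform(X)
print(X_selected.shape)
```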
When you perform feature selection using wrapper methods, you build a number of different candidate models, and each of these candidate models is built on a different subset of your features. You then choose the feature subset which gives you the best model.
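One common wrapper technique is recursive feature elimination; here's a minimal sketch using scikit-learn's RFE, where the estimator and the subset size of 5 are arbitrary choices for illustration.

```python
# Wrapper-method sketch: fit candidate models on different feature
# subsets and keep the subset that yields the best model.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# RFE trains the model, drops the weakest feature, and repeats
# until only the requested number of features remains.
rfe = RFE(estimator=LogisticRegression(max_iter=5000),
          n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the chosen feature subset
print(rfe.ranking_)   # rank 1 = selected
```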
Let's go back to the different components of feature engineering that we were discussing earlier, and let's move on to discussing feature learning. Feature learning is where you rely on machine learning algorithms, rather than human experts, to learn the best representation of complex data. This is especially useful when you're working with data such as images or videos. Feature learning is also often referred to as representation learning.

Now, you can have feature learning techniques that are supervised in nature, when you're working with a labeled corpus of data. Neural networks are a classic example of supervised feature learning. When you feed data into neural networks, you typically don't highlight significant features. You just feed all of the data in, and the neural networks find out what latent features are important or significant. Supervised feature learning is an extremely important technique because it greatly reduces the need for expert judgment. When the selection of your features relies on humans, that technique will simply not scale. It's much better to have an automated solution, such as supervised feature learning. And the fact that neural networks can learn significant features in your data is an important difference between neural network and deep learning techniques and traditional machine learning based systems. Traditional ML systems rely on experts to decide what features to pay attention to. Representation ML based systems, on the other hand, figure out by themselves what features are important or significant. Neural networks are examples of representation ML based systems.
off representation MLB systems. One thing
112
00:04:18,259 --> 00:04:19,930
you'll see when you work in the real ball
113
00:04:19,930 --> 00:04:22,100
that it's really hard to get a label
114
00:04:22,100 --> 00:04:24,529
corpus. Most of the data out in the real
115
00:04:24,529 --> 00:04:27,129
world tends to be unlabeled, and
116
00:04:27,129 --> 00:04:28,560
thankfully, there are future learning
117
00:04:28,560 --> 00:04:30,689
techniques that can work with an unlabeled
118
00:04:30,689 --> 00:04:33,350
corpus as well. These techniques are for
119
00:04:33,350 --> 00:04:35,889
tow US unsupervised feature learning on
120
00:04:35,889 --> 00:04:38,029
surprise, indicating the absence of label
121
00:04:38,029 --> 00:04:40,600
data. In order to learn patterns and
122
00:04:40,600 --> 00:04:43,160
unlabeled data, you can apply clustering
123
00:04:43,160 --> 00:04:45,300
techniques. Clustering will allow you to
124
00:04:45,300 --> 00:04:47,199
find a logical groupings that exist in
125
00:04:47,199 --> 00:04:49,360
your data. If you're working with image
126
00:04:49,360 --> 00:04:51,170
data and unsupervised technique for
127
00:04:51,170 --> 00:04:53,370
feature learning is dictionary Learning
128
00:04:53,370 --> 00:04:55,790
Dictionary Learning learns sparse
129
00:04:55,790 --> 00:04:58,300
representation. Zoff dense features such
130
00:04:58,300 --> 00:05:00,350
as with images. If you're working with
131
00:05:00,350 --> 00:05:02,129
deep learning, specifically neural
132
00:05:02,129 --> 00:05:04,990
networks, you can use auto and quarters to
133
00:05:04,990 --> 00:05:07,620
extract latent significant representations
134
00:05:07,620 --> 00:05:10,089
off your data. Now you might have machine
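Here's a minimal sketch of two of those unsupervised options in scikit-learn: k-means clustering to find groupings, and dictionary learning to build sparse representations of dense image data. The digits dataset stands in for real image data, and the cluster and atom counts are arbitrary.

```python
# Unsupervised feature learning sketch: no labels are used anywhere.
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.decomposition import MiniBatchDictionaryLearning

X, _ = load_digits(return_X_y=True)   # labels deliberately ignored

# Clustering: discover logical groupings in the unlabeled data.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])            # cluster assignment per sample

# Dictionary learning: express each dense 64-pixel image as a
# sparse combination of learned dictionary atoms.
dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                   random_state=0)
X_sparse = dico.fit_transform(X)
print(X_sparse.shape)                 # (1797, 32), mostly zeros
```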
Now, you might have machine learning models that can't work directly with raw features but which work better with derived features, and that's where feature extraction comes in. Feature extraction differs from the feature selection that we discussed earlier in that the input features are fundamentally transformed into derived features. The derived features are often unrecognizable and may be hard to interpret. These derived features might represent patterns that exist in your data which are not intuitively understandable. Feature extraction techniques exist for data of all kinds. If you're working with images, you can use keypoints and descriptors to represent interesting areas in your image. If you're working with simple numeric data, feature extraction might involve reorienting your data to be projected along new axes, such as what we do when we find the principal components of matrices. Or, if you're working with text data and natural language processing, you might choose to represent the words within your document using their TF-IDF scores. TF-IDF stands for term frequency-inverse document frequency, and this is a score that represents how significant a particular word is in a particular document and across the entire corpus.
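Here's a minimal sketch of both of those extraction techniques in scikit-learn; the two-document corpus is made up purely for illustration.

```python
# Feature-extraction sketch: transform inputs into derived features.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Numeric data: re-project onto new axes (principal components).
X, _ = load_iris(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)
print(X_pca.shape)                    # (150, 2) derived features

# Text data: represent each word by its TF-IDF score.
corpus = ["the cat sat on the mat",   # made-up documents for illustration
          "the dog chased the cat"]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))
```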
Now, it so happens that when you perform feature extraction, this also leads to dimensionality reduction: it reduces the number of dimensions which are required to express your data. So many of the feature extraction techniques that we discussed earlier also happen to be dimensionality reduction techniques. However, when you use these techniques for feature extraction, the explicit objective is to express your features in a better form, not to reduce the number of X columns or features.