Let's dive into a little more detail on each of these components of feature engineering, starting with feature selection. Feature selection involves choosing the best subset from within an existing set of features, or X variables, without substantially transforming or changing the features in any manner when you're building and training a machine learning model.

When would you use feature selection? Let's say you have many X variables present in your data, and not all of these X variables contain information. During exploratory data analysis, you found that most of your features, or X variables, contain little information; they're not relevant to your problem. But there are a few features, or X variables, that are very meaningful and have high predictive power, and you find that these meaningful variables are independent of each other. You would then use feature selection to choose only those meaningful variables to train your model.
Feature selection techniques can be divided into three broad categories. The first of these is filter methods, where you apply a statistical technique to extract the most relevant features. You can use embedded methods, where you build a machine learning model that assigns importance to the different features, and select those features that are the most important. And the third technique that you can apply is wrapper methods. Wrapper methods lie somewhere between filter methods and embedded methods: here you train a number of different candidate models on subsets of features and choose the subset of features that produces the best model.
Filter methods for feature selection are where you apply statistical techniques to select those features that are the most relevant. Feature selection is completely independent of actually building and training the model. With filter methods, you'll use techniques such as chi-square analysis, ANOVA analysis, or mutual information to determine what features are relevant to the target that you're trying to predict.
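As a concrete illustration, here's a minimal sketch of the filter approach using scikit-learn; the iris dataset and the choice of k=2 are just placeholders for this example.

```python
# Filter-method sketch: score each feature against the target with a
# statistical test, independently of any model training.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Chi-square test: keep the 2 features most dependent on the target.
# (For ANOVA, swap in score_func=f_classif.)
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.scores_)   # per-feature chi-square scores
print(X_selected.shape)   # (150, 2)

# Mutual information works the same way and also captures
# non-linear relationships between a feature and the target.
mi_selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_mi = mi_selector.fit_transform(X, y)
```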
52
00:01:59,920 --> 00:02:02,069
feature selection involved training a
53
00:02:02,069 --> 00:02:05,349
machine learning model. Now not all models
54
00:02:05,349 --> 00:02:08,069
assigned importance measures to features
55
00:02:08,069 --> 00:02:09,860
Relevant features can be selected by
56
00:02:09,860 --> 00:02:11,830
training a machine learning algorithm,
57
00:02:11,830 --> 00:02:14,110
which under the hold will sort features by
58
00:02:14,110 --> 00:02:16,699
important and assigning importance or
59
00:02:16,699 --> 00:02:19,300
relevant score. Tow each feature. Examples
60
00:02:19,300 --> 00:02:21,610
of such models are last regression on
61
00:02:21,610 --> 00:02:24,400
decision Trees feature selection using
62
00:02:24,400 --> 00:02:26,639
embedded metadata embedded within the
63
00:02:26,639 --> 00:02:28,599
mortal training fees. That's why they're
64
00:02:28,599 --> 00:02:30,919
called embedded methods when you perform
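Here's a minimal sketch of the embedded approach with scikit-learn, assuming a numeric regression dataset; both lasso regression and decision trees expose the importance scores mentioned above.

```python
# Embedded-method sketch: the model itself assigns importance scores
# to features as a by-product of training.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.feature_selection import SelectFromModel

X, y = load_diabetes(return_X_y=True)

# Lasso drives the coefficients of uninformative features to zero.
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # zero coefficients mark dropped features

# Decision trees expose per-feature importance scores.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
print(tree.feature_importances_)

# SelectFromModel keeps only features whose importance clears a threshold.
X_selected = SelectFromModel(lasso, prefit=True).transform(X)
print(X_selected.shape)
```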
When you perform feature selection using wrapper methods, you build a number of different candidate models, and each of these candidate models is built on a different subset of your features. You then choose the feature subset which gives you the best model.
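One common wrapper technique is recursive feature elimination; here's a minimal sketch using scikit-learn's RFE, where the estimator and the subset size of 5 are arbitrary choices for illustration.

```python
# Wrapper-method sketch: fit candidate models on different feature
# subsets and keep the subset that yields the best model.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# RFE trains the model, drops the weakest feature, and repeats
# until only the requested number of features remains.
rfe = RFE(estimator=LogisticRegression(max_iter=5000),
          n_features_to_select=5)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the chosen feature subset
print(rfe.ranking_)   # rank 1 = selected
```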
Let's go back to the different components of feature engineering that we were discussing earlier, and let's move on to discussing feature learning. Feature learning is where you rely on machine learning algorithms, rather than human experts, to learn the best representation of complex data. This is especially useful when you're working with data such as images or videos. Feature learning is also often referred to as representation learning.

Now, you can have feature learning techniques that are supervised in nature, when you're working with a labeled corpus of data. Neural networks are a classic example of supervised feature learning. When you feed data into neural networks, you typically don't highlight significant features. You just feed all of the data in, and the neural networks find out what latent features are important or significant. Supervised feature learning is an extremely important technique because it greatly reduces the need for expert judgment. When the selection of your features relies on humans, that technique will simply not scale. It's much better to have an automated solution, such as supervised feature learning. And the fact that neural networks can learn significant features in your data is an important difference between neural network and deep learning techniques and traditional machine learning based systems. Traditional ML systems rely on experts to decide what features to pay attention to. Representation ML based systems, on the other hand, figure out by themselves what features are important or significant. Neural networks are examples of representation ML based systems.
off representation MLB systems. One thing
112
00:04:18,259 --> 00:04:19,930
you'll see when you work in the real ball
113
00:04:19,930 --> 00:04:22,100
that it's really hard to get a label
114
00:04:22,100 --> 00:04:24,529
corpus. Most of the data out in the real
115
00:04:24,529 --> 00:04:27,129
world tends to be unlabeled, and
116
00:04:27,129 --> 00:04:28,560
thankfully, there are future learning
117
00:04:28,560 --> 00:04:30,689
techniques that can work with an unlabeled
118
00:04:30,689 --> 00:04:33,350
corpus as well. These techniques are for
119
00:04:33,350 --> 00:04:35,889
tow US unsupervised feature learning on
120
00:04:35,889 --> 00:04:38,029
surprise, indicating the absence of label
121
00:04:38,029 --> 00:04:40,600
data. In order to learn patterns and
122
00:04:40,600 --> 00:04:43,160
unlabeled data, you can apply clustering
123
00:04:43,160 --> 00:04:45,300
techniques. Clustering will allow you to
124
00:04:45,300 --> 00:04:47,199
find a logical groupings that exist in
125
00:04:47,199 --> 00:04:49,360
your data. If you're working with image
126
00:04:49,360 --> 00:04:51,170
data and unsupervised technique for
127
00:04:51,170 --> 00:04:53,370
feature learning is dictionary Learning
128
00:04:53,370 --> 00:04:55,790
Dictionary Learning learns sparse
129
00:04:55,790 --> 00:04:58,300
representation. Zoff dense features such
130
00:04:58,300 --> 00:05:00,350
as with images. If you're working with
131
00:05:00,350 --> 00:05:02,129
deep learning, specifically neural
132
00:05:02,129 --> 00:05:04,990
networks, you can use auto and quarters to
133
00:05:04,990 --> 00:05:07,620
extract latent significant representations
134
00:05:07,620 --> 00:05:10,089
off your data. Now you might have machine
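Here's a minimal sketch of two of those unsupervised options in scikit-learn: k-means clustering to find groupings, and dictionary learning to build sparse representations of dense image data. The digits dataset stands in for real image data, and the cluster and atom counts are arbitrary.

```python
# Unsupervised feature learning sketch: no labels are used anywhere.
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.decomposition import MiniBatchDictionaryLearning

X, _ = load_digits(return_X_y=True)   # labels deliberately ignored

# Clustering: discover logical groupings in the unlabeled data.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])            # cluster assignment per sample

# Dictionary learning: express each dense 64-pixel image as a
# sparse combination of learned dictionary atoms.
dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                   random_state=0)
X_sparse = dico.fit_transform(X)
print(X_sparse.shape)                 # (1797, 32), mostly zeros
```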
Now, you might have machine learning models that can't work directly with raw features but which work better with derived features, and that's where feature extraction comes in. Feature extraction differs from the feature selection that we discussed earlier in that the input features are fundamentally transformed into derived features. The derived features are often unrecognizable and may be hard to interpret. These derived features might represent patterns that exist in your data which are not intuitively understandable. Feature extraction techniques exist for data of all kinds. If you're working with images, you can use keypoints and descriptors to represent interesting areas in your image. If you're working with simple numeric data, feature extraction might involve reorienting your data to be projected along new axes, such as what we do when we find the principal components of matrices. Or, if you're working with text data and natural language processing, you might choose to represent the words within your document using their TF-IDF scores. TF-IDF stands for term frequency-inverse document frequency, and this is a score that represents how significant a particular word is in a particular document and across the entire corpus.
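Here's a minimal sketch of both of those extraction techniques in scikit-learn; the two-document corpus is made up purely for illustration.

```python
# Feature-extraction sketch: transform inputs into derived features.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Numeric data: re-project onto new axes (principal components).
X, _ = load_iris(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)
print(X_pca.shape)                    # (150, 2) derived features

# Text data: represent each word by its TF-IDF score.
corpus = ["the cat sat on the mat",   # made-up documents for illustration
          "the dog chased the cat"]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))
```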
Now, it so happens that when you perform feature extraction, this also leads to dimensionality reduction: it reduces the number of dimensions which are required to express your data. So many of the feature extraction techniques that we discussed earlier also happen to be dimensionality reduction techniques. However, when you use these techniques for feature extraction, the explicit objective is to express your features in a better form, not to reduce the number of X columns or features.