Before we talk about feature engineering, let's talk about what features and labels mean when you're building and training your machine learning model. If you're taking this course, you've been introduced to the concept of machine learning. A machine learning algorithm is an algorithm that is able to learn from data: it's able to look at data, find patterns, and use these patterns for analysis. Machine learning models have the ability to work with a huge base of data and find patterns within this data, patterns that are not easily discoverable using, say, exploratory data analysis or visualizations. Once these patterns have been determined, machine learning algorithms can then be used for prediction. Machine learning models, once they have learned from your data, are capable of making intelligent decisions.

Now, machine learning is a vast field with many applications, but at its core, machine learning problems can be divided into four broad categories. The first is classification: you use a model to classify instances, good or bad, girl or boy, cat or dog. Classification models are used to predict classes or categories. If you want to predict a continuous numeric value, you'll use a regression model; regression analysis is what you'll use to predict, say, the mileage of an automobile, or the price of a stock or a home. If you have a large corpus of data and you want to find logical groupings or patterns that exist in your data, you can apply a clustering model. A clustering model tries to bring together, into a single cluster, those data points which are similar to one another. And finally, if the entities in your data have many characteristics, or features, and you want to find which features are important, or to extract latent features from your data, you'll apply dimensionality reduction. As a student of machine learning, these are the four broad categories of machine learning techniques that you'll encounter first.

Let's understand what machine learning is by taking an example of classification: you want to determine whether whales are fish or mammals. Now, you know that whales are members of the infraorder Cetacea, which indicates that they're mammals. But on the other hand, if you look at the characteristics of a whale, they look like fish. They swim like fish, they move like fish, they live like fish in the sea.
They could be fish. Now, for your classifier, you essentially want to be able to feed the characteristics of a whale into your machine learning model. You have an ML-based classifier which has been trained on a huge corpus of data. Once it has been trained, you'll feed in the characteristics of a whale and hope for a prediction that is correct. If it's a robust machine learning classifier, it will correctly identify the whale as a mammal.
Now the question is, how did we get this ML-based classifier trained on a corpus of data? Well, you start off with a machine learning algorithm, and you feed in a huge corpus of training data to train it. This classification algorithm will look through all of the samples present in the training data and try to extract significant patterns, and this, in turn, will give you an ML-based classifier. A classifier is a fully trained model: the algorithm learns from data to give you a trained model.
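This train-then-predict flow can be sketched in a few lines of plain Python. The animal characteristics, the 0/1 encoding, and the nearest-neighbour rule below are all invented for illustration; they are a toy stand-in for the real algorithms this course covers, not the course's own code.

```python
# Toy training corpus: each entry pairs a feature vector with its known class.
# Hypothetical features: [breathes_air, gives_live_birth, has_gills], 1 = yes, 0 = no.
training_data = [
    ([1, 1, 0], "mammal"),  # e.g. a dolphin
    ([1, 1, 0], "mammal"),  # e.g. a whale
    ([0, 0, 1], "fish"),    # e.g. a tuna
    ([0, 0, 1], "fish"),    # e.g. a cod
]

def predict(features):
    """A 1-nearest-neighbour 'classifier': return the label of the
    closest training example (squared Euclidean distance)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_data, key=lambda pair: distance(pair[0], features))
    return label

# Feed in a whale's characteristics: breathes air, gives live birth, no gills.
print(predict([1, 1, 0]))  # -> mammal
```

Here the "training" step is simply memorising the labelled examples; a real learner would instead extract patterns from a much larger corpus, but the interface, features in, predicted label out, is the same.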
That is your classifier. If you have a good model, you should be able to give it information and have it make predictions. The characteristics of a whale that you feed into your machine learning model, whether for training or for prediction, are referred to as the feature vector. These are the features of your instance. At the other end, the output of your model, the prediction that it makes, whether it's an output category or a continuous value such as a stock price, is referred to as the label.

Now, it's quite possible that you feed into your machine learning model entirely different characteristics of a whale: you tell it that it moves like a fish and it looks like a fish. In such situations, your classifier is likely to indicate that the whale is a fish, which is clearly wrong. What you fed in in your input feature vector are incorrectly specified features, and when the features that you feed into your model are not set up correctly, you're likely to get an incorrect prediction from your model. Your model is only as good as the features that you use for training.

The features that you use to train your model are also referred to as X variables. X variables are the attributes that the machine learning algorithm focuses on: the characteristics of the entities on which you're training your model. The attributes are X variables, and every data point is a list, or vector, of such X variables; this is what is collectively referred to as a feature vector. Thus, the input to an ML algorithm is a feature vector. A feature vector is exactly the same thing as X variables, and I'll use these terms interchangeably throughout this course. The output of your machine learning model, the predictions that it makes, are referred to as Y variables. The attributes that the machine learning algorithm tries to predict are called labels, or Y variables. Once again, I'll use the terms labels and Y variables interchangeably; they mean the same thing.
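As a rough sketch of this terminology, here is how a tiny labelled dataset (with invented records and field names) splits into X variables and Y variables:

```python
# A tiny labelled dataset (values and field names invented for illustration).
# Each record holds an animal's characteristics plus the class we want to predict.
records = [
    {"breathes_air": 1, "has_gills": 0, "label": "mammal"},
    {"breathes_air": 0, "has_gills": 1, "label": "fish"},
]

# X variables: the attributes the algorithm learns from.
# Each data point becomes one feature vector.
X = [[r["breathes_air"], r["has_gills"]] for r in records]

# Y variables: the attribute the algorithm tries to predict (the labels).
y = [r["label"] for r in records]

print(X)  # [[1, 0], [0, 1]]
print(y)  # ['mammal', 'fish']
```

This X/y split is the conventional shape that most machine learning libraries expect for training data.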
Another term that you can use for Y variables is targets. Now, based on the kind of model that you're building, the labels can be of different types. If you have categorical or discrete label values, that is typically the output of a classification algorithm: spam or ham, true or false, A, B, C, or D. These are examples of categorical values, or labels. Labels can also be numeric or continuous values; these are typically the output of regression models, such as the models that you use for price prediction. When you're working with continuous output from your machine learning model, you might call them Y values, or targets, rather than labels, because labels have a very categorical feel.

Now that you understand the difference between features and labels, there is an important point that you need to remember: garbage in, garbage out. If the data that you feed into an ML model is of poor quality, the model itself will be a poor model. Your model is only as good as your data.