1
00:00:11,140 --> 00:00:16,750
So in this lecture, we are going to discuss some common time series transformations. If you're
2
00:00:16,750 --> 00:00:22,000
familiar with machine learning, then you know that it's often useful to transform your data before
3
00:00:22,000 --> 00:00:24,000
passing it into a machine learning model.
4
00:00:24,550 --> 00:00:32,380
For example, standardization or min-max scaling. For time series, we'll be discussing three common transformations:
5
00:00:32,620 --> 00:00:36,700
the power transform, the log transform, and the Box-Cox transform.
6
00:00:37,240 --> 00:00:40,210
As you'll see, these all essentially serve the same purpose.
7
00:00:44,910 --> 00:00:50,160
So let's start with the power transform. The power transform involves raising all your data points
8
00:00:50,160 --> 00:00:55,530
to a power. For example, by raising every data point to the power of one half, you'll be taking the
9
00:00:55,530 --> 00:00:56,900
square root of your data set.
10
00:00:57,810 --> 00:00:59,100
So why is this useful?
11
00:01:00,030 --> 00:01:03,560
Well, imagine that your data appears to grow quadratically in time.
12
00:01:04,020 --> 00:01:09,090
If you take the square root, the result would be that you transform your data to grow linearly.
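A minimal sketch of that idea, assuming NumPy and a made-up quadratic series:

```python
import numpy as np

# Made-up series that grows quadratically in time
t = np.arange(1, 101).astype(float)
y = t ** 2

# Power transform with power 1/2: take the square root of every point
y_sqrt = y ** 0.5

# The transformed series now grows linearly (it equals t itself)
print(np.allclose(y_sqrt, t))  # True
```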
13
00:01:09,630 --> 00:01:11,070
So why is that useful?
14
00:01:11,670 --> 00:01:16,250
Well, you'll soon learn about some machine learning models that can learn linear trends very well,
15
00:01:16,680 --> 00:01:20,480
but there's no such model for quadratic trends or cubic trends and so forth.
16
00:01:21,090 --> 00:01:26,430
Thus, by transforming your data to appear like it has a linear trend, you give your model a better
17
00:01:26,430 --> 00:01:31,650
chance of forecasting future data points and modeling the true nature of the time series more closely.
18
00:01:36,370 --> 00:01:42,760
So another transformation with a similar purpose is the log transform. Like the power transform, it
19
00:01:42,760 --> 00:01:46,160
basically ends up squashing your data into a smaller range.
20
00:01:46,600 --> 00:01:51,940
In fact, a lot of the time I'll just end up using the log transform by default without considering
21
00:01:51,940 --> 00:01:52,880
other options.
22
00:01:53,800 --> 00:01:58,530
One common application of the log transform is in finance.
23
00:01:58,540 --> 00:02:02,620
In finance, it's common to model stock prices as following a lognormal distribution.
24
00:02:03,640 --> 00:02:07,960
It's also common to model log returns instead of returns based on percentages.
25
00:02:08,770 --> 00:02:12,340
As an example, this is the basis for the famous Black-Scholes formula.
26
00:02:13,750 --> 00:02:19,630
Note that one possible issue with the log transform is that it doesn't accept zero or negative values
27
00:02:19,630 --> 00:02:20,270
as input.
28
00:02:21,190 --> 00:02:27,450
For this reason, it can only be used for data which is strictly positive. For data that might be non-negative,
29
00:02:27,460 --> 00:02:30,610
it's common to simply add one before taking the log.
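A small sketch of that trick, assuming NumPy and made-up non-negative data: `np.log1p` computes log(1 + y) directly, and `np.expm1` inverts it.

```python
import numpy as np

# Made-up non-negative data that includes a zero, so a plain log would fail
y = np.array([0.0, 1.0, 10.0, 100.0])

# "Add one before taking the log": np.log1p computes log(1 + y)
y_log = np.log1p(y)

# np.expm1 undoes the transform: exp(y_log) - 1 recovers the original data
y_back = np.expm1(y_log)
print(np.allclose(y_back, y))  # True
```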
30
00:02:35,170 --> 00:02:41,350
OK, so a third transform we're going to discuss is the Box-Cox transform, which generalises the concept
31
00:02:41,350 --> 00:02:44,270
of both the power transform and the log transform.
32
00:02:44,740 --> 00:02:50,020
You can see that it involves this parameter lambda, which is the power to use when taking the transform.
33
00:02:51,010 --> 00:02:52,630
So why does this make sense?
34
00:02:53,200 --> 00:02:58,900
This makes sense because the natural logarithm is actually the limit of this specific power transform
35
00:02:59,140 --> 00:03:00,730
as the power approaches zero.
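You can check that limit numerically. This sketch, with a made-up value, evaluates the lambda-not-zero branch of the Box-Cox transform, (y**lam - 1) / lam, for shrinking lambda:

```python
import math

y = 5.0

# Box-Cox transform for lambda != 0: (y**lam - 1) / lam
for lam in (1.0, 0.1, 0.01, 0.001):
    print(lam, (y ** lam - 1) / lam)

# As lambda approaches zero, the values converge to the natural log
print("ln(y) =", math.log(y))
```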
36
00:03:01,990 --> 00:03:06,850
Now, in SciPy, the boxcox function will automatically choose the value of lambda for us.
37
00:03:07,210 --> 00:03:10,620
So we don't need to worry about finding the optimal value ourselves.
38
00:03:11,080 --> 00:03:15,880
But if you're interested in learning how this value is chosen, I'd encourage you to check out the SciPy
39
00:03:15,880 --> 00:03:20,440
documentation as well as this article I've included in the extra reading.
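As a sketch of that automatic choice, assuming SciPy is installed and using made-up strictly positive data, `scipy.stats.boxcox` both transforms the data and returns the lambda it picked by maximum likelihood when you don't pass one:

```python
import numpy as np
from scipy.stats import boxcox

# Made-up strictly positive data (Box-Cox requires positive input)
rng = np.random.default_rng(42)
y = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# With no lmbda argument, boxcox estimates lambda by maximum likelihood
y_transformed, lam = boxcox(y)
print("estimated lambda:", lam)
```

Since the made-up data here is lognormal, the estimated lambda should come out near zero, i.e. close to a plain log transform.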
40
00:03:25,150 --> 00:03:30,790
So one common reason people give for why they use the Box-Cox transform is that they want to make
41
00:03:30,790 --> 00:03:32,510
the data normally distributed.
42
00:03:33,370 --> 00:03:37,510
However, note that this motivation does not apply to raw time series.
43
00:03:38,020 --> 00:03:39,110
So why is this?
44
00:03:39,940 --> 00:03:42,850
Well, remember that time series data is dynamic.
45
00:03:43,000 --> 00:03:44,300
It changes in time.
46
00:03:44,350 --> 00:03:45,470
It can have a trend.
47
00:03:46,000 --> 00:03:51,340
So when you take time series data and plot a histogram hoping that it will be normal, this is actually
48
00:03:51,340 --> 00:03:55,290
the wrong thing to do. We'll discuss this more later in the course.
49
00:03:55,300 --> 00:04:01,300
But in order to take data over time and plot its distribution or histogram, we need that data to be
50
00:04:01,300 --> 00:04:02,140
stationary.
51
00:04:02,740 --> 00:04:06,490
Stationary essentially means the distribution doesn't change over time.
52
00:04:07,780 --> 00:04:09,430
So why is this a requirement?
53
00:04:10,060 --> 00:04:14,380
Well, imagine you have some data which simply follows a line that grows at a constant rate.
54
00:04:15,010 --> 00:04:17,670
Does plotting the histogram of this data make sense?
55
00:04:18,100 --> 00:04:19,060
The answer is no.
56
00:04:19,720 --> 00:04:22,030
Would we want this to be normally distributed?
57
00:04:22,420 --> 00:04:23,440
The answer is no.
58
00:04:23,950 --> 00:04:27,310
In fact, this data is described much better by a linear trend.
59
00:04:27,820 --> 00:04:33,460
The point of plotting a histogram is to understand the distribution of the data, but the distribution
60
00:04:33,460 --> 00:04:38,020
at the bottom of this plot is clearly different from the distribution at the top of this plot.
61
00:04:38,650 --> 00:04:43,390
Therefore, it makes no sense to mix this data together into a single histogram.
62
00:04:43,780 --> 00:04:46,330
This does not tell us how the data is distributed.
63
00:04:50,960 --> 00:04:56,510
The final topic I want to discuss in this lecture is why the log transform is deeply fundamental.
64
00:04:56,990 --> 00:05:01,490
Not only is it useful mathematically, but it also seems to be part of nature itself.
65
00:05:02,180 --> 00:05:04,170
One example of this is perception.
66
00:05:04,730 --> 00:05:10,070
For example, although a normal conversation is ten thousand times louder than a whisper, it doesn't
67
00:05:10,070 --> 00:05:12,750
have ten thousand times the effect on your senses.
68
00:05:13,310 --> 00:05:18,050
That's why we use the decibel scale to measure sound, which is essentially a log transform.
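As a one-line worked example, a ten-thousand-fold power ratio corresponds to only 40 decibels, since the decibel scale is ten times the base-10 logarithm of the ratio:

```python
import math

# The decibel scale is a log transform: 10 * log10 of a power ratio
ratio = 10_000  # conversation vs. whisper, per the lecture
db = 10 * math.log10(ratio)
print(db)  # 40.0
```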
69
00:05:19,810 --> 00:05:25,540
Another example of how the logarithm seems to simply be a part of nature is how we as humans interpret
70
00:05:25,540 --> 00:05:26,210
numbers.
71
00:05:26,740 --> 00:05:31,630
For example, if you have one thousand dollars in the bank, then losing one thousand dollars would
72
00:05:31,630 --> 00:05:32,710
be a pretty big deal.
73
00:05:33,190 --> 00:05:38,080
But if you have one billion dollars in the bank, spending one thousand dollars on a pair of jeans would
74
00:05:38,080 --> 00:05:39,300
feel completely normal.
75
00:05:40,330 --> 00:05:45,490
Another way to think of this is imagine going from zero dollars in wealth to one million.
76
00:05:45,880 --> 00:05:47,050
That's a pretty big jump.
77
00:05:47,620 --> 00:05:49,570
How about one million to two million?
78
00:05:49,990 --> 00:05:54,400
Although you still made the same amount of money, its utility is less.
79
00:05:54,400 --> 00:05:58,690
So one might model the utility of wealth as the logarithm of the wealth.