6. Pandas Cleaning and selection

[00:00:10] Hi, everyone, and welcome to this new video. In this video, we are going to see the power of pandas for cleaning our data and doing a few transformations on it.

[00:00:23] First, we need to import the CSV file attached to this video, which you can also find on the GitHub under the name assets.csv. To put it in Google Colab, you just need to drag and drop it here.

[00:00:48] Then we are going to use pandas to open this CSV file. To do it, we are going to use the read_csv function. We need to give it the path to the CSV file, so if you work in a local environment, make sure that you have the right path here.

[00:01:25] First, I will show you the data this way; I think it is much better. As we can see, we have a lot of stock prices, the volumes, etc., and the dates. Usually it is much better to put the dates as the index. To do it, we need to specify parse_dates equal to True, and we need to give the name of the index column.

[00:02:13] So it is better, but here I have given you a dataset with a lot of issues, so that we can clean it together. The first thing we need to fix is that the data should be in chronological order.
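The loading step described above can be sketched as follows. This is only a minimal illustration: the column names (`Date`, `Close`, `Volume`) and the values are invented stand-ins, not the contents of the course file, and an in-memory buffer replaces the real path so the example runs anywhere.

```python
import io

import pandas as pd

# Hypothetical stand-in for the course CSV file: these column names and
# values are invented for illustration only.
csv_text = """Date,Close,Volume
2020-01-03,102.0,1500
2020-01-02,101.0,1200
2000-01-03,50.0,800
"""

# read_csv accepts a path or a file-like object. With a real file, pass its
# path instead of the StringIO buffer.
# index_col names the column to use as the index; parse_dates=True asks
# pandas to parse that index as dates.
assets = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(assets)
```

With a local file, the first argument would simply be the path to the CSV on your machine.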
[00:02:47] Here we have the dates. We need to have the earliest date first. Here it is not what we want, because here we have 2020 and here we have 2000. So the data is in descending order, and we want ascending order.

[00:03:24] I'm going to fix this using the sort_index function, with ascending equal to True. So we have fixed this issue.

[00:03:44] Now, we have a lot of missing values, so we are going to delete them all. But before that, we are going to select some specific columns, so that we do not have a very huge dataset.

[00:04:30] Now we are going to delete all the missing values. It is very simple: you just have to apply the dropna function, which deletes all the rows containing at least one NaN value. But this function can do more than that. If you want to use all its possibilities, for example deleting a row only if it has at least three missing values, or using a certain threshold, etc., I invite you to go to the pandas documentation, because it covers all these cases.

[00:05:24] Now we can see that we don't have any missing values.
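The sorting, column selection and missing-value steps above can be sketched like this. The data and the column names are hypothetical. Note that the `thresh` parameter of `dropna` counts the non-missing values a row must have in order to be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data, stored in descending date order with some missing values.
dates = pd.to_datetime(["2020-01-03", "2020-01-02", "2000-01-04", "2000-01-03"])
assets = pd.DataFrame(
    {
        "Close": [102.0, np.nan, 51.0, 50.0],
        "Volume": [1500.0, 1200.0, np.nan, 800.0],
        "Other": [np.nan, np.nan, 1.0, 2.0],
    },
    index=dates,
)

# Put the rows in chronological order (oldest date first).
assets = assets.sort_index(ascending=True)

# Keep only some columns by passing a list of column names.
subset = assets[["Close", "Volume"]]

# Drop every row that contains at least one NaN.
clean = subset.dropna()

# Keep only the rows that have at least 2 non-missing values.
at_least_two = assets.dropna(thresh=2)

print(clean)
print(at_least_two)
```

Both `sort_index` and `dropna` return new objects here, so the result has to be assigned to a variable to be kept.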
[00:05:41] Now we are going to see how to reset the index, because sometimes dates as the index are not suitable for some situations. If we want to reset the index, we just have to use the reset_index function. With this function, we can choose to keep the dates, for example, or to drop them using drop equal to True. I have not put this DataFrame back in our variable assets, so assets still has the dates as its index.

[00:06:41] Now I will show you a very interesting function to create our own technical indicators. To show you this function, I will create a simple moving average. First, we need to select data from the DataFrame. We choose the S&P 500 close column, but we could choose many other columns if we wanted.

[00:07:29] Then we apply the rolling function. The rolling function doesn't work alone, because it only defines the window; we need to apply a function to it, and for us it is the mean function.

[00:08:01] Now, let me show you the shift function. The shift function is very useful when you work with the percentage of variation. For example, when you want to compute the return of your strategy, you have the percentage of variation of the asset and you have your signal to buy or sell a stock.
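The reset_index and rolling steps described earlier in this passage can be sketched as below. The `Close` column, its values and the 3-row window are assumptions chosen for illustration; the video does not state the window size used.

```python
import pandas as pd

# Hypothetical price data indexed by date.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
assets = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]}, index=dates)
assets.index.name = "Date"

# reset_index returns a new DataFrame; assets keeps its date index unless
# the result is assigned back to it.
with_date_column = assets.reset_index()        # dates become a "Date" column
without_dates = assets.reset_index(drop=True)  # dates are discarded

# rolling(3) only defines a 3-row window; an aggregation such as mean()
# must be applied to it to get values.
assets["SMA 3"] = assets["Close"].rolling(3).mean()

print(assets)
```

The first two rows of the moving average are NaN because a full 3-row window is not yet available there.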
[00:08:35] These two columns are on the same day, but it is not correct to just multiply them together. If the return of the strategy uses the percentage of variation from, for example, 8 a.m. to 8 p.m., but you only make the decision at 8 p.m., you have an issue, because you were not in the market during the period over which the percentage of variation is measured. I don't know if it's very clear, but it will be when we compute the percentage of variation in the next chapter. It is still important to show this function now, so that you are a little more comfortable with it later.

[00:09:40] If I put shift equal to one, for example, we can see that we have just shifted the values by one row. If I put 10, we shift the values by 10 rows. It is very important to understand what this function does, because it will be the key to doing a backtest properly.

[00:10:16] Now we are going to talk about the groupby function. The groupby function is a very interesting tool. Even if we are not going to use it in this course, it is important to know it, because sometimes it can be very useful. I will use the groupby function on my DataFrame. I want to group by the number column, so all the numbers here. Then, as with the rolling function, groupby doesn't work alone and you need to apply something to it.
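The shift discussion earlier in this passage can be illustrated with a small sketch. The prices and the signal are invented, and lagging the signal by one row is one common way to line it up with the return of the following period; it is an assumption here, not necessarily the exact computation used later in the course.

```python
import pandas as pd

# Hypothetical closing prices, one row per period.
prices = pd.Series([100.0, 110.0, 99.0, 108.9], name="Close")

# shift(1) moves every value down by one row; the first row becomes NaN.
previous = prices.shift(1)

# Percentage of variation from the previous row to the current one
# (for this data, the same result as prices.pct_change()).
pct = prices / previous - 1

# Hypothetical signal decided at the END of each row's period:
# 1 = long, -1 = short.
signal = pd.Series([1, -1, 1, 1])

# The position held during a period is the signal decided one row earlier,
# so the signal is shifted before multiplying by the variation.
strategy_return = signal.shift(1) * pct

print(pd.DataFrame({"pct": pct, "signal": signal, "strategy": strategy_return}))
```

Multiplying `signal * pct` without the shift would credit the strategy with a move that happened before the decision was made.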
[00:11:08] For example, if I put mean, I will have, for the number one, the mean of all the values here associated with the number one. Here, if I change the value, I will have the mean of these three values for the number three. So it is very interesting.

[00:11:35] In the same way, we can do exactly the same with the sum function. For the number one here, we have 11 plus 10, so 21. And for the number three, we have 15 plus 15 plus 10, which is equal to 40. So we have the sum of the values associated with each number, which can be very useful in some cases.

[00:12:18] We can also compute the standard deviation to have an idea of the dispersion of the values around the mean. Here, I cannot compute it in my head quickly like the mean or the sum, but it is exactly the same thing. We take the mean, so this number, for example, for the number three; then we compute the distance between the mean and 15, the mean and 15, and the mean and 10, and from these we compute the standard deviation. We do exactly the same for the number one: we compute the difference between the mean and this value, the mean and this value, and we compute the standard deviation.

[00:13:17] So that is all for this video.
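The groupby examples above can be reproduced with a short sketch using the values quoted in the video (11 and 10 for the number one; 15, 15 and 10 for the number three). The column names `num` and `value` are hypothetical. Note that pandas computes the sample standard deviation by default (it divides by n - 1).

```python
import pandas as pd

# Values quoted in the video; the column names are assumptions.
df = pd.DataFrame({"num": [1, 1, 3, 3, 3], "value": [11, 10, 15, 15, 10]})

# groupby alone only defines the groups; an aggregation must follow.
grouped = df.groupby("num")["value"]

means = grouped.mean()  # mean of the values for each number
sums = grouped.sum()    # 21 for the number one, 40 for the number three
stds = grouped.std()    # sample standard deviation (ddof=1) for each number

# The same standard deviation computed by hand for the number three:
# squared distances to the mean, summed, divided by n - 1, then square root.
group3 = df.loc[df["num"] == 3, "value"]
manual_std = (((group3 - group3.mean()) ** 2).sum() / (len(group3) - 1)) ** 0.5

print(means, sums, stds, manual_std, sep="\n")
```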
[00:13:20] I invite you again to play with all this, because mastering pandas is a very necessary skill for mastering algorithmic trading.
