1
00:00:10,920 --> 00:00:12,990
Hi, everyone, and welcome to this new video.
2
00:00:13,530 --> 00:00:21,570
In this video, we're going to see the power of pandas for cleaning and doing a little transformation on
3
00:00:21,570 --> 00:00:22,230
our data.
4
00:00:23,280 --> 00:00:29,430
So first, we need to import the CSV file attached to this video.
5
00:00:29,590 --> 00:00:34,970
Or you can find it on GitHub, named
6
00:00:35,180 --> 00:00:35,770
assets.csv.
7
00:00:37,840 --> 00:00:44,530
To put it in Google Colab, you just need to drag and drop it here.
8
00:00:48,880 --> 00:00:56,770
Then we are going to use pandas to open this CSV file. To do it,
9
00:00:56,980 --> 00:01:01,360
we are going to use the read_csv function.
10
00:01:04,450 --> 00:01:06,010
Then we need to put
11
00:01:09,670 --> 00:01:18,040
the path to access this CSV file. So if you work in a local environment, make sure
12
00:01:18,040 --> 00:01:22,780
that you have the right path here.
13
00:01:25,300 --> 00:01:32,560
Then, first, I will show you that.
14
00:01:32,560 --> 00:01:34,360
Yes, I think it's way better.
15
00:01:34,990 --> 00:01:41,110
As we can see, we have a lot of stock prices, the volume, etc., and the dates.
16
00:01:42,160 --> 00:01:49,960
And usually it is much better to put the date as the index.
17
00:01:50,380 --> 00:02:04,230
So to do it, we need to specify parse_dates equal to True, and we need to give the name of the index column.
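A minimal sketch of reading a CSV with the date column used as the index. The inline CSV text and the column names Date, Close and Volume are assumptions for illustration, not the course file, and the use of parse_dates=True is a guess at what is typed on screen.

```python
import io

import pandas as pd

# Hypothetical stand-in for the course CSV file (column names are assumptions).
csv_text = """Date,Close,Volume
2020-01-03,102.0,1500
2020-01-02,101.0,1200
2020-01-01,100.0,1000
"""

# index_col names the column to use as the index;
# parse_dates=True asks pandas to convert that index to datetimes.
assets = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(assets)
```

With a real file, the io.StringIO object would simply be replaced by the path to the CSV.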
18
00:02:13,160 --> 00:02:22,550
So it is better, but here I have put a dataset with a lot of issues,
19
00:02:25,110 --> 00:02:30,690
to allow us to clean this dataset together.
20
00:02:30,930 --> 00:02:43,260
So first, the first thing that we need to fix is that the data is not in chronological order.
21
00:02:43,950 --> 00:02:45,240
So if?
22
00:02:47,850 --> 00:02:50,530
Here we have the date.
23
00:02:50,820 --> 00:02:55,440
We need to have the earliest date first.
24
00:02:55,830 --> 00:03:07,080
And here it is not what we want, because here we have 2020 and here we have 2000.
25
00:03:07,230 --> 00:03:09,450
So here it is
26
00:03:11,990 --> 00:03:17,510
ordered in descending order, and we want ascending order.
27
00:03:17,750 --> 00:03:18,080
So
28
00:03:24,500 --> 00:03:29,270
I'm going to fix this using the sort_index function.
29
00:03:32,680 --> 00:03:35,380
And put ascending equal to True.
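A small sketch of putting a date index back into chronological order with sort_index. The data frame here is made up for illustration; only sort_index and its ascending parameter come from the video.

```python
import pandas as pd

# Hypothetical prices stored newest-first, as in the problem described above.
df = pd.DataFrame(
    {"Close": [102.0, 101.0, 100.0]},
    index=pd.to_datetime(["2020-01-03", "2020-01-02", "2020-01-01"]),
)

# ascending=True puts the earliest date first; sort_index returns a new frame.
df_sorted = df.sort_index(ascending=True)
print(df_sorted)
```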
30
00:03:40,340 --> 00:03:44,030
So we have fixed this issue now.
31
00:03:44,300 --> 00:03:47,390
We have a lot of missing values.
32
00:03:48,170 --> 00:03:53,570
So we are going to delete all these missing values.
33
00:03:53,600 --> 00:04:04,310
But before that, we are just going to select some specific columns, so as not to have a very huge dataset.
34
00:04:30,260 --> 00:04:36,540
So now we are going to delete all these missing values. To do it,
35
00:04:36,560 --> 00:04:37,760
it is very simple.
36
00:04:38,030 --> 00:04:40,520
You just have to
37
00:04:42,890 --> 00:04:54,500
apply the dropna function, which will delete all the rows containing at least one NaN value.
38
00:04:54,860 --> 00:05:00,920
But this function is more complex than that.
39
00:05:01,250 --> 00:05:09,740
If you want to use all the possibilities of this function, for example, to delete only if there are at
40
00:05:09,740 --> 00:05:17,600
least three missing values, or to use a certain threshold, etc., I invite you to go to the pandas documentation,
41
00:05:17,600 --> 00:05:21,110
because it is not relevant for this course.
42
00:05:21,770 --> 00:05:22,100
So
43
00:05:24,950 --> 00:05:31,160
now we can see that we don't have any missing values.
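A minimal sketch of the column selection and dropna steps on made-up data; the column names A, B and C are assumptions. It also shows the thresh parameter, which sets the minimum number of non-missing values a row needs in order to be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values (NaN).
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, 3.0, np.nan],
        "B": [1.0, 2.0, np.nan, np.nan],
        "C": [1.0, 2.0, 3.0, np.nan],
        "D": [0.0, 0.0, 0.0, 0.0],
    }
)

# Keep only some columns so the data set stays small.
subset = df[["A", "B", "C"]]

# Default behaviour: drop every row containing at least one NaN.
no_missing = subset.dropna()

# thresh=2: keep rows that have at least 2 non-missing values.
at_least_two = subset.dropna(thresh=2)

print(no_missing)
print(at_least_two)
```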
44
00:05:35,190 --> 00:05:38,040
So now we are going to see
45
00:05:41,150 --> 00:05:54,620
how to reset the index, because sometimes we need to reset the index, because dates would not be
46
00:05:55,370 --> 00:06:06,020
a good index for some situations. And if we want to reset the index, we just have to use the reset_index function
47
00:06:06,380 --> 00:06:06,740
and.
48
00:06:12,770 --> 00:06:19,010
With this function, we can choose to keep the date, for example, or
49
00:06:21,960 --> 00:06:32,010
to drop it using drop equal to True. I have not put this data frame back in the assets variable.
50
00:06:32,280 --> 00:06:32,610
So
51
00:06:35,790 --> 00:06:40,230
assets still has the dates as its index.
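A short sketch of the two reset_index variants on a made-up data frame. The variable name assets follows the video; the data and the column names are assumptions. Because the result is not assigned back, the original frame keeps its date index.

```python
import pandas as pd

# Hypothetical frame indexed by date.
assets = pd.DataFrame(
    {"Close": [100.0, 101.0]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-02"], name="Date"),
)

# Default: the old index is kept as a regular column named "Date".
kept = assets.reset_index()

# drop=True: the old index is discarded entirely.
dropped = assets.reset_index(drop=True)

# reset_index returns a new frame, so assets itself is unchanged.
print(kept)
print(dropped)
print(assets.index)
```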
52
00:06:41,970 --> 00:06:50,700
So now I will show you a very interesting function to create our own technical indicators.
53
00:06:51,330 --> 00:06:53,610
So to show you this function,
54
00:06:53,820 --> 00:06:57,060
I will create a simple moving average.
55
00:07:02,970 --> 00:07:06,030
So first, we need to select
56
00:07:09,170 --> 00:07:10,390
a DataFrame.
57
00:07:10,540 --> 00:07:24,580
So we choose the S&P 500 close column, but we can choose a lot of other columns if we want.
58
00:07:29,240 --> 00:07:39,530
Then we apply the rolling function, and the rolling function doesn't work alone, because the rolling
59
00:07:39,530 --> 00:07:50,000
function only lets us define the window, so we need to apply a function to this rolling function, and for us
60
00:07:50,000 --> 00:07:51,860
it is the mean function.
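A minimal sketch of a simple moving average built with rolling followed by mean. The prices and the window of 3 are made up for illustration; the video does not state the window used.

```python
import pandas as pd

# Hypothetical closing prices.
close = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="Close")

# rolling(3) only defines a 3-row window; mean() is what computes a value per window.
sma = close.rolling(3).mean()

# The first two rows are NaN because their windows are incomplete.
print(sma)
```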
61
00:08:01,070 --> 00:08:12,050
Now, let me show you the shift function. The shift function is very interesting when you want to create
62
00:08:12,230 --> 00:08:14,630
the percentage of variation.
63
00:08:15,020 --> 00:08:15,770
For example,
64
00:08:20,210 --> 00:08:28,130
when you want to compute the return of your strategy, you have the percentage of variation of the assets
65
00:08:28,700 --> 00:08:35,270
and you have your signal to buy or sell a stock, for example.
66
00:08:35,690 --> 00:08:49,340
These two columns are on the same day, but it is not correct to just multiply these two columns together
67
00:08:49,730 --> 00:08:58,820
because if you take the return of the strategy as the percentage of variation, for example, from 8
68
00:08:58,850 --> 00:09:07,760
a.m. to 8 p.m., but you only take the decision at 8 p.m., you have an issue, because
69
00:09:10,580 --> 00:09:16,850
you are not in the market during the period over which you take the percentage of variation.
70
00:09:17,060 --> 00:09:24,050
So I don't know if it's very clear, but it will be very clear when we compute the percentage
71
00:09:24,050 --> 00:09:26,480
of variation in the next chapter.
72
00:09:26,810 --> 00:09:35,860
But it is important to at least show this function to be a little bit more comfortable with this function
73
00:09:35,870 --> 00:09:36,350
later.
74
00:09:40,930 --> 00:09:44,530
So if I put shift equal to one, for example,
75
00:09:47,560 --> 00:09:55,240
we can see that we have just shifted the values by one row. If I put 10, for example,
76
00:09:56,340 --> 00:10:07,350
we shift the values by 10 rows. So it is very important to understand what this function does, because
77
00:10:07,770 --> 00:10:13,500
it will be the key to doing a backtest properly.
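A sketch of why shift matters for a backtest, assuming made-up prices and a made-up signal column (1 for long, -1 for short). The signal decided on one row can only earn the variation of the following row, so it is shifted by one row before multiplying. This illustrates the idea and is not the course's exact code.

```python
import pandas as pd

# Hypothetical closing prices and trading signal.
df = pd.DataFrame({"Close": [100.0, 110.0, 99.0, 108.9]})
df["signal"] = [1, -1, 1, 1]

# Percentage of variation from the previous row to the current row.
df["pct"] = df["Close"].pct_change()

# shift(1) moves every value down by one row; the first row becomes NaN.
df["signal_shifted"] = df["signal"].shift(1)

# Strategy return: yesterday's decision applied to today's variation.
df["strategy"] = df["pct"] * df["signal_shifted"]

print(df)
```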
78
00:10:16,030 --> 00:10:24,400
Now we are going to talk about the groupby function. The groupby function is a very interesting
79
00:10:24,400 --> 00:10:33,160
tool. Even if we are not going to use it in this course, it is very important to know it, because
80
00:10:33,460 --> 00:10:35,560
sometimes it can be very useful.
81
00:10:38,130 --> 00:10:45,360
So I will use the groupby function on my data frame.
82
00:10:48,210 --> 00:10:54,080
I want to group by the number column.
83
00:10:54,300 --> 00:10:56,220
So all the numbers here.
84
00:10:59,270 --> 00:11:07,910
And then, as for the rolling function, groupby doesn't work alone, and you need to apply something.
85
00:11:08,360 --> 00:11:13,430
And for example, if I put mean, I will have,
86
00:11:15,730 --> 00:11:25,270
for one, the mean of all the values here associated with the number one. For example, here, if I change
87
00:11:25,270 --> 00:11:33,700
the value, I will have the mean of these three values for the number three.
88
00:11:34,150 --> 00:11:35,980
So it is very interesting.
89
00:11:35,980 --> 00:11:42,670
In the same way, we can do exactly the same with the sum function, for example.
90
00:11:42,970 --> 00:11:50,050
So here we have 11 plus 10, so 21.
91
00:11:50,320 --> 00:11:56,350
And for the number three, we have 15 plus 15 plus 10.
92
00:11:57,520 --> 00:12:02,740
So 15 plus 15 plus 10 is equal to 40.
93
00:12:03,970 --> 00:12:07,600
So we have the sum
94
00:12:09,300 --> 00:12:18,840
of the values associated with each number. So it can be very interesting in some cases, and we can also
95
00:12:18,840 --> 00:12:29,580
do the standard deviation to have an idea of the dispersion of the values around the mean.
96
00:12:33,520 --> 00:12:42,330
So here I cannot compute it in my head quickly, like the mean or the sum.
97
00:12:42,640 --> 00:12:45,310
But it is exactly the same thing.
98
00:12:45,530 --> 00:12:48,800
We're going to take the mean.
99
00:12:48,820 --> 00:12:57,310
So this number, for example, for the number three, then we compute the distance between the mean
100
00:12:57,310 --> 00:13:01,000
and 15, the mean and 15 and the mean and 10.
101
00:13:01,450 --> 00:13:08,380
And then we compute the standard deviation. And we do exactly the same for the number one: we compute
102
00:13:08,620 --> 00:13:15,490
the difference between the mean and this value, the mean and this value and we compute the standard
103
00:13:15,580 --> 00:13:16,330
deviation.
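A small sketch of groupby with mean, sum and std. The column names num and value, and the assignment of 11 and 10 to the number one, are assumptions; the values 15, 15 and 10 for the number three and the sums 21 and 40 follow the arithmetic spoken above. Note that pandas std computes the sample standard deviation (it divides by n - 1).

```python
import pandas as pd

# Hypothetical data: a grouping column and a value column.
df = pd.DataFrame(
    {
        "num": [1, 1, 3, 3, 3],
        "value": [11, 10, 15, 15, 10],
    }
)

# groupby only forms the groups; an aggregation has to be applied afterwards.
grouped = df.groupby("num")["value"]

means = grouped.mean()  # mean of the values for each number
sums = grouped.sum()    # 11 + 10 = 21 and 15 + 15 + 10 = 40
stds = grouped.std()    # sample standard deviation around each group's mean

print(means)
print(sums)
print(stds)
```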
104
00:13:17,650 --> 00:13:20,050
So that is all for this video.
105
00:13:20,170 --> 00:13:31,180
I invite you again to play with this, because mastering pandas is a necessary skill to master
106
00:13:31,180 --> 00:13:32,500
algorithmic trading.