Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:05,560 --> 00:00:11,650
In this section of the course, we're going to explore some of the techniques we can use to prepare
2
00:00:11,650 --> 00:00:13,510
our data for analysis.
3
00:00:14,170 --> 00:00:20,590
Now, Excel is well known for its data analysis tools, so things like pivot tables and charts and different
4
00:00:20,590 --> 00:00:22,570
types of calculations and formulas.
5
00:00:23,020 --> 00:00:29,080
But what's lesser known is how important it is to make sure that the data that you're going to analyze
6
00:00:29,320 --> 00:00:36,850
is clean and in an appropriate state, because many times data that we're using isn't necessarily a
7
00:00:36,850 --> 00:00:38,380
data set that we've created.
8
00:00:39,040 --> 00:00:45,730
So it might be that you've received a data set from a client or a colleague, or maybe you've even downloaded
9
00:00:45,730 --> 00:00:49,510
data from an external system or some third party software.
10
00:00:50,170 --> 00:00:52,630
And as we all know, we don't live in a perfect world.
11
00:00:52,630 --> 00:00:59,560
So it might be that when you've done that, download the data that comes into Excel isn't in the best
12
00:00:59,560 --> 00:01:01,030
format for analyzing.
13
00:01:01,660 --> 00:01:05,110
It might be that you have blank cells, all blank rows everywhere.
14
00:01:05,410 --> 00:01:08,860
Maybe there's weird formatting or inconsistent case.
15
00:01:09,520 --> 00:01:13,060
Maybe they're spelling errors or duplicate values.
16
00:01:13,270 --> 00:01:18,850
All of these things need to be dealt with before you start to create things like pivot tables and pivot
17
00:01:18,850 --> 00:01:19,390
charts.
18
00:01:19,690 --> 00:01:22,090
And that is what we're going to look at in this section.
19
00:01:22,480 --> 00:01:27,250
I'm going to show you lots of different techniques when it comes to cleaning data.
20
00:01:27,490 --> 00:01:30,700
So in this first lesson, we need to get some data.
21
00:01:30,700 --> 00:01:31,420
First of all.
22
00:01:31,960 --> 00:01:36,190
And this is going to give me an opportunity to show you a couple of other techniques when it comes to
23
00:01:36,190 --> 00:01:37,690
importing data.
24
00:01:37,930 --> 00:01:43,690
So what we're going to do here is we're going to import a data set first from a folder that contains
25
00:01:43,720 --> 00:01:48,250
numerous excel files and then we're going to import a text file.
26
00:01:48,550 --> 00:01:51,520
Now for this, we need to start with a blank document.
27
00:01:51,790 --> 00:01:56,470
So I'm going to use the shortcut key of control and to create a new blank workbook.
28
00:01:56,800 --> 00:02:01,450
Now, to import data into Excel, we need to go to the data tab.
29
00:02:01,540 --> 00:02:05,050
And it's this first group here that gets and transform data.
30
00:02:05,590 --> 00:02:09,640
And this little group tends to change in every new release of Excel.
31
00:02:09,790 --> 00:02:14,680
They usually change the wording and where they have things positioned in this little area, but this
32
00:02:14,680 --> 00:02:17,800
is how it looks right now in Excel 2021.
33
00:02:17,950 --> 00:02:23,680
Now notice in here we have a few different buttons which will allow us to quickly import data from a
34
00:02:23,680 --> 00:02:29,650
text or CSV file, from the web, from another table or range of cells.
35
00:02:29,950 --> 00:02:33,310
We can go to recent sources or look at existing connections.
36
00:02:34,150 --> 00:02:41,740
We also have a get data dropdown over here, which gives us access to all these different sources to
37
00:02:41,740 --> 00:02:42,880
import data from.
38
00:02:43,420 --> 00:02:49,990
And you can see there are so many so you can do things like import data from a SharePoint list, or
39
00:02:49,990 --> 00:02:57,310
maybe you want to import something from Microsoft Exchange or even an external application like Salesforce.
40
00:02:57,550 --> 00:03:04,300
Maybe you want to import something from an access database or an Oracle database, or maybe you just
41
00:03:04,300 --> 00:03:11,140
want to import from a different file that might be an Excel file, a text file, maybe a PDF, or even
42
00:03:11,140 --> 00:03:12,040
from a folder.
43
00:03:12,700 --> 00:03:19,090
So lots of different sources that we can import data into excel from now before we get to this point.
44
00:03:19,120 --> 00:03:24,700
Let me show you the files that we're going to import, and you'll find these to download in the Course
45
00:03:24,700 --> 00:03:25,450
Files folder.
46
00:03:25,870 --> 00:03:30,100
Now, in the course files, I have a folder here called sales data.
47
00:03:30,710 --> 00:03:36,670
If I double click to open this folder, I have in here for files that are all exactly the same.
48
00:03:36,970 --> 00:03:39,910
They just contain sales data for different years.
49
00:03:40,540 --> 00:03:47,530
But if I was to open each one, they have the same format, so they all have the same column headings.
50
00:03:47,740 --> 00:03:52,930
They just contain different sales data for each of the years now because these files are pretty much
51
00:03:52,930 --> 00:03:55,430
identical in their layout.
52
00:03:55,450 --> 00:03:59,800
It means that I can import them and combine them all in one go.
53
00:04:00,460 --> 00:04:08,980
So instead of having four separate files for the 2016, 2017 and 2019 data, what I can say to excel
54
00:04:08,980 --> 00:04:15,640
on the import is import all four of these from the folder and combine them together into one big, long
55
00:04:15,640 --> 00:04:16,120
file.
56
00:04:16,360 --> 00:04:18,970
So let's go back to excel.
57
00:04:19,750 --> 00:04:22,540
Let's go to get data from file.
58
00:04:22,780 --> 00:04:27,520
And because I have these stored in a folder, I can choose to import them from the folder.
59
00:04:27,760 --> 00:04:29,700
Let's select the folder.
60
00:04:29,710 --> 00:04:31,050
I don't need to drill down into it.
61
00:04:31,060 --> 00:04:35,050
I just need to select the folder that contains the files and click on Open.
62
00:04:36,660 --> 00:04:42,570
Now, what Excel will do here is it's going to take you into power query, and you may not be all that
63
00:04:42,570 --> 00:04:45,570
familiar with power query or you might have used it briefly.
64
00:04:45,900 --> 00:04:50,910
We're not going to go too far into power query because it is more of an advanced topic, but I am going
65
00:04:50,910 --> 00:04:53,370
to show you a couple of small things that you can do here.
66
00:04:53,610 --> 00:04:58,380
So what Excel has done is it's picked up those four files and you can see them sitting here.
67
00:04:58,380 --> 00:05:01,980
So sales 2016 to sales 2019.
68
00:05:02,790 --> 00:05:07,890
And then at the bottom, I have a few different options with regards to what I can do with these files.
69
00:05:08,610 --> 00:05:10,050
Now I can choose to combine them.
70
00:05:10,050 --> 00:05:16,590
At this stage, I can simply load them straight into excel, or I can transform the data.
71
00:05:17,430 --> 00:05:24,030
Now, transforming the data will open these files, empower query and will give you access to all of
72
00:05:24,030 --> 00:05:26,700
power queries tools for cleaning data.
73
00:05:26,970 --> 00:05:30,960
Now, the focus of this section of the course isn't to learn how to clean data.
74
00:05:30,960 --> 00:05:34,950
Simple query it's to learn how to clean data in Excel.
75
00:05:35,610 --> 00:05:39,300
However, let's click Transform data so you can see what this looks like.
76
00:05:39,630 --> 00:05:42,840
So basically, I'm now in the power query editor.
77
00:05:43,350 --> 00:05:46,860
And as I said, don't worry too much about what you see on here.
78
00:05:47,190 --> 00:05:52,860
The only thing I really want to show you here is how you can use power query to combine these four files
79
00:05:52,860 --> 00:05:53,340
together.
80
00:05:54,060 --> 00:05:56,940
So currently, there's still four separate files.
81
00:05:57,330 --> 00:06:02,940
What I can do here is click these two dropdown arrows and when I hover over it, says combined files,
82
00:06:03,270 --> 00:06:06,540
it's going to give me a preview as to what the first file looks like.
83
00:06:06,930 --> 00:06:14,220
And if I click on OK, it's going to combine those files together and load the data into the power query
84
00:06:14,220 --> 00:06:14,730
window.
85
00:06:15,510 --> 00:06:21,870
So now I basically have a big long list of all of those files instead of four separate files.
86
00:06:22,110 --> 00:06:26,100
Now that I've done that, I can load it into excel and start to clean it.
87
00:06:26,880 --> 00:06:30,180
So I have a close and load button at the top here.
88
00:06:32,060 --> 00:06:38,060
And what this does is it basically creates a connection between that folder and excel.
89
00:06:38,210 --> 00:06:44,870
So if maybe at the end of the year, I then get a file called Sales 2020 in order to update this Excel
90
00:06:44,870 --> 00:06:51,650
spreadsheet, all I would need to do would be to copy the sales 2020 file to that sales data folder
91
00:06:52,070 --> 00:06:53,990
because we're connected to the folder.
92
00:06:54,470 --> 00:07:00,080
The connection will update and automatically include all the 2020 data in this file.
93
00:07:00,260 --> 00:07:03,230
So that is why this method can be pretty useful.
94
00:07:03,380 --> 00:07:08,810
Now I'm going to close down the queries and connections because once it's been imported, notice that
95
00:07:08,810 --> 00:07:11,450
it's automatically been placed in a table.
96
00:07:12,110 --> 00:07:13,020
How do I know that?
97
00:07:13,040 --> 00:07:15,650
Will I have the table design ribbon at the top?
98
00:07:15,980 --> 00:07:21,860
And I have a table name called Sales Data, which now includes all four of those files.
99
00:07:22,130 --> 00:07:25,520
Now there might be a couple of things I want to do here to tidy up this data.
100
00:07:26,300 --> 00:07:32,060
For example, I don't really need Column A, which is just basically listing the file name.
101
00:07:32,630 --> 00:07:35,180
So I'm probably going to delete out that column.
102
00:07:35,900 --> 00:07:42,740
I might want to apply some formatting to these columns a control shift down arrow and let's go to home
103
00:07:42,740 --> 00:07:44,960
and apply some currency formatting.
104
00:07:45,200 --> 00:07:48,260
And there might be some other bits and bobs that I want to do here.
105
00:07:48,560 --> 00:07:55,160
So that is one way that I can import a folder of files into excel, and you don't necessarily have to
106
00:07:55,160 --> 00:07:56,390
always import a folder.
107
00:07:56,390 --> 00:08:02,330
If you just had one excel file, you could go from file from workbook and then you can just select the
108
00:08:02,330 --> 00:08:03,650
file that you want to import.
109
00:08:04,040 --> 00:08:08,330
Now what we're actually going to do and this next file is the file that we're actually going to be cleaning
110
00:08:08,330 --> 00:08:11,450
up is we want to import a text file.
111
00:08:12,080 --> 00:08:15,500
And again, you're going to find this file in the course files folder.
112
00:08:15,680 --> 00:08:23,630
So if we go up to the data tab again in this get and transform data group, I have a from text CSP button.
113
00:08:24,200 --> 00:08:29,660
So let's click and it's this file here that we want to import sales data dot text.
114
00:08:29,930 --> 00:08:33,980
Now this is going to open a preview of this text file.
115
00:08:34,250 --> 00:08:40,250
I have the option of opening this simple query and cleaning up in that application by clicking the Transform
116
00:08:40,250 --> 00:08:41,090
data button.
117
00:08:41,420 --> 00:08:45,890
But this time, I'm simply going to load it straight into excel and clean it there.
118
00:08:46,160 --> 00:08:48,230
So let's click on the load button.
119
00:08:49,510 --> 00:08:56,470
And there we go, so once again, it's loaded this data in and it's created a table automatically out
120
00:08:56,470 --> 00:08:57,220
of this data.
121
00:08:57,610 --> 00:09:03,490
And it's this data set that we're going to take the time to clean up because notice there are a few
122
00:09:03,490 --> 00:09:07,510
different issues going on in here now to make this a bit easier to see.
123
00:09:07,540 --> 00:09:13,000
I'm going to go to table design and just turn off banded rows so we can see these issues a little bit
124
00:09:13,000 --> 00:09:13,480
clearer.
125
00:09:14,350 --> 00:09:19,810
Now I can see that in this spreadsheet, I have a couple of blank rows in here which I need to get rid
126
00:09:19,810 --> 00:09:20,140
of.
127
00:09:20,710 --> 00:09:26,080
I can also see Column B. I have some inconsistent casing issues going on.
128
00:09:26,520 --> 00:09:30,700
I also have some weird spacing at the front of these words.
129
00:09:30,910 --> 00:09:36,400
I can see that many of these columns haven't got the correct formatting applied, and there might well
130
00:09:36,400 --> 00:09:39,130
be duplicates lurking around in here as well.
131
00:09:39,880 --> 00:09:44,950
So all of these are things that we need to consider when we're cleaning our data sets.
132
00:09:45,190 --> 00:09:49,870
So we're going to start out with this dataset in the next lesson, and I'm going to show you how you
133
00:09:49,870 --> 00:09:55,840
can quickly remove all blank rows and even blank cells from your worksheets.
14132
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.