1
00:00:00,000 --> 00:00:02,000
Narrator: All right, I've got another quick demo for you.
2
00:00:02,000 --> 00:00:06,000
And this time we're going to be using the Web connector
3
00:00:06,000 --> 00:00:08,000
to scrape data from a Wikipedia page
4
00:00:08,000 --> 00:00:11,000
and then import it into Power Query as a table.
5
00:00:11,000 --> 00:00:13,000
Now, the really cool thing about this Web connector
6
00:00:13,000 --> 00:00:17,000
is that it allows you to connect to web-hosted files,
7
00:00:17,000 --> 00:00:21,000
or it'll scan a webpage, identify any structured tables,
8
00:00:21,000 --> 00:00:24,000
and then provide a list of the tables found
9
00:00:24,000 --> 00:00:25,000
and show you a preview
10
00:00:25,000 --> 00:00:27,000
before loading it into the Query editor.
11
00:00:27,000 --> 00:00:31,000
Now, it's super simple to use and really powerful.
12
00:00:31,000 --> 00:00:32,000
Let's check it out.
13
00:00:32,000 --> 00:00:34,000
All right, so for this demo,
14
00:00:34,000 --> 00:00:36,000
we're going to be using the Wikipedia URL
15
00:00:36,000 --> 00:00:39,000
that was added at the bottom left of that slide.
16
00:00:39,000 --> 00:00:42,000
So feel free to pause the video, rewind a bit,
17
00:00:42,000 --> 00:00:45,000
and then copy the URL so you can follow along.
18
00:00:45,000 --> 00:00:47,000
All right, so what we're going to do here
19
00:00:47,000 --> 00:00:49,000
is head up to New Source,
20
00:00:49,000 --> 00:00:52,000
and Web is actually one of the common connectors here.
21
00:00:52,000 --> 00:00:54,000
So we can just click Web.
22
00:00:55,000 --> 00:00:57,000
So from here, it's as simple
23
00:00:57,000 --> 00:01:01,000
as copying and pasting that URL.
24
00:01:01,000 --> 00:01:02,000
If we were to click Advanced Options,
25
00:01:02,000 --> 00:01:04,000
there are some other pieces here
26
00:01:04,000 --> 00:01:07,000
where we can add different URL parts,
27
00:01:07,000 --> 00:01:09,000
a command timeout,
28
00:01:09,000 --> 00:01:12,000
HTTP request header parameters, and stuff like that.
29
00:01:12,000 --> 00:01:15,000
So we're not going to mess around with any of that.
30
00:01:15,000 --> 00:01:17,000
We'll stick to Basic. We'll click OK.
31
00:01:21,000 --> 00:01:23,000
All right, so what Power BI is doing here
32
00:01:23,000 --> 00:01:26,000
is it's going through and connecting to that webpage,
33
00:01:26,000 --> 00:01:28,000
and then it's going to load this very familiar
34
00:01:28,000 --> 00:01:30,000
data preview window, right?
35
00:01:30,000 --> 00:01:34,000
Kind of similar to the MySQL example that we saw.
36
00:01:34,000 --> 00:01:36,000
We have all of the kind of different elements
37
00:01:36,000 --> 00:01:39,000
or tables on the left-hand side.
38
00:01:39,000 --> 00:01:42,000
All right, and if we click one of these,
39
00:01:42,000 --> 00:01:45,000
we're going to get a Preview window here.
40
00:01:45,000 --> 00:01:47,000
The other view that you can see here,
41
00:01:47,000 --> 00:01:48,000
that's actually really cool,
42
00:01:48,000 --> 00:01:50,000
is you can actually click on this Web View,
43
00:01:50,000 --> 00:01:53,000
and it actually shows you what the webpage looks like.
44
00:01:54,000 --> 00:01:56,000
All right, so you can scroll through and, you know,
45
00:01:56,000 --> 00:01:58,000
let's say we want to import this table
46
00:01:58,000 --> 00:02:00,000
with rank, firm, company, country.
47
00:02:02,000 --> 00:02:05,000
All right, we can see that this Largest companies table
48
00:02:05,000 --> 00:02:07,000
that we've selected, rank, firm, company, country,
49
00:02:07,000 --> 00:02:09,000
is that table.
50
00:02:09,000 --> 00:02:10,000
All right, if we started clicking
51
00:02:10,000 --> 00:02:12,000
through some of these other tables, you know,
52
00:02:12,000 --> 00:02:16,000
we're basically exploring the different table elements
53
00:02:16,000 --> 00:02:18,000
that have been discovered within that page.
54
00:02:18,000 --> 00:02:21,000
So if any of these other ones make sense to connect to,
55
00:02:21,000 --> 00:02:23,000
you certainly could.
56
00:02:23,000 --> 00:02:24,000
For the sake of this example,
57
00:02:24,000 --> 00:02:26,000
we're going to connect to this table.
58
00:02:31,000 --> 00:02:34,000
All right, so now that the data is in the Query editor,
59
00:02:34,000 --> 00:02:36,000
we can go through and follow the same process
60
00:02:36,000 --> 00:02:38,000
that we've been using for our other tables.
61
00:02:38,000 --> 00:02:40,000
All right, we'll go through and we can check
62
00:02:40,000 --> 00:02:45,000
that our data types and our column headers are appropriate.
63
00:02:45,000 --> 00:02:47,000
All right, so we've got a whole number here.
64
00:02:47,000 --> 00:02:51,000
Text makes sense for firm and company, and also for country.
65
00:02:51,000 --> 00:02:53,000
And then for this AUM column,
66
00:02:53,000 --> 00:02:55,000
right, assets under management,
67
00:02:55,000 --> 00:02:57,000
this is actually in billions of dollars.
68
00:02:57,000 --> 00:02:59,000
So we could update this to currency
69
00:02:59,000 --> 00:03:01,000
or leave this as a whole number here.
70
00:03:01,000 --> 00:03:03,000
And there are some other transformation steps
71
00:03:03,000 --> 00:03:07,000
that you could apply here to show the actual value here
72
00:03:07,000 --> 00:03:08,000
and not have it shortened.
73
00:03:08,000 --> 00:03:11,000
So again, everything looks good there.
74
00:03:11,000 --> 00:03:13,000
You know, we could update this name
75
00:03:13,000 --> 00:03:18,000
to Largest Asset Management Firms,
76
00:03:22,000 --> 00:03:24,000
and then we'd be good to go.
77
00:03:24,000 --> 00:03:27,000
All right, so that is how you can use the Web connector
78
00:03:27,000 --> 00:03:30,000
to scrape data from a website
79
00:03:30,000 --> 00:03:33,000
and import any sort of tables that are visible
80
00:03:33,000 --> 00:03:34,000
within that page.
81
00:03:34,000 --> 00:03:36,000
The last step that we'll do here
82
00:03:36,000 --> 00:03:39,000
is similar to our Fuzzy Factory data.
83
00:03:39,000 --> 00:03:42,000
We're going to disable the load.
84
00:03:42,000 --> 00:03:44,000
And finally, we'll save our work.
85
00:03:50,000 --> 00:03:53,000
All right, up next we're going to dig into
86
00:03:53,000 --> 00:03:55,000
some of the data QA and profiling tools
87
00:03:55,000 --> 00:03:56,000
within the Query editor.
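As a rough illustration of the table discovery described in this demo (scan a page, identify any structured tables, list them for preview), here is a minimal Python sketch. It is not how the Web connector is implemented. The class name TableExtractor, the helper extract_tables, and the sample HTML with its placeholder rows are all hypothetical, and the sketch only uses Python's standard html.parser module on an inline string instead of fetching the Wikipedia page.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text.

    Hypothetical helper for illustration only. It assumes well-formed,
    non-nested tables with explicit closing tags.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables, in page order
        self._table = None  # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._table.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table = None


def extract_tables(html):
    """Return all tables found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Placeholder page content, not real data from the Wikipedia article.
SAMPLE_HTML = """
<p>Intro text that is not part of any table.</p>
<table>
  <tr><th>Rank</th><th>Firm/company</th><th>Country</th></tr>
  <tr><td>1</td><td>Example Firm A</td><td>Country X</td></tr>
  <tr><td>2</td><td>Example Firm B</td><td>Country Y</td></tr>
</table>
"""

tables = extract_tables(SAMPLE_HTML)
header, *rows = tables[0]  # first row is treated as the column headers
print(header)
print(rows)
```

Each discovered table is roughly what shows up as one entry in the list on the left-hand side of the preview window, and choosing tables[0] here plays the part of clicking a table before loading it into the Query editor.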