3. Demonstration: Delivering Business Intelligence
The objectives of this demo are to describe common data challenges, the purpose of the data warehouse, and the data warehouse ecosystem.
Today, as in many organizations, there are numerous source systems. These systems have been designed to capture operational workloads, be they sales systems, HR systems, or finance systems that could also be cloud based, plus massive data extracts, such as web logs, sitting in big data stores.
Somehow, users need to connect to and access this data, and therein lies a challenge. For operational reports, one approach is to connect directly to these systems. In fact, this is a supported approach: consider a sales system and the need to raise an invoice; that is a report driven directly from the operational data system. But let's consider the other requirements of our users.
Analytics is the need to aggregate, summarize, and drill through data. This type of activity might seem simple from an interface perspective but is quite demanding on the underlying systems. For example, consider a pivot table that shows employees on the rows and the months of the year on the columns, where at each intersection I see the sum of sales sold by each employee in that month. That looks simple, but what may not be clear to you is that the underlying source system is storing billions of rows of data that need to be retrieved, filtered, grouped, and aggregated simply to produce that result. That can be intensive, especially for a source system that has been optimized for write-intensive activities, not read-intensive ones.
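To make that concrete, here is a minimal sketch of the kind of query such a pivot table issues behind the scenes; the Sales table and its columns are hypothetical:

    -- Hypothetical schema: one row per sale in a Sales table.
    -- Every qualifying row must be read, grouped, and summed
    -- just to build the employee-by-month summary on screen.
    SELECT
        EmployeeID,
        YEAR(OrderDate)  AS SalesYear,
        MONTH(OrderDate) AS SalesMonth,
        SUM(Amount)      AS TotalSales
    FROM Sales
    GROUP BY EmployeeID, YEAR(OrderDate), MONTH(OrderDate)
    ORDER BY EmployeeID, SalesYear, SalesMonth;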
Dashboards, alerting systems, scorecards that compare goals to actuals: these are all requirements that are not well served by operational reporting run directly against source systems.
So, what could the solution be? Well, one approach to reduce contention might be to replicate these systems. That could be achieved, for example with SQL Server, through replication or through database mirroring, both high availability strategies, and these mirrors could also be read-only replicas, allowing us to perform reporting against them.
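As a sketch, assuming SQL Server with a readable secondary configured, the reporting workload is redirected by the connection rather than by the query; the server and table names here are hypothetical:

    -- Hypothetical connection string pointing reports at a read-only
    -- replica (ApplicationIntent routing must be configured):
    --   Server=AgListener;Database=SalesDB;ApplicationIntent=ReadOnly;
    -- The reporting query itself is unchanged:
    SELECT TOP (100) OrderID, OrderDate, Amount
    FROM dbo.Orders
    ORDER BY OrderDate DESC;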
However, while this might reduce contention, the structures typical of relational databases with operational workloads, highly normalized so that data can be inserted efficiently, don't work so well for the analytic requirements that we might want to drive from them.
So what we could consider is introducing this data storage and aggregation layer, a very generic box here. And let's take a look and build up some scenarios of how we could implement more effective user access and business intelligence driven from optimized data stores.
Here's an example of details in an Operational Data Store, referred to as an ODS. Where there's a need for operational reporting and real-time, up-to-date data without impacting the source systems and what they've been designed to do, an ODS can, on a very frequent basis, collect and integrate data from operational systems and support operational reporting.
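As a minimal sketch of that frequent integration, assuming a hypothetical ods.Customer table fed from a source system, each run could upsert the latest changes:

    -- Hypothetical upsert keeping an ODS table current with its source.
    MERGE ods.Customer AS tgt
    USING src.Customer AS s
        ON tgt.CustomerID = s.CustomerID
    WHEN MATCHED THEN
        UPDATE SET tgt.Name         = s.Name,
                   tgt.Email        = s.Email,
                   tgt.ModifiedDate = s.ModifiedDate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerID, Name, Email, ModifiedDate)
        VALUES (s.CustomerID, s.Name, s.Email, s.ModifiedDate);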
The focus of this course, however, is more about the delivery of the Enterprise Data Warehouse. We're going to talk in this course about different architectures, and we see one here, which is the Enterprise Data Warehouse consisting of a series of data marts.
I'll describe these as subject-specific stores, like Sales, Operations, HR, and Finance. These are still relational stores, but they are optimized for analytic workloads; they're optimized for read-intensive operations.
So, the question then is: how do we get data from the source systems loaded into these Enterprise Data Warehouse structures to support user access?
Well, the first discussion, commonly with large implementations, is to introduce a staging area. The staging area, in relational format, is a place to land operational data. Often in our design approach we want to get in and out as quickly as possible, minimizing the impact on the source systems. So, with read-only access, perhaps the logic is something like this: retrieve all sales data since the last time you retrieved it until now, i.e., the last 24 hours. So truncate the staging tables and load in the incremental transactions that have taken place since the last ETL process. The staging system then supports interrogation, transformation, and cleansing; it also supports restartability if there is a need to redo the ETL process.
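A minimal sketch of that truncate-and-load pattern, assuming hypothetical stg.Sales and src.Sales tables and a watermark table recording the last extraction time:

    -- Hypothetical incremental staging load driven by a watermark.
    DECLARE @LastLoad datetime2 =
        (SELECT LoadedThrough FROM etl.Watermark WHERE TableName = 'Sales');

    TRUNCATE TABLE stg.Sales;        -- staging is transient; start empty

    INSERT INTO stg.Sales (SaleID, EmployeeID, ProductID, OrderDate, Amount)
    SELECT SaleID, EmployeeID, ProductID, OrderDate, Amount
    FROM src.Sales
    WHERE ModifiedDate > @LastLoad;  -- only rows changed since the last run

    UPDATE etl.Watermark
    SET LoadedThrough = SYSDATETIME()
    WHERE TableName = 'Sales';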
Now, supporting the staging system could be a master data system, where there's an identified need to maintain, for consistency purposes, certain business entities, for example products or geography. We might want to maintain golden records of data that are not possible to maintain directly in our operational systems, maybe because a system doesn't support it, there's no interface, or perhaps products are actually defined in multiple systems, and so we have no single place to take as an authoritative store of data. A master data system can solve many of those challenges.
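As a sketch, one common shape for this is a master table plus a mapping of source-system keys, so that a product defined in several systems resolves to a single golden record; all names here are hypothetical:

    -- Hypothetical golden-record tables for products.
    CREATE TABLE mds.Product (
        MasterProductID int IDENTITY PRIMARY KEY,
        ProductName     nvarchar(100) NOT NULL,  -- the agreed, authoritative name
        Category        nvarchar(50)  NOT NULL
    );

    CREATE TABLE mds.ProductSourceMap (
        MasterProductID int NOT NULL REFERENCES mds.Product (MasterProductID),
        SourceSystem    nvarchar(20) NOT NULL,   -- e.g. 'ERP', 'CRM'
        SourceKey       nvarchar(50) NOT NULL,   -- the product's key in that system
        PRIMARY KEY (SourceSystem, SourceKey)
    );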
The next consideration relates to garbage in, garbage out. If you collect garbage data from your source systems and load it into the data warehouse, you cannot expect quality decisions to be made, so there may be data quality and cleansing systems: knowledge bases on how to correct and standardize, or even repair or deduplicate, data.
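A minimal deduplication sketch, assuming duplicates in a hypothetical stg.Customer table are matched on a standardized email address and the newest row wins:

    -- Hypothetical dedup: keep one row per standardized email address.
    WITH Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY LOWER(LTRIM(RTRIM(Email)))  -- standardize first
                   ORDER BY ModifiedDate DESC               -- prefer the newest
               ) AS rn
        FROM stg.Customer
    )
    DELETE FROM Ranked
    WHERE rn > 1;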
So collectively, staging systems, master data reference systems, and data quality systems can be used to drive a periodic ETL (extract, transform, and load) process that, according to the business rules that define good-quality data, periodically extracts from these systems and loads into the data warehouse. Once loaded into the data warehouse, the data is clean, consistent, credible, and current, and it is available for production reporting.
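As a sketch of that load step, a periodic run might insert the cleansed staging rows into a fact table, resolving business keys to the warehouse's surrogate keys; the dw and stg names are hypothetical:

    -- Hypothetical fact load: translate business keys into surrogate keys.
    INSERT INTO dw.FactSales (DateKey, EmployeeKey, ProductKey, Amount)
    SELECT d.DateKey,
           e.EmployeeKey,
           p.ProductKey,
           s.Amount
    FROM stg.Sales AS s
    JOIN dw.DimDate     AS d ON d.FullDate   = CAST(s.OrderDate AS date)
    JOIN dw.DimEmployee AS e ON e.EmployeeID = s.EmployeeID
    JOIN dw.DimProduct  AS p ON p.ProductID  = s.ProductID;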
Now, these data marts are still relational databases, and we just mentioned that, as source systems, relational databases are not the most efficient stores to retrieve from. This is because they are often designed in third normal form, and that is a design optimization for write-intensive operations.
As relational databases, but with an analytic workload, we still design in terms of tables, columns, and relationships, but we use different, mature methodologies that support analytic workloads. Dimensional modeling is the topic here, and you may well be familiar with fact tables and dimension tables.
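A minimal star-schema sketch for a sales subject area; all table and column names are hypothetical:

    -- Hypothetical star schema: a fact table surrounded by dimensions.
    CREATE TABLE dw.DimEmployee (
        EmployeeKey  int IDENTITY PRIMARY KEY,  -- surrogate key
        EmployeeID   int           NOT NULL,    -- business key from the source
        EmployeeName nvarchar(100) NOT NULL
    );

    CREATE TABLE dw.DimDate (
        DateKey  int  PRIMARY KEY,              -- e.g. 20150131
        FullDate date NOT NULL,
        [Month]  tinyint  NOT NULL,
        [Year]   smallint NOT NULL
    );

    CREATE TABLE dw.FactSales (
        DateKey     int   NOT NULL REFERENCES dw.DimDate (DateKey),
        EmployeeKey int   NOT NULL REFERENCES dw.DimEmployee (EmployeeKey),
        Amount      money NOT NULL              -- additive measure
    );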
Now, relational systems, even when they're designed optimally in this fashion, are still inherently slow. And so, what you will also find in a data warehouse are data models, like the Sales Model and the Finance Model here. These can be referred to as cubes, BI semantic models, or simply data models; whatever you name them, they're essentially a very convenient access point for your end users.
Your end users, granted permission, can connect to these models. They can work with high-performance queries and analytic query workloads. This is often achieved because these data models may cache data in memory, or in structures on disk that are paged into memory, and what that enables is very high-performance slicing and dicing, very natural for answering the types of analytic questions that a business typically has.
Now, these data models are also a great place to encapsulate business logic. Even difficult calculations, such as time manipulations, can be encapsulated far more easily here than can be achieved with the logic available in relational querying.
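To see the contrast, here is a sketch of a simple year-over-year comparison expressed directly in SQL over a hypothetical dw.FactSales table; a semantic model typically reduces this to a short, reusable calculation:

    -- Hypothetical year-over-year growth in plain SQL (assumes every
    -- month is present, so a 12-row offset lands on the prior year).
    WITH MonthlySales AS (
        SELECT d.[Year], d.[Month], SUM(f.Amount) AS Sales
        FROM dw.FactSales AS f
        JOIN dw.DimDate   AS d ON d.DateKey = f.DateKey
        GROUP BY d.[Year], d.[Month]
    )
    SELECT [Year], [Month], Sales,
           Sales - LAG(Sales, 12) OVER (ORDER BY [Year], [Month]) AS YoYGrowth
    FROM MonthlySales
    ORDER BY [Year], [Month];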
In addition, there are other great capabilities, like translations for different languages, and actions that support moving from the data to other experiences, like reports or drill-through data sets.
And lastly, there's the concept of security. When you have different permission sets for different audiences, it's quite difficult to apply this in a relational system. And yet data models have roles and ways to define permission sets that can be quite complex, right down to a granular level, so that different people can see different data.
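As a sketch of the relational workaround being alluded to, one approach is a mapping table joined inside a view, so that each login sees only its permitted rows; the names here are hypothetical, and the bookkeeping grows quickly as permission sets multiply:

    -- Hypothetical row-level filtering via a user-to-region mapping.
    CREATE TABLE sec.UserRegion (
        LoginName nvarchar(128) NOT NULL,
        RegionID  int           NOT NULL,
        PRIMARY KEY (LoginName, RegionID)
    );
    GO
    CREATE VIEW dw.FactSalesSecured AS
    SELECT f.*
    FROM dw.FactSales AS f
    JOIN sec.UserRegion AS ur
        ON  ur.RegionID  = f.RegionID
        AND ur.LoginName = SUSER_SNAME();  -- the current login's regions
    GO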
Now, also at this level, you see on the presentation the Churn Analysis Model. This is, in fact, a machine learning, or data mining, model that has been processed against the data marts looking for patterns of interest; in this case, it might be looking for the characteristics of customers who leave you. And if you can identify these, that's useful. So these models provide exploration, and beyond that, once we trust the patterns they have surfaced, they can be deployed as predictive models, used in reporting and analytics, or used to drive other business functionality.
The last build of this slide introduces, from a user access perspective, the need for self-service BI. We should recognize that an Enterprise Data Warehouse will never deliver 100% of the business requirements; we might strive for somewhere between 80 and 90%. Yet as the business evolves, you'll find that the data warehouse is a major undertaking, and it is not agile enough to simply adapt quickly to the new questions that the business might raise.
So, self-service business intelligence is a way to fill that gap by empowering the right people in the organization with tools, access to data, and training. They can connect to the data warehouse resources, even directly to the data marts, or to the models, and they may construct new models by extending them or adding new logic beyond what the data warehouse delivers. This is a valid form of BI, and we should see it as a mutual benefit to the organization, extending and working with the resources already deployed in an Enterprise Data Warehouse.
This, then, describes end to end the business case for the data warehouse in the organization today.