3. Demonstration: Delivering Business Intelligence

The objectives of this demo are to describe common data challenges, describe the purpose of the data warehouse, and to describe the data warehouse ecosystem.

Today, as in many organizations, there are numerous source systems. These systems have been designed to capture operational workloads, whether they are sales systems, HR systems, or finance systems that could also be cloud based, plus data extracts, massive volumes of extracts like web logs, sitting on big data stores. Somehow users need to connect to and access this data, and therein lies a challenge. For operational reports, one approach is to connect directly to these systems. In fact, this is a supported approach when you consider a sales system and the need to raise an invoice; that is a report driven from the operational data system.

But let's consider the other requirements of our users. Analytics: the need to aggregate, summarize, and drill through. This type of activity on data might seem simple from an interface perspective, but it is quite demanding on the systems. For example, consider a pivot table that shows me employees on the rows, the months of the year on the columns, and at the intersection, the sum of sales sold by each employee by month. That looks simple, but what may not be clear to you is that the underlying source system is storing billions of rows of data that need to be retrieved, filtered, grouped, and aggregated simply to produce that result. That can be intensive, especially for a source system that has been optimized for write-intensive activities, not read-intensive ones.
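To make that concrete, here is a minimal sketch of the same pivot in pandas; the employee names, months, and amounts are invented for the example, and a real source system would be aggregating over vastly more rows.

```python
import pandas as pd

# Hypothetical sales transactions as they might arrive from a source system;
# the column names and values are illustrative, not taken from the demo.
sales = pd.DataFrame({
    "employee": ["Ann", "Ann", "Ben", "Ben", "Ann"],
    "month":    ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "amount":   [120.0, 80.0, 200.0, 150.0, 60.0],
})

# Employees on rows, months on columns, sum of sales at the intersection,
# the exact shape described above. Against a real operational database this
# single call would force a scan, filter, group, and aggregate over every
# transaction row.
pivot = sales.pivot_table(index="employee", columns="month",
                          values="amount", aggfunc="sum")
print(pivot)
```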
Dashboards, alerting systems, and scorecards that compare goals to actuals: these are all requirements that are not well served by operational reporting run directly against source systems. So, what could the solution be?

Well, one approach to reduce contention is to replicate these systems. That could be achieved, for example with SQL Server, through replication or through database mirroring. Both are high availability strategies, and these mirrors could also serve as read-only replicas, allowing us to perform reporting against them rather than against the operational primary.
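As an illustrative sketch of reporting against a read-only replica: the snippet below uses pyodbc with SQL Server's ApplicationIntent=ReadOnly connection keyword, which routes the session to a readable secondary in an Always On availability group, a newer mechanism in the same family as the mirroring approach mentioned. The server, database, and table names are placeholders.

```python
import pyodbc

# ApplicationIntent=ReadOnly asks the availability-group listener to route
# this session to a readable secondary replica, keeping report queries off
# the primary. All names below are hypothetical.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=ag-listener.example.com;"
    "DATABASE=SalesDB;"
    "ApplicationIntent=ReadOnly;"
    "Trusted_Connection=yes;"
    "Encrypt=yes;TrustServerCertificate=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT TOP 10 OrderID, OrderDate, TotalDue FROM Sales.Orders;")
for row in cursor.fetchall():
    print(row)
```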
However, while this might reduce contention, the structures remain: relational databases designed for operational workloads, to insert data efficiently, are highly normalized, and that doesn't work so well for the analytic requirements we might want to drive from them.

So what we could consider is introducing this data storage and aggregation layer, a very generic box here. Let's build up some scenarios of how we could implement more effective user access and business intelligence, driven from optimized data stores. Here's an example: details held in an Operational Data Store, referred to as an ODS. Where there's a need for operational reporting and real-time, up-to-date data without impacting the source systems and what they've been designed to do, an ODS could, on a very frequent basis, collect and integrate data from operational systems and support operational reporting.

The focus of this course, however, is more about the delivery of the Enterprise Data Warehouse. We're going to talk in this course about different architectures, and we see one here: the Enterprise Data Warehouse consisting of a series of data marts. For now, I'll describe these as subject-specific stores: Sales, Operations, HR, and Finance. These are still relational stores, but they're optimized for analytic workloads; they're optimized for read-intensive operations.

So the question then is: how do we get data from the source systems loaded into these Enterprise Data Warehouse structures to support user access? Well, the first discussion, commonly with large implementations, is to introduce a staging area. The staging area, in relational format, is a place to land operational data. Often in our design approach, we want to get in and out as quickly as possible, minimizing the impact on the source systems. So, with read-only access, perhaps the logic is something like this: retrieve all sales data since the last time you retrieved it until now, i.e., the last 24 hours. Truncate the staging tables, load in the incremental transactions that have taken place since the last ETL process, and then the staging system supports interrogation, transformation, and cleansing. It also supports restartability, if there is a need to redo the ETL process.
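Here is a minimal sketch of that truncate-and-load pattern, assuming a watermark table that records the last extraction time; the connection strings and all object names are hypothetical.

```python
import pyodbc

# Placeholder connection strings; real ones would point at the actual servers.
SRC_CONN_STR = ("DRIVER={ODBC Driver 18 for SQL Server};"
                "SERVER=src.example.com;DATABASE=SalesDB;Trusted_Connection=yes;")
STG_CONN_STR = ("DRIVER={ODBC Driver 18 for SQL Server};"
                "SERVER=dw.example.com;DATABASE=Staging;Trusted_Connection=yes;")

src = pyodbc.connect(SRC_CONN_STR)   # read-only access to the source system
stg = pyodbc.connect(STG_CONN_STR)   # the staging database

# 1. Find the high-water mark left by the previous ETL run.
last_run = stg.cursor().execute(
    "SELECT LastExtractedAt FROM etl.Watermark WHERE TableName = 'Sales';"
).fetchone()[0]

# 2. Pull only the transactions that arrived since then, minimizing the
#    time spent connected to the source.
rows = src.cursor().execute(
    "SELECT SaleID, EmployeeID, SaleDate, Amount "
    "FROM dbo.Sales WHERE SaleDate > ?;", last_run
).fetchall()

# 3. Truncate staging and land the increment; staging then supports
#    interrogation, cleansing, and restartability without touching the source.
cur = stg.cursor()
cur.execute("TRUNCATE TABLE stg.Sales;")
cur.executemany(
    "INSERT INTO stg.Sales (SaleID, EmployeeID, SaleDate, Amount) "
    "VALUES (?, ?, ?, ?);",
    rows,
)
# A production process would derive the new watermark from the extracted data.
cur.execute("UPDATE etl.Watermark SET LastExtractedAt = SYSUTCDATETIME() "
            "WHERE TableName = 'Sales';")
stg.commit()
```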
Now, supporting the staging system could be a master data system, where there's an identified need to maintain, for consistency purposes, certain business entities, for example products or geography. We might want to maintain golden records of data that are not possible to maintain directly in our operational system, maybe because it doesn't support it, there's no interface, or perhaps products are actually defined in multiple systems and so we have no single place to take as an authoritative store of data. A master data system can solve many of those challenges.

The next consideration relates to garbage in, garbage out. If you collect garbage data from your source systems and load it into the data warehouse, you cannot expect quality decisions to be made. So there may be data quality and cleansing systems: knowledge bases on how to correct, standardize, or even repair or deduplicate data.

So collectively, staging systems, master data and reference systems, and data quality systems can be used to drive a periodic ETL process. According to the business rules that define good-quality data, the ETL process can extract, transform, and load: periodically pulling from these systems and loading the results into the data warehouse. Once loaded into the data warehouse, it is clean, consistent, credible, current data that is available for production reporting.

Now, these data marts are still relational databases, and we just mentioned that, as source systems, relational databases are not the most efficient stores to retrieve from. That is because they are often designed in third normal form, and that is a design optimization for write-intensive operations. As relational databases, but with an analytic workload, we still design in terms of tables, columns, and relationships, but we use different, mature methodologies that support the analytic workloads. Dimensional modeling is the topic here, and you may well be familiar with fact tables and dimension tables.
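To illustrate, here is a miniature star schema, one fact table joined to one dimension table, built in SQLite so it runs anywhere; the table and column names are invented for the example.

```python
import sqlite3

# A miniature star schema: one fact table keyed to one dimension table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE DimEmployee (
        EmployeeKey  INTEGER PRIMARY KEY,
        EmployeeName TEXT
    );
    CREATE TABLE FactSales (
        EmployeeKey INTEGER REFERENCES DimEmployee (EmployeeKey),
        MonthKey    TEXT,   -- a full date dimension would normally be used
        SalesAmount REAL
    );
    INSERT INTO DimEmployee VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO FactSales VALUES
        (1, '2024-01', 120.0), (1, '2024-02', 80.0),
        (2, '2024-01', 200.0), (2, '2024-02', 150.0);
""")

# The classic star join: facts grouped by dimension attributes. Wide,
# denormalized dimensions keep this read path simple and fast.
for row in db.execute("""
    SELECT e.EmployeeName, f.MonthKey, SUM(f.SalesAmount) AS Sales
    FROM FactSales AS f
    JOIN DimEmployee AS e ON e.EmployeeKey = f.EmployeeKey
    GROUP BY e.EmployeeName, f.MonthKey
    ORDER BY e.EmployeeName, f.MonthKey;
"""):
    print(row)
```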
Now, relational systems, even when they're designed optimally in this fashion, are still comparatively slow. And so, what you will also find in a data warehouse are data models like the Sales model and the Finance model here. These can be referred to as cubes, BI semantic models, or simply data models; whatever you name them, they're essentially a very convenient access point for your end users. Your end users, granted permission, can connect to these models. They can work with high-performance queries and analytic query workloads. This is often achieved because these data models may cache data in memory or in optimized structures on disk, and what this enables is very high-performance slicing and dicing, very natural for answering the types of analytic questions a business typically has.

Now, these data models are also a great place to encapsulate business logic; even difficult calculations and time manipulations can be encapsulated far more easily here than can be achieved with the logic available in relational querying. In addition, there are other great capabilities, like translations for different languages, and actions that support moving from the data to other experiences, like reports or drill-through data sets. And lastly, there's the concept of security. When you have different permission sets for different audiences, it's quite difficult to apply this in a relational system. Yet data models have roles and ways to define permission sets that can be quite complex, right down to a granular level, so that different people can see different data.

Now, also at this level, you see on the presentation the Churn Analysis model. This is, in fact, a machine learning or even a data mining model that has been processed against the data marts, looking for patterns of interest; in this case, it might be looking for the characteristics of customers who leave you. If you can identify these, that's useful. So these models provide exploration, and beyond that, once we trust the patterns they have surfaced, they can be deployed as predictive models and used in reporting and analytics, or to drive other business functionality.

The last build of this slide introduces, from a user access perspective, the need for self-service BI. We should recognize that an Enterprise Data Warehouse will never deliver 100% of the business requirements; it might strive for somewhere between 80 and 90%. As the business evolves, you'll find that the data warehouse is a major undertaking, and it is not so agile that it can simply adapt quickly to the new questions the business might raise. So, self-service business intelligence is a way to fill that gap by empowering the right people in the organization with tools, access to data, and training. They can connect to the data warehouse resources, even the data marts directly, or the models, and they may construct new models by extending them or adding new logic beyond what the data warehouse delivers. This is a valid form of BI, and we should see it as a mutual benefit to the organization, extending and working with the resources already deployed in the Enterprise Data Warehouse.

This, then, end to end, describes the business case for the data warehouse in the organization today.
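As a closing illustration of the churn analysis idea described above, here is a minimal sketch of a churn classifier using scikit-learn on synthetic features; in practice, the features and labels would be drawn from the data marts rather than generated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic customer attributes standing in for features pulled from the
# data marts; the columns and the labeling rule below are invented.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(1, 60, n),      # months as a customer (tenure)
    rng.integers(0, 10, n),      # support tickets in the last quarter
    rng.uniform(0, 500, n),      # monthly spend
])
# Toy label: short tenure plus many tickets means likely to churn.
y = ((X[:, 0] < 12) & (X[:, 1] > 4)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score held-out customers: probabilities like these can be surfaced in
# reports or used to drive retention campaigns, as described above.
print("accuracy:", model.score(X_test, y_test))
print("churn probability, first 5:", model.predict_proba(X_test)[:5, 1].round(2))
```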
