In this module, we will focus on how to measure and compare the performance of different data structures and algorithms. This specific module is a bit special, in that it presents a toolbox that we will use intensively throughout the rest of the course. Some of the concepts presented may seem odd or abstract at first, and they might take some time to get used to, but hold on: once you are through this module and we have established a common language for comparing performance and complexity, the remaining modules will be much more concrete. We just need this toolbox first.

So, without further ado, let's consider a few different ways of measuring performance. One strategy is to use a stopwatch to simply measure the time it takes for a given action to complete. That was actually what we did in the introduction, comparing two implementations on an iPhone and a laptop respectively.
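As a sketch, the stopwatch approach could look like this in Python (the `count_lines` function being timed is a hypothetical stand-in for the log reader from the intro):

```python
import time

def measure(action):
    """Time one run of `action` with a software stopwatch."""
    start = time.perf_counter()           # start the stopwatch
    action()                              # run the code being measured
    return time.perf_counter() - start    # elapsed seconds

# Hypothetical usage, timing a log-reading function once:
# elapsed = measure(lambda: count_lines("app.log"))
# print(f"took {elapsed:.3f} seconds")
```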
However, that approach comes with some drawbacks, because a lot of things are in play when measuring. One such thing is the hardware that executes the implementation, where more capable hardware may result in a faster execution time, even though the algorithm itself is just the same as, or maybe even slightly worse than, an algorithm running on some less capable hardware. Other things can influence the measured time, too, such as compiler optimizations, and other programs running simultaneously that may eat resources on our machine.

Measuring execution time with a stopwatch has another drawback, namely the fact that it is only a single measuring point.
Specifically, that single measurement does not say anything about how the execution time may behave for smaller or larger inputs; for example, how the execution time may grow for our log reader when more log lines are supposed to be analyzed. We can achieve a better understanding of the nature of the performance behavior for varying input sizes by considering the number of instructions executed by the machine when running a given program. If we assume that each instruction takes a certain fraction of a second to execute, then the total time usage is just a matter of multiplying this per-instruction time by the total number of instructions. For example, consider the function from the intro that just reads the provided log file and counts the number of log lines.
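As a sketch of what such a function could look like in Python, with the two setup lines and the loop that the instruction counts below refer to:

```python
def count_lines(path):
    count = 0                  # first line: assume one instruction
    log_file = open(path)      # second line: assume one instruction
    for line in log_file:      # assume four instructions per iteration
        count = count + 1
    return count               # the final return: one more instruction
```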
We could assume that the first two lines in the function use one instruction each, summing to two instructions, and, for the sake of simplicity, that the contents of the for loop use four instructions. If the log file contains 100,000 lines, we thus end up with an expression for the total number of instructions executed that looks like this.
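Based on those counts, the expression is presumably of this form:

2 + 4 × 100,000 + 1 = 400,003 instructions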
The last + 1 comes from the return statement. If we, instead of assuming 100,000 log lines, use the letter N to symbolize any number of log lines, then we get a more general expression that looks like this.
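Again reconstructing from the counts above, presumably:

2 + 4 × N + 1 = 4N + 3 instructions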
Now we can get an idea of the complexity of an algorithm for a given input size simply by inserting an appropriate number for N in the formula. An interesting question we could now ask is how this expression grows when N increases, and this leads us to the third approach to measuring performance, namely, to look at the nature of the curve that the instruction-count formula from before represents. And remember that the number of instructions directly relates to the execution time. Consider the following examples of such curves. The formula from before represented a straight line like this.
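A quick check of the 4N + 3 formula, assuming Python, makes the straight-line behavior concrete before the narration spells it out:

```python
def instructions(n):
    """Instruction count for n log lines, per the formula above."""
    return 4 * n + 3

# The same increase in input size always costs the same extra work:
print(instructions(1500) - instructions(1000))    # 2000 extra instructions
print(instructions(10500) - instructions(10000))  # 2000 extra instructions
```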
A straight line means that the execution time grows at a steady, predictable rate, so that an increment in the input size, or N as we named it, of 500, say, from 1,000 to 1,500, results in the same execution time growth as an increment of the input size from, say, 10,000 to 10,500. Here is another example of a line where the execution time increases slightly faster. Often, when the curve looks like this, we are happy, because then our algorithm does not get relatively more cumbersome for a larger input. We also say that the execution time grows linearly with the input size, and linearity is good.
An example of an even better-looking curve is this one. As you can see, increasing the input size by a certain amount has a higher relative impact on the execution time for smaller inputs than for larger inputs. That is, increasing the input size by, say, 500 again will result in almost no additional execution time when the input is sufficiently large. Of course, this is a quite ideal performance behavior for an algorithm, and as we shall see later on, there are ways of searching for data that have exactly this property.
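One such technique, previewed here as a minimal sketch, is binary search, assuming the data is kept sorted (the course's own searching methods come later):

```python
def binary_search(sorted_items, target):
    """Return the index of `target` in `sorted_items`, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1     # discard the lower half
        else:
            high = mid - 1    # discard the upper half
    return -1

# Each step halves the remaining range, so doubling the haystack
# adds only a single extra step.
```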
This basically means that the extra time we need to find the needle in the haystack only increases a tiny bit, even if the haystack grows significantly in size.

On the other end of the scale is a curve like this.
As you can see, this curve has the opposite properties of the former one. Even though the execution time only grows slowly for small input sizes, it will explode once the input gets large enough, and notice that this explosion can happen even if the execution time is very small for small inputs. Naturally, this is the worst imaginable behavior for an algorithm, because it effectively sets a, probably rather low, upper limit on how large an input we can handle.
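A curve like this could, for example, be exponential, say doubling with every extra input element (an assumed example); a small sketch shows how fast that explodes:

```python
# Hypothetical instruction count that doubles with each extra input element:
for n in (10, 20, 30, 40):
    print(n, 2 ** n)
# 10 ->           1,024
# 20 ->       1,048,576
# 30 ->   1,073,741,824
# 40 -> 1,099,511,627,776
```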
So, considering the curves that represent the instruction count can give us a pretty good understanding of how the execution time grows when the input size grows. With that mindset in hand, we can start considering other performance characteristics. For example, we could consider a best-case scenario for an algorithm, that is, a curve that represents the most optimistic situation for an execution: a guarantee that states what the shortest possible execution time can ever be. We can also consider the worst-case scenario, that is, an upper limit on how badly an algorithm can perform for a given input size.
When talking about algorithms, this curve is very interesting, simply because there is often a desire to solve problems as fast as possible, not only when the input is nice, but also in the worst possible situations. And then, it is useful to know how badly the algorithm can possibly behave.
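As an illustration, assuming a simple linear scan (not an algorithm named in this clip), the best and worst cases are easy to point at:

```python
def linear_search(items, target):
    """Best case: `target` is the first element, one comparison.
    Worst case: `target` is absent, len(items) comparisons."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1
```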
As a last example, consider a performance behavior that is nice, except for certain spikes, as shown in this curve. If we are running the corresponding algorithm numerous times, it can be interesting to consider the amortized performance, that is: what is the real cost when summing the execution times across multiple executions? Do the spikes contribute significantly to the execution time in the total picture, or can they, so to speak, be balanced out?
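A classic source of such spikes, assumed here as an illustration, is appending to a dynamic array: most appends are one cheap write, but the occasional resize copies every element, and summed over many appends those copies balance out to a constant cost per append:

```python
class DynamicArray:
    """Append is usually one write; a spike occurs whenever the backing store doubles."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.items = [None]

    def append(self, value):
        if self.size == self.capacity:       # the spike: grow and copy everything
            self.capacity *= 2
            new_items = [None] * self.capacity
            new_items[:self.size] = self.items
            self.items = new_items
        self.items[self.size] = value        # the cheap, common case
        self.size += 1

# Over N appends the total copy work stays below 2N writes,
# so the amortized cost per append is constant.
```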
So, those were four ways of measuring performance. In the next clip, we will zoom in on how to get from program source code to its related execution-time curve.