1
00:00:03,000 --> 00:00:04,200
Hello again.
2
00:00:04,200 --> 00:00:06,090
In this lecture, we are going to talk about
3
00:00:06,090 --> 00:00:08,700
various types of Probability distributions
4
00:00:08,700 --> 00:00:11,733
and what kind of events they can be used to describe.
5
00:00:12,840 --> 00:00:15,150
Certain distributions share features,
6
00:00:15,150 --> 00:00:17,370
so we group them into types.
7
00:00:17,370 --> 00:00:20,220
Some, like rolling a die or picking a card,
8
00:00:20,220 --> 00:00:23,010
have a finite number of outcomes.
9
00:00:23,010 --> 00:00:25,470
They follow Discrete distributions,
10
00:00:25,470 --> 00:00:27,960
and we use the formulas we already introduced
11
00:00:27,960 --> 00:00:30,993
to calculate their probabilities and expected values.
12
00:00:32,280 --> 00:00:35,790
Others, like recording time and distance in track and field,
13
00:00:35,790 --> 00:00:38,130
have infinitely many outcomes.
14
00:00:38,130 --> 00:00:40,770
They follow Continuous distributions.
15
00:00:40,770 --> 00:00:42,360
And we use different formulas
16
00:00:42,360 --> 00:00:44,103
from the ones we mentioned so far.
17
00:00:45,360 --> 00:00:46,920
Throughout the course of this video,
18
00:00:46,920 --> 00:00:48,870
we are going to examine the characteristics
19
00:00:48,870 --> 00:00:51,600
of some of the most common distributions.
20
00:00:51,600 --> 00:00:54,960
For each one, we will focus on an important aspect of it
21
00:00:54,960 --> 00:00:56,493
or when it is used.
22
00:00:57,330 --> 00:00:58,950
Before we get into the specifics,
23
00:00:58,950 --> 00:01:01,440
you will need to know the proper notation we implement
24
00:01:01,440 --> 00:01:03,153
when defining distributions.
25
00:01:04,080 --> 00:01:06,690
We start off by writing down the variable name
26
00:01:06,690 --> 00:01:08,340
for our set of values,
27
00:01:08,340 --> 00:01:10,323
followed by the Tilde sign.
28
00:01:11,280 --> 00:01:13,470
This is followed by a capital letter
29
00:01:13,470 --> 00:01:15,390
depicting the type of the distribution
30
00:01:15,390 --> 00:01:18,513
and some characteristics of the data set in parentheses.
31
00:01:19,380 --> 00:01:22,440
The characteristics are usually mean and variance,
32
00:01:22,440 --> 00:01:25,533
but they may vary depending on the type of the distribution.
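For instance, a variable X that follows a Normal distribution with mean µ and variance σ² would be written as X ~ N(µ, σ²).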
33
00:01:27,720 --> 00:01:32,040
All right, let us start by talking about the discrete ones.
34
00:01:32,040 --> 00:01:33,630
We will give an overview of them
35
00:01:33,630 --> 00:01:36,603
and then we will devote a separate lecture to each one.
36
00:01:37,590 --> 00:01:40,500
So we looked at problems relating to drawing cards
37
00:01:40,500 --> 00:01:42,660
from a deck or flipping a coin.
38
00:01:42,660 --> 00:01:45,420
Both examples show events where all outcomes
39
00:01:45,420 --> 00:01:46,683
are equally likely.
40
00:01:47,520 --> 00:01:50,220
Such outcomes are called equiprobable,
41
00:01:50,220 --> 00:01:53,463
and these sorts of events follow a uniform distribution.
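A minimal Python sketch of the die example, assuming scipy is available:

from scipy.stats import randint

die = randint(1, 7)    # discrete uniform over the faces 1 to 6
print(die.pmf(3))      # every face is equiprobable: 1/6, roughly 0.1667
print(die.mean())      # expected value of a fair die: 3.5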
42
00:01:54,660 --> 00:01:57,870
Then there are events with only two possible outcomes,
43
00:01:57,870 --> 00:01:59,760
true or false.
44
00:01:59,760 --> 00:02:02,013
They follow a Bernoulli distribution.
45
00:02:03,000 --> 00:02:05,900
Regardless of whether one outcome is more likely to occur,
46
00:02:06,780 --> 00:02:09,570
any event with two outcomes can be transformed
47
00:02:09,570 --> 00:02:11,850
into a Bernoulli event.
48
00:02:11,850 --> 00:02:14,280
We simply assign one of them to be true
49
00:02:14,280 --> 00:02:15,963
and the other one to be false.
50
00:02:17,100 --> 00:02:19,290
Imagine we are required to elect a captain
51
00:02:19,290 --> 00:02:21,240
for our college sports team.
52
00:02:21,240 --> 00:02:23,760
The team consists of seven native students
53
00:02:23,760 --> 00:02:26,370
and three international students.
54
00:02:26,370 --> 00:02:29,190
We assign the captain being domestic to be true,
55
00:02:29,190 --> 00:02:32,310
and the captain being international to be false.
56
00:02:32,310 --> 00:02:35,520
Since the outcome can now only be true or false,
57
00:02:35,520 --> 00:02:37,473
we have a Bernoulli distribution.
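A minimal Python sketch of the captain example, assuming scipy is available; p = 0.7 comes from the seven domestic players out of ten:

from scipy.stats import bernoulli

captain = bernoulli(0.7)
print(captain.pmf(1))    # P(captain is domestic)      = 0.7
print(captain.pmf(0))    # P(captain is international) = 0.3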
58
00:02:39,150 --> 00:02:41,160
Now, if we carry out a similar experiment
59
00:02:41,160 --> 00:02:42,570
several times in a row,
60
00:02:42,570 --> 00:02:45,153
we are dealing with a Binomial distribution.
61
00:02:46,110 --> 00:02:48,000
Just like the Bernoulli distribution,
62
00:02:48,000 --> 00:02:50,610
each iteration has only two possible outcomes,
63
00:02:50,610 --> 00:02:52,443
but we have many iterations.
64
00:02:53,430 --> 00:02:55,500
For example, we could be flipping the coin
65
00:02:55,500 --> 00:02:57,450
we mentioned earlier three times,
66
00:02:57,450 --> 00:02:59,280
and trying to calculate the likelihood
67
00:02:59,280 --> 00:03:00,843
of getting heads twice.
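A minimal Python sketch of the coin example, assuming scipy is available: three fair flips, probability of exactly two heads.

from scipy.stats import binom

print(binom.pmf(2, n=3, p=0.5))    # P(exactly 2 heads in 3 fair flips) = 0.375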
68
00:03:02,760 --> 00:03:06,240
Lastly, we should mention the Poisson distribution.
69
00:03:06,240 --> 00:03:08,010
We use it when we want to test out
70
00:03:08,010 --> 00:03:11,910
how unusual an event frequency is for a given interval.
71
00:03:11,910 --> 00:03:14,610
For example, imagine we know that so far
72
00:03:14,610 --> 00:03:17,670
LeBron James has averaged 35 points per game
73
00:03:17,670 --> 00:03:19,524
during the regular season.
74
00:03:19,524 --> 00:03:21,180
We want to know how likely it is
75
00:03:21,180 --> 00:03:23,760
that he will score 12 points in the first quarter
76
00:03:23,760 --> 00:03:24,783
of his next game.
77
00:03:25,860 --> 00:03:27,510
Since the frequency changes,
78
00:03:27,510 --> 00:03:29,883
so should our expectations for the outcome.
79
00:03:30,750 --> 00:03:32,880
Using the Poisson distribution,
80
00:03:32,880 --> 00:03:35,220
we are able to determine the chance of LeBron
81
00:03:35,220 --> 00:03:38,853
scoring exactly 12 points for the specified time interval.
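A minimal Python sketch of the LeBron example, assuming scipy is available and assuming points accumulate evenly across the four quarters, so the per-game average of 35 rescales to 35/4 per quarter:

from scipy.stats import poisson

rate_per_quarter = 35 / 4                      # 8.75 expected points per quarter
print(poisson.pmf(12, mu=rate_per_quarter))    # P(exactly 12 points), roughly 0.07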
82
00:03:41,730 --> 00:03:42,563
Great.
83
00:03:42,563 --> 00:03:45,690
Now on to the Continuous distributions.
84
00:03:45,690 --> 00:03:46,860
One thing to remember,
85
00:03:46,860 --> 00:03:49,560
is that since we are dealing with continuous outcomes,
86
00:03:49,560 --> 00:03:52,410
the Probability distribution would be a curve
87
00:03:52,410 --> 00:03:55,413
as opposed to unconnected individual bars.
88
00:03:56,490 --> 00:04:00,003
The first one we will talk about is the Normal distribution.
89
00:04:00,840 --> 00:04:02,730
The outcomes of many events in nature
90
00:04:02,730 --> 00:04:06,543
closely resemble this distribution, hence the name Normal.
91
00:04:07,380 --> 00:04:09,540
For instance, according to numerous reports
92
00:04:09,540 --> 00:04:11,220
throughout the last few decades,
93
00:04:11,220 --> 00:04:13,440
the weight of an adult male polar bear
94
00:04:13,440 --> 00:04:16,649
is usually around 500 kilograms.
95
00:04:16,649 --> 00:04:19,470
However, there have been records of individual specimens
96
00:04:19,470 --> 00:04:23,853
weighing anywhere between 350 kilograms and 700 kilograms.
97
00:04:24,840 --> 00:04:29,840
Extreme values like 350 and 700 are called outliers
98
00:04:30,570 --> 00:04:33,963
and do not feature very frequently in Normal distributions.
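A minimal Python sketch of the polar bear example, assuming scipy is available; the 500 kg mean comes from the lecture, while the 50 kg standard deviation is a made-up value used only to show how rarely the outliers appear:

from scipy.stats import norm

weight = norm(loc=500, scale=50)    # hypothetical: mean 500 kg, standard deviation 50 kg
print(weight.cdf(350))              # P(weight below 350 kg), roughly 0.0013
print(weight.sf(700))               # P(weight above 700 kg), roughly 0.00003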
99
00:04:35,130 --> 00:04:37,320
Sometimes we have limited data for events
100
00:04:37,320 --> 00:04:39,333
that resemble a normal distribution.
101
00:04:40,590 --> 00:04:44,673
In those cases, we use the Student's-T distribution.
102
00:04:45,750 --> 00:04:48,030
It serves as a small sample approximation
103
00:04:48,030 --> 00:04:49,563
of a Normal distribution.
104
00:04:50,820 --> 00:04:52,020
Another difference is that
105
00:04:52,020 --> 00:04:54,990
the Student's-T accommodates extreme values
106
00:04:54,990 --> 00:04:56,193
significantly better.
107
00:04:57,420 --> 00:04:59,760
Graphically, that is represented by the curve
108
00:04:59,760 --> 00:05:01,293
having fatter tails.
109
00:05:02,370 --> 00:05:05,220
Overall, this results in a larger number of values
110
00:05:05,220 --> 00:05:07,530
located far away from the mean.
111
00:05:07,530 --> 00:05:09,960
So the curve would probably more closely resemble
112
00:05:09,960 --> 00:05:13,473
a Student's-T distribution than a Normal distribution.
113
00:05:15,030 --> 00:05:17,880
Now, imagine only looking at the recorded weights
114
00:05:17,880 --> 00:05:21,420
of the last 10 sightings across Alaska and Canada.
115
00:05:21,420 --> 00:05:23,820
The lower number of elements would make the occurrence
116
00:05:23,820 --> 00:05:25,200
of any extreme value
117
00:05:25,200 --> 00:05:26,670
represent a much bigger part
118
00:05:26,670 --> 00:05:28,320
of the population than it should.
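A minimal Python sketch of the fatter-tails idea, assuming scipy is available; with 10 sightings we would work with 9 degrees of freedom, and the cutoff of 3 standardized units is arbitrary:

from scipy.stats import t, norm

print(t.sf(3, df=9))    # P(T > 3) with 9 degrees of freedom, roughly 0.0075
print(norm.sf(3))       # P(Z > 3) for the Normal, roughly 0.0013 -- a much thinner tail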
119
00:05:31,980 --> 00:05:33,093
Good job everyone.
120
00:05:34,230 --> 00:05:35,700
Another Continuous distribution
121
00:05:35,700 --> 00:05:37,050
we would like to introduce,
122
00:05:37,050 --> 00:05:39,840
is the Chi-Squared distribution.
123
00:05:39,840 --> 00:05:41,520
It is the first asymmetric
124
00:05:41,520 --> 00:05:44,070
Continuous distribution we are dealing with
125
00:05:44,070 --> 00:05:46,713
as it only consists of non-negative values.
126
00:05:47,700 --> 00:05:50,940
Graphically, that means that the Chi-Squared distribution
127
00:05:50,940 --> 00:05:53,493
always starts from zero on the left.
128
00:05:54,540 --> 00:05:58,110
Depending on the average and maximum values within the set,
129
00:05:58,110 --> 00:06:00,240
the curve of the Chi-Squared graph
130
00:06:00,240 --> 00:06:01,953
is typically skewed to the right.
131
00:06:03,600 --> 00:06:05,640
Unlike the previous two distributions,
132
00:06:05,640 --> 00:06:09,450
the Chi-Squared does not often mirror real-life events.
133
00:06:09,450 --> 00:06:12,780
However, it is often used in hypothesis testing
134
00:06:12,780 --> 00:06:15,270
to help determine goodness of fit.
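A minimal Python sketch of a goodness-of-fit test, assuming scipy is available; the counts are made up (60 rolls of a die we suspect is loaded):

from scipy.stats import chisquare

observed = [5, 8, 9, 8, 10, 20]            # hypothetical counts for faces 1 to 6
statistic, p_value = chisquare(observed)   # expected counts default to 10 per face
print(statistic, p_value)                  # a small p-value suggests the die is not fair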
135
00:06:15,270 --> 00:06:16,860
The next distribution on our list
136
00:06:16,860 --> 00:06:18,753
is the Exponential distribution.
137
00:06:19,650 --> 00:06:22,290
The Exponential distribution is usually present
138
00:06:22,290 --> 00:06:23,460
when we are dealing with events
139
00:06:23,460 --> 00:06:25,323
that are rapidly changing early on.
140
00:06:26,670 --> 00:06:28,350
An easy to understand example
141
00:06:28,350 --> 00:06:31,470
is how online news articles generate hits.
142
00:06:31,470 --> 00:06:34,650
They get most of their clicks when the topic is still fresh.
143
00:06:34,650 --> 00:06:35,970
The more time passes,
144
00:06:35,970 --> 00:06:39,183
the more irrelevant it becomes and interest dies off.
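A minimal Python sketch of the news-article example, assuming scipy is available; the 6-hour mean waiting time until a given click is a made-up parameter:

from scipy.stats import expon

interest = expon(scale=6)    # hypothetical: clicks arrive with a mean waiting time of 6 hours
print(interest.cdf(1))       # share of clicks arriving within the first hour, roughly 0.15
print(interest.sf(24))       # share arriving after a full day, roughly 0.02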
145
00:06:40,440 --> 00:06:42,540
The last Continuous distribution we will mention
146
00:06:42,540 --> 00:06:44,943
is the Logistic distribution.
147
00:06:45,930 --> 00:06:48,600
We often find it useful in forecast analysis
148
00:06:48,600 --> 00:06:50,370
when we try to determine a cutoff point
149
00:06:50,370 --> 00:06:52,440
for a successful outcome.
150
00:06:52,440 --> 00:06:56,640
For instance, take a competitive Esport like Dota 2.
151
00:06:56,640 --> 00:06:58,650
We can use the Logistic distribution
152
00:06:58,650 --> 00:07:00,960
to determine how much of an in-game advantage
153
00:07:00,960 --> 00:07:02,970
at the 10-minute mark is necessary
154
00:07:02,970 --> 00:07:05,523
to confidently predict victory for either team.
155
00:07:06,390 --> 00:07:08,550
Just like with other types of forecasting,
156
00:07:08,550 --> 00:07:11,730
our predictions would never reach true certainty,
157
00:07:11,730 --> 00:07:12,880
but more on that later.
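A minimal Python sketch of the Dota 2 example, assuming scipy is available; the zero location (an even game) and the 2,000-gold scale are made-up values that only illustrate how a lead at the 10-minute mark maps to a win probability:

from scipy.stats import logistic

advantage = logistic(loc=0, scale=2000)    # hypothetical gold-advantage model
print(advantage.cdf(0))                    # an even game gives a 0.5 chance of winning
print(advantage.cdf(5000))                 # a 5,000-gold lead, roughly a 0.92 chance of winning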
158
00:07:15,750 --> 00:07:18,150
Whoa, good job, folks.
159
00:07:18,150 --> 00:07:19,140
In the next video,
160
00:07:19,140 --> 00:07:22,140
we are going to focus on Discrete distributions.
161
00:07:22,140 --> 00:07:25,020
We will introduce formulas for computing expected values
162
00:07:25,020 --> 00:07:26,310
and standard deviations
163
00:07:26,310 --> 00:07:29,790
before looking into each distribution individually.
164
00:07:29,790 --> 00:07:30,843
Thanks for watching.