1
00:00:01,740 --> 00:00:03,180
Narrator: Before crunching any numbers
2
00:00:03,180 --> 00:00:05,340
and making decisions, we should introduce
3
00:00:05,340 --> 00:00:06,682
some key definitions.
4
00:00:06,682 --> 00:00:10,350
The first step of every statistical analysis you perform
5
00:00:10,350 --> 00:00:12,630
is to determine whether the data you are dealing with
6
00:00:12,630 --> 00:00:14,823
is a population or a sample.
7
00:00:15,660 --> 00:00:18,300
A population is the collection of all items of interest
8
00:00:18,300 --> 00:00:21,903
to our study, and its size is usually denoted with an uppercase N.
9
00:00:22,800 --> 00:00:25,320
The numbers we've obtained when using a population
10
00:00:25,320 --> 00:00:26,883
are called parameters.
11
00:00:27,930 --> 00:00:30,120
A sample is a subset of the population
12
00:00:30,120 --> 00:00:32,460
and its size is denoted with a lowercase n.
13
00:00:32,460 --> 00:00:34,170
And the numbers we've obtained when working
14
00:00:34,170 --> 00:00:36,900
with a sample are called statistics.
15
00:00:36,900 --> 00:00:38,820
Now you know why the field we are studying
16
00:00:38,820 --> 00:00:40,113
is called statistics.
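To make the distinction between a parameter and a statistic concrete, here is a minimal sketch. The ages are made-up numbers generated for illustration, and the variable names are assumptions, not anything from the lecture. The mean computed from all N items is a parameter; the mean computed from the n sampled items is a statistic.

```python
import random
import statistics

# Hypothetical population: made-up ages for N = 1000 people (illustrative only).
rng = random.Random(0)
population = [rng.randint(18, 30) for _ in range(1000)]
N = len(population)

# Parameter: a number computed from every item in the population.
population_mean = statistics.mean(population)

# Sample: a subset of n = 50 items, drawn by chance without replacement.
sample = rng.sample(population, 50)
n = len(sample)

# Statistic: a number computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"N = {N}, parameter (population mean) = {population_mean:.2f}")
print(f"n = {n}, statistic (sample mean) = {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why a statistic is treated as an estimate of the parameter.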
17
00:00:41,250 --> 00:00:44,040
Let's say, we wanna perform a survey of the job prospects
18
00:00:44,040 --> 00:00:47,190
of the students studying at New York University.
19
00:00:47,190 --> 00:00:49,080
What is the population?
20
00:00:49,080 --> 00:00:51,420
You can simply walk into New York University
21
00:00:51,420 --> 00:00:53,610
and find every student, right?
22
00:00:53,610 --> 00:00:56,160
Well, surely that would not be the population
23
00:00:56,160 --> 00:00:57,810
of NYU students.
24
00:00:57,810 --> 00:01:00,450
The population of interest includes not only the students
25
00:01:00,450 --> 00:01:03,960
on campus, but also the ones at home, on exchange,
26
00:01:03,960 --> 00:01:07,740
abroad, distance education students, part-time students,
27
00:01:07,740 --> 00:01:11,250
even the ones who enrolled but are still in high school.
28
00:01:11,250 --> 00:01:13,923
Though exhaustive, even this list misses someone.
29
00:01:14,760 --> 00:01:15,900
Point taken.
30
00:01:15,900 --> 00:01:17,790
Populations are hard to define
31
00:01:17,790 --> 00:01:19,653
and hard to observe in real life.
32
00:01:21,240 --> 00:01:24,180
A sample, however, is much easier to gather.
33
00:01:24,180 --> 00:01:27,120
It is less time consuming and less costly.
34
00:01:27,120 --> 00:01:29,490
Time and resources are the main reasons
35
00:01:29,490 --> 00:01:32,310
we prefer drawing samples compared to analyzing
36
00:01:32,310 --> 00:01:33,903
an entire population.
37
00:01:34,740 --> 00:01:37,590
So, let's draw a sample then.
38
00:01:37,590 --> 00:01:41,850
As we first wanted to do, we can just go to the NYU campus.
39
00:01:41,850 --> 00:01:44,010
Next, let's enter the canteen
40
00:01:44,010 --> 00:01:46,380
because we know it will be full of people.
41
00:01:46,380 --> 00:01:48,870
We can then interview 50 of them.
42
00:01:48,870 --> 00:01:50,070
Cool!
43
00:01:50,070 --> 00:01:54,750
This is a sample drawn from the population of NYU students.
44
00:01:54,750 --> 00:01:55,623
Good job!
45
00:01:56,640 --> 00:01:59,490
Populations are hard to observe and contact.
46
00:01:59,490 --> 00:02:01,740
That's why statistical tests are designed to work
47
00:02:01,740 --> 00:02:03,270
with incomplete data.
48
00:02:03,270 --> 00:02:05,730
You will almost always be working with sample data
49
00:02:05,730 --> 00:02:08,630
and making data-driven decisions and inferences based on it.
50
00:02:09,570 --> 00:02:10,590
All right.
51
00:02:10,590 --> 00:02:13,800
Since statistical tests are usually based on sample data,
52
00:02:13,800 --> 00:02:17,077
samples are key to accurate statistical insights.
53
00:02:17,077 --> 00:02:19,800
They have two defining characteristics,
54
00:02:19,800 --> 00:02:22,320
randomness and representativeness.
55
00:02:22,320 --> 00:02:25,380
A sample must be both random and representative
56
00:02:25,380 --> 00:02:27,093
for an insight to be precise.
57
00:02:28,140 --> 00:02:30,390
A random sample is collected when each member
58
00:02:30,390 --> 00:02:32,580
of the sample is chosen from the population
59
00:02:32,580 --> 00:02:34,083
strictly by chance.
60
00:02:35,370 --> 00:02:38,160
A representative sample is a subset of the population
61
00:02:38,160 --> 00:02:40,020
that accurately reflects the members
62
00:02:40,020 --> 00:02:41,523
of the entire population.
63
00:02:42,540 --> 00:02:45,060
Let's go back to the sample we just discussed.
64
00:02:45,060 --> 00:02:48,120
The 50 students from the NYU canteen.
65
00:02:48,120 --> 00:02:49,728
We walked into the university canteen
66
00:02:49,728 --> 00:02:53,070
and violated both conditions.
67
00:02:53,070 --> 00:02:55,500
People were not chosen by chance.
68
00:02:55,500 --> 00:02:59,070
They were a group of NYU students who were there for lunch.
69
00:02:59,070 --> 00:03:01,950
Most members did not even get the chance to be chosen
70
00:03:01,950 --> 00:03:04,110
as they were not in the canteen.
71
00:03:04,110 --> 00:03:08,001
Thus, we conclude the sample was not random
72
00:03:08,001 --> 00:03:11,010
but was it representative?
73
00:03:11,010 --> 00:03:13,179
Well, it represented a group of people
74
00:03:13,179 --> 00:03:16,470
but definitely not all students in the university.
75
00:03:16,470 --> 00:03:19,620
To be exact, it represented the people who have lunch
76
00:03:19,620 --> 00:03:22,020
at the university canteen.
77
00:03:22,020 --> 00:03:24,330
Had our survey been about job prospects
78
00:03:24,330 --> 00:03:27,480
of NYU students who eat in the university canteen,
79
00:03:27,480 --> 00:03:28,743
we would've done well.
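The canteen problem can be sketched in a short simulation. Everything here is hypothetical: the study modes, their proportions, and the simplifying assumption that only on-campus students can be found in the canteen are invented for illustration. The sketch compares the make-up of a canteen sample with that of a sample drawn by chance from the whole population.

```python
import random

rng = random.Random(1)

# Hypothetical population of 10,000 students tagged by mode of study.
# The modes and their proportions are invented for illustration.
modes = ["on_campus", "distance", "exchange", "part_time"]
weights = [0.6, 0.2, 0.1, 0.1]
population = rng.choices(modes, weights=weights, k=10_000)

def share(group, mode):
    """Fraction of the group that has the given mode of study."""
    return sum(1 for m in group if m == mode) / len(group)

# Convenience sample: assume only on-campus students can be in the canteen.
canteen_pool = [m for m in population if m == "on_campus"]
canteen_sample = rng.sample(canteen_pool, 50)

# Random sample: every student has the same chance of being chosen.
random_sample = rng.sample(population, 50)

for mode in modes:
    print(
        f"{mode:10s} population={share(population, mode):.2f} "
        f"canteen={share(canteen_sample, mode):.2f} "
        f"random={share(random_sample, mode):.2f}"
    )
```

Under these assumptions the canteen sample contains no distance, exchange, or part-time students at all, so it cannot reflect the whole population, while the random sample gives every group a chance to appear.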
80
00:03:30,240 --> 00:03:31,170
Okay.
81
00:03:31,170 --> 00:03:33,270
You must be wondering how to draw a sample
82
00:03:33,270 --> 00:03:36,210
that is both random and representative.
83
00:03:36,210 --> 00:03:38,700
Well, the safest way would be to get access
84
00:03:38,700 --> 00:03:40,170
to the student database
85
00:03:40,170 --> 00:03:43,080
and contact individuals in a random manner.
86
00:03:43,080 --> 00:03:46,020
However, such surveys are almost impossible to conduct
87
00:03:46,020 --> 00:03:47,970
without assistance from the university.
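If access to the student database were available, contacting individuals in a random manner could look roughly like the sketch below. The ID format, the database size, and the helper name `draw_simple_random_sample` are all hypothetical; the only technique shown is a simple random sample without replacement, using Python's standard `random` module.

```python
import random

def draw_simple_random_sample(student_ids, n, seed=None):
    """Return n distinct IDs, each chosen strictly by chance.

    Hypothetical helper for illustration; `seed` only makes the draw repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(student_ids, n)

# Hypothetical database: 20,000 made-up student IDs.
student_ids = [f"S{i:05d}" for i in range(20_000)]

# Draw 50 students to contact, the same sample size as in the canteen example.
contacts = draw_simple_random_sample(student_ids, 50, seed=42)
print(len(contacts), "students selected, for example:", contacts[:5])
```

Because every ID in the list has the same chance of being picked, students at home, abroad, or studying part-time are as likely to be selected as those on campus.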
88
00:03:49,020 --> 00:03:50,040
All right.
89
00:03:50,040 --> 00:03:52,770
Throughout the course, we will explore both sample
90
00:03:52,770 --> 00:03:54,990
and population statistics.
91
00:03:54,990 --> 00:03:58,140
After completing this course, samples and populations
92
00:03:58,140 --> 00:04:00,570
will be a piece of cake for you.
93
00:04:00,570 --> 00:04:01,570
Thanks for watching.