1
00:00:00,400 --> 00:00:01,770
Let's have a play with status checks.
2
00:00:01,770 --> 00:00:03,920
So if we go into my first instance
3
00:00:03,920 --> 00:00:05,560
and look at the status check tab.
4
00:00:05,560 --> 00:00:08,103
As you can see, there are two status checks
5
00:00:08,103 --> 00:00:11,350
that are being run, so there's a system status check
6
00:00:11,350 --> 00:00:14,470
and an instance status check, okay?
7
00:00:14,470 --> 00:00:16,670
And in case you are not happy with these checks
8
00:00:16,670 --> 00:00:17,830
and you believe that there's an error,
9
00:00:17,830 --> 00:00:19,760
you can click on report instance status
10
00:00:19,760 --> 00:00:22,090
to help AWS detect issues.
11
00:00:22,090 --> 00:00:24,720
So what we can do, though, because our instance is running,
12
00:00:24,720 --> 00:00:27,200
we can still go ahead and create our CloudWatch alarm
13
00:00:27,200 --> 00:00:30,480
that will have a reboot or recover action on our instance.
14
00:00:30,480 --> 00:00:31,870
So I'll go to Actions
15
00:00:31,870 --> 00:00:35,300
and then I will click on create status check alarm.
16
00:00:35,300 --> 00:00:37,120
So we create a new alarm
17
00:00:37,120 --> 00:00:39,790
and then we have an alarm notification
18
00:00:39,790 --> 00:00:43,250
so we can send it to the default CloudWatch alarms topic
19
00:00:43,250 --> 00:00:44,950
if you want to have a notification,
20
00:00:44,950 --> 00:00:47,300
but you could disable it as well if you wanted to.
21
00:00:47,300 --> 00:00:48,340
And in the alarm action,
22
00:00:48,340 --> 00:00:52,030
so what do you want to do when the alarm is triggered?
23
00:00:52,030 --> 00:00:56,000
And we have two options that could be very helpful for us.
24
00:00:56,000 --> 00:00:58,380
So recover or reboot.
25
00:00:58,380 --> 00:01:01,000
So recover is going to be very helpful
26
00:01:01,000 --> 00:01:04,120
when we want to recover from a physical host issue from AWS.
27
00:01:04,120 --> 00:01:06,400
A reboot is for when it's a software issue.
28
00:01:06,400 --> 00:01:08,280
So I will choose recover
29
00:01:08,280 --> 00:01:11,310
and then we need to look at the alarm thresholds.
30
00:01:11,310 --> 00:01:15,040
So we want to have the status check failed,
31
00:01:15,040 --> 00:01:16,860
for example, either.
32
00:01:16,860 --> 00:01:19,540
So it can be either of the two, or instance, or system,
33
00:01:19,540 --> 00:01:21,570
so based on what you want,
34
00:01:21,570 --> 00:01:24,670
and then for one consecutive period of five minutes.
35
00:01:24,670 --> 00:01:26,280
Here's the alarm name
36
00:01:26,280 --> 00:01:27,470
and here's some sample metric data.
37
00:01:27,470 --> 00:01:28,480
So as we can see,
38
00:01:28,480 --> 00:01:31,860
we are at zero because the alarm hasn't been triggered
39
00:01:31,860 --> 00:01:34,270
but if there was an issue with the status check,
40
00:01:34,270 --> 00:01:35,550
then this would go to one.
41
00:01:35,550 --> 00:01:37,630
And so for one consecutive period of five minutes,
42
00:01:37,630 --> 00:01:39,150
then this would trigger the alarm.
43
00:01:39,150 --> 00:01:41,380
So I will click on create
44
00:01:43,690 --> 00:01:46,930
and something is wrong because the recover action can only be used
45
00:01:46,930 --> 00:01:49,290
with the status check failed (system) metric, of course.
46
00:01:49,290 --> 00:01:53,820
So let's select status check failed (system), here we go.
47
00:01:53,820 --> 00:01:57,620
And now we have the recover action, click on create,
48
00:01:57,620 --> 00:01:59,950
and now this CloudWatch alarm has been created.
49
00:01:59,950 --> 00:02:03,740
So what I'm going to do is, again,
50
00:02:03,740 --> 00:02:05,970
go into my CloudWatch alarm.
51
00:02:05,970 --> 00:02:10,780
So let's click on CloudWatch, here, alarms.
52
00:02:10,780 --> 00:02:12,550
And yes, I'm gonna go directly
53
00:02:12,550 --> 00:02:14,200
into CloudWatch alarms from here.
54
00:02:16,220 --> 00:02:18,160
And we can see that we have one alarm right here
55
00:02:18,160 --> 00:02:19,510
which has insufficient data,
56
00:02:19,510 --> 00:02:22,020
so very soon it's going to be okay.
57
00:02:22,020 --> 00:02:24,223
So let me wait until it is okay and green.
58
00:02:25,340 --> 00:02:28,317
My alarm is now in the okay state and so I can click on it,
59
00:02:28,317 --> 00:02:32,560
and as we can see, the actual instance metric value is zero,
60
00:02:32,560 --> 00:02:34,860
but it needs to cross a threshold of 0.99.
61
00:02:34,860 --> 00:02:38,038
So I want to go into the alarm state.
62
00:02:38,038 --> 00:02:38,871
So what we can do, though,
63
00:02:38,871 --> 00:02:42,390
is that we can simulate a failure of this alarm
64
00:02:42,390 --> 00:02:44,520
to go into the alarm state and see what happens.
65
00:02:44,520 --> 00:02:46,290
So if I scroll down, as we can see,
66
00:02:46,290 --> 00:02:51,250
we have the history of the actions of the alarm.
67
00:02:51,250 --> 00:02:52,600
So as you can see, we created it
68
00:02:52,600 --> 00:02:55,760
and then it went from insufficient data to okay.
69
00:02:55,760 --> 00:02:58,910
So let's issue an API call to make this alarm
70
00:02:58,910 --> 00:03:00,870
go into the alarm state.
71
00:03:00,870 --> 00:03:02,240
So I'm going to click on CloudShell
72
00:03:02,240 --> 00:03:04,980
to open a CLI directly from within the Cloud
73
00:03:04,980 --> 00:03:06,500
that's going to be properly configured
74
00:03:06,500 --> 00:03:08,050
and that will save me some time,
75
00:03:08,050 --> 00:03:10,680
but you can use the CLI in your own terminal
76
00:03:10,680 --> 00:03:13,730
if you have configured it in the past, okay?
77
00:03:13,730 --> 00:03:17,570
So what I'm going to do is look up the CloudWatch alarm
78
00:03:20,350 --> 00:03:22,833
set-alarm-state command.
79
00:03:27,700 --> 00:03:30,270
And I'm going to look at the CLI version 2 reference, here we go,
80
00:03:30,270 --> 00:03:31,960
so this is how you run the command.
81
00:03:31,960 --> 00:03:35,240
So we need to give the alarm name and the state value
82
00:03:35,240 --> 00:03:39,220
and the state reason, and the state value is alarm.
83
00:03:39,220 --> 00:03:40,653
So let's go into CloudShell.
84
00:03:41,550 --> 00:03:44,510
So first, let's get the alarm name.
85
00:03:44,510 --> 00:03:47,573
So the alarm name is right here, so I'm going to copy this.
86
00:03:49,270 --> 00:03:54,270
So I will type aws cloudwatch set-alarm-state
87
00:03:54,550 --> 00:03:58,170
and then the alarm name is the one I just copied right here,
88
00:03:58,170 --> 00:04:03,170
and the alarm state is going to be alarm,
89
00:04:03,720 --> 00:04:07,520
and so sorry, state value is alarm and state reason,
90
00:04:07,520 --> 00:04:09,800
so let's just change this.
91
00:04:09,800 --> 00:04:14,800
So --state-value is ALARM and --state-reason is,
92
00:04:17,800 --> 00:04:22,800
and I will just say, testing recovering action.
93
00:04:24,580 --> 00:04:25,413
Press enter
94
00:04:27,340 --> 00:04:30,090
and this is going to set my alarm into the alarm state.
95
00:04:30,090 --> 00:04:31,738
So let's refresh this.
96
00:04:31,738 --> 00:04:34,920
And as you can see now, the alarm is in alarm.
97
00:04:34,920 --> 00:04:37,593
And so if we scroll down and look at the actions,
98
00:04:38,810 --> 00:04:41,890
there's going to be a notification
99
00:04:41,890 --> 00:04:44,050
and the alarm went into the alarm state.
100
00:04:44,050 --> 00:04:46,060
And so, if we look at the history right here,
101
00:04:46,060 --> 00:04:48,150
so here's what I want to show you, sorry.
102
00:04:48,150 --> 00:04:50,820
So the alarm went from OK to in alarm
103
00:04:50,820 --> 00:04:52,800
and then two actions happened.
104
00:04:52,800 --> 00:04:56,710
There was an SNS message sent to an SNS topic right here
105
00:04:56,710 --> 00:04:58,830
and also, the second action that was executed
106
00:04:58,830 --> 00:05:02,310
was the EC2 recover action, which executed successfully.
107
00:05:02,310 --> 00:05:07,310
So my EC2 instance right here has been recovered,
108
00:05:07,760 --> 00:05:08,720
thanks to this action.
109
00:05:08,720 --> 00:05:10,670
It's not something where we can really see
110
00:05:10,670 --> 00:05:12,450
how it was recovered, okay?
111
00:05:12,450 --> 00:05:14,420
But as we can see,
112
00:05:14,420 --> 00:05:16,940
we have the alarm status 1/1 in alarm,
113
00:05:16,940 --> 00:05:18,820
and then the EC2 instance is being recovered,
114
00:05:18,820 --> 00:05:21,550
so it'll take a bit of time to be recovered entirely,
115
00:05:21,550 --> 00:05:23,080
but at least it shows you
116
00:05:23,080 --> 00:05:25,170
that when this alarm was triggered,
117
00:05:25,170 --> 00:05:27,440
the recovery action was launched, okay?
118
00:05:27,440 --> 00:05:29,040
So that's it for this lecture.
119
00:05:29,040 --> 00:05:33,170
You could also create another alarm as an exercise,
120
00:05:33,170 --> 00:05:34,850
this one on the instance status check,
121
00:05:34,850 --> 00:05:37,750
and then reboot the EC2 instance as an action,
122
00:05:37,750 --> 00:05:40,227
and you can try it out and also set the alarm state,
123
00:05:40,227 --> 00:05:41,710
but for now, we're good to go.
124
00:05:41,710 --> 00:05:45,750
What I'm going to do is just delete this alarm right here.
125
00:05:45,750 --> 00:05:48,530
I can close CloudShell, I don't need it anymore for now
126
00:05:48,530 --> 00:05:49,940
and then I'm good to go.
127
00:05:49,940 --> 00:05:51,560
So that's it for this lecture, I hope you liked it,
128
00:05:51,560 --> 00:05:53,510
and I will see you in the next lecture.
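
As a recap of the CloudShell step in this lecture, here is a minimal sketch in Python that builds the set-alarm-state command rather than running it. The three flags (--alarm-name, --state-value, --state-reason) are the ones named in the lecture. The helper name build_set_alarm_state_command and the alarm name my-status-check-alarm are hypothetical placeholders; in the lecture the real alarm name is copied from the CloudWatch console.

```python
import shlex

# State values accepted by `aws cloudwatch set-alarm-state`.
ALLOWED_STATES = {"OK", "ALARM", "INSUFFICIENT_DATA"}


def build_set_alarm_state_command(alarm_name,
                                  state_value="ALARM",
                                  state_reason="testing recovering action"):
    """Build the AWS CLI argument list that forces an alarm into a state.

    Hypothetical helper for illustration only: it assembles the command
    and does not call AWS.
    """
    if state_value not in ALLOWED_STATES:
        raise ValueError(f"unsupported state value: {state_value!r}")
    return [
        "aws", "cloudwatch", "set-alarm-state",
        "--alarm-name", alarm_name,
        "--state-value", state_value,
        "--state-reason", state_reason,
    ]


if __name__ == "__main__":
    # "my-status-check-alarm" is a placeholder, not the lecture's alarm name.
    command = build_set_alarm_state_command("my-status-check-alarm")
    # Print a shell-quoted command line that could be pasted into CloudShell.
    print(shlex.join(command))
```

The printed line can be pasted into CloudShell or a configured local terminal. Keep in mind that a state set this way is temporary: at its next evaluation the alarm goes back to the state given by the actual metric, which is why the alarm in the lecture does not stay in alarm forever.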