AWS EC2 status checks are automated health checks that monitor whether your EC2 instances are functioning and reachable. They provide insight into the state of the underlying hardware, network connectivity, and the operating system of the instance. These checks are fundamental to ensuring the high availability and reliability of your workloads on AWS.
Types of EC2 Status Checks
- System Status Checks:
  - Monitor AWS infrastructure hosting your instance.
  - Detect issues like loss of network connectivity, host hardware failures, or software issues on the physical machine hosting the instance.
  - Example: If an EC2 instance’s underlying hardware encounters a failure, the system status check will fail.
- Instance Status Checks:
  - Monitor the software and network configuration of your individual EC2 instance.
  - Detect issues like failed system processes, exhausted memory, or misconfigured networking within the instance itself.
  - Example: If the instance runs out of memory, the instance status check will fail.
Default Configuration of Status Checks
By default, status checks are enabled for every EC2 instance upon launch. These checks are configured and managed by AWS automatically. The results of these checks are visible in the AWS Management Console under the “Status Checks” tab of an EC2 instance, or via the AWS CLI and SDKs.
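As a rough sketch of reading these results programmatically, the function below summarizes one entry of the `InstanceStatuses` array returned by the EC2 `DescribeInstanceStatus` API. The helper name `summarizeStatus` and the sample entry are made up for illustration. Fetching the response (for example with `aws ec2 describe-instance-status` or an AWS SDK) is left out so the snippet runs without credentials.

```javascript
// Summarize one entry from the `InstanceStatuses` array of a
// DescribeInstanceStatus response. `summarizeStatus` is a hypothetical helper.
function summarizeStatus(entry) {
  const system = entry.SystemStatus?.Status ?? 'unknown';
  const instance = entry.InstanceStatus?.Status ?? 'unknown';

  // 'impaired' is the status value EC2 reports for a failed check.
  const failing = [];
  if (system === 'impaired') failing.push('system');
  if (instance === 'impaired') failing.push('instance');

  return {
    instanceId: entry.InstanceId,
    system,
    instance,
    failing,
    healthy: system === 'ok' && instance === 'ok',
  };
}

// Sample entry shaped like the API response (values are invented).
const sampleEntry = {
  InstanceId: 'i-0123456789abcdef0',
  InstanceState: { Name: 'running' },
  SystemStatus: { Status: 'ok' },
  InstanceStatus: { Status: 'impaired' },
};

console.log(summarizeStatus(sampleEntry));
```

Here the `failing` list contains only `instance`, which points to a problem inside the instance rather than on the AWS host.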
Can We Modify Default Configuration?
AWS does not provide options to directly alter the predefined system and instance status checks. However, you can customize the handling of failed checks by configuring CloudWatch Alarms:
- Create CloudWatch Alarms for Status Checks:
  - Navigate to the CloudWatch console.
  - Create an alarm for the metric StatusCheckFailed_Instance or StatusCheckFailed_System.
  - Configure notifications or automated actions (e.g., reboot, stop, terminate, or recover the instance) based on the alarm.
- Automated Recovery Actions: AWS offers a feature called “Recover an Instance” for system status check failures, which automatically recovers an impaired instance by moving it to healthy hardware. The recovered instance keeps its instance ID, private IP addresses, Elastic IP addresses, and instance metadata.
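To make the alarm setup more concrete, here is a minimal sketch that builds a parameter object in the shape of the CloudWatch `PutMetricAlarm` API. The function name `buildStatusCheckAlarm`, the alarm naming scheme, and the period and evaluation settings are assumptions chosen for illustration. The action ARNs assume the standard `aws` partition.

```javascript
// Build a parameter object in the shape of CloudWatch's PutMetricAlarm API.
// `buildStatusCheckAlarm` and the alarm naming scheme are hypothetical.
function buildStatusCheckAlarm({ instanceId, region, check, snsTopicArn }) {
  // Recover applies to system check failures; reboot suits instance check failures.
  const byCheck = {
    system: { metric: 'StatusCheckFailed_System', action: 'recover' },
    instance: { metric: 'StatusCheckFailed_Instance', action: 'reboot' },
  };
  const choice = byCheck[check];
  if (!choice) throw new Error(`unknown check type: ${check}`);

  const actions = [`arn:aws:automate:${region}:ec2:${choice.action}`];
  if (snsTopicArn) actions.push(snsTopicArn); // optional notification target

  return {
    AlarmName: `${instanceId}-${choice.metric}`,
    Namespace: 'AWS/EC2',
    MetricName: choice.metric,
    Dimensions: [{ Name: 'InstanceId', Value: instanceId }],
    Statistic: 'Maximum',
    Period: 60,           // seconds per evaluation period (assumed)
    EvaluationPeriods: 2, // consecutive failing periods before alarming (assumed)
    Threshold: 1,         // the metric is 0 when the check passes, 1 when it fails
    ComparisonOperator: 'GreaterThanOrEqualToThreshold',
    AlarmActions: actions,
  };
}

console.log(buildStatusCheckAlarm({
  instanceId: 'i-0123456789abcdef0',
  region: 'us-east-1',
  check: 'system',
}));
```

An object like this could then be handed to the SDK's alarm-creation call or translated into `aws cloudwatch put-metric-alarm` flags. Keeping the builder separate from the API call makes the choice of metric and action easy to review.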
Defining Custom Health Checks
While AWS EC2 status checks focus on the infrastructure and OS-level health, you might need additional monitoring tailored to your application or workload. This is where custom health checks come in.
Here’s how to implement custom checks:
- Use CloudWatch Agent:
  - Install and configure the CloudWatch Agent on your EC2 instance.
  - Collect custom metrics like disk usage, application logs, or database queries.
  Install the agent and open its configuration file:
    sudo yum install amazon-cloudwatch-agent
    sudo vi /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
  Example configuration snippet:
    {
      "metrics": {
        "append_dimensions": {
          "InstanceId": "${aws:InstanceId}"
        },
        "metrics_collected": {
          "disk": {
            "measurement": ["used_percent"],
            "resources": ["/"]
          },
          "mem": {
            "measurement": ["used_percent"]
          }
        }
      }
    }
  Start the CloudWatch agent:
    sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a start
- Use Third-Party Monitoring Tools: Integrate tools like Datadog, New Relic, or Prometheus to set up advanced custom health checks for your application and workloads.
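- Custom Health Endpoint: For applications, create a health-check endpoint (e.g., /health) that returns the status of key services. Combine this with Elastic Load Balancing (ELB) or Route 53 health checks to route traffic based on application health. Example Node.js health-check endpoint (assuming the Express framework):
    const express = require('express'); // assumes Express is installed
    const app = express();
    // Report basic liveness and process uptime in seconds
    app.get('/health', (req, res) => {
      const health = { status: "UP", uptime: process.uptime() };
      res.json(health);
    });
    app.listen(3000); // port is an assumption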
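A health endpoint is more useful when it reflects the state of the services the application depends on. The sketch below is one possible way to aggregate several checks into a single response. `buildHealthReport` and the check names `database` and `cache` are hypothetical, and the stand-in checks would need to be replaced with real probes.

```javascript
// Run named async checks and turn the results into a health payload.
// `buildHealthReport` and the check names are hypothetical.
async function buildHealthReport(checks) {
  const results = {};
  let allUp = true;
  for (const [name, check] of Object.entries(checks)) {
    try {
      await check();          // a check passes if it resolves
      results[name] = 'UP';
    } catch (err) {
      results[name] = 'DOWN'; // a check fails if it throws or rejects
      allUp = false;
    }
  }
  return {
    // 503 signals failure to health checkers that expect a 200 response.
    httpStatus: allUp ? 200 : 503,
    body: { status: allUp ? 'UP' : 'DOWN', uptime: process.uptime(), checks: results },
  };
}

// Example with stand-in checks; replace them with real probes.
buildHealthReport({
  database: async () => {},
  cache: async () => { throw new Error('connection refused'); },
}).then((report) => console.log(report.httpStatus, report.body.checks));
```

Inside a route handler, the result could be sent with something like `res.status(report.httpStatus).json(report.body)`, so that a load balancer sees a failing status code when any dependency is down.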
Example: Status Check Handling
Scenario: Automate a reboot when an instance status check fails. (A reboot keeps the instance on the same host, so for system status check failures use the recover action instead.)
1. Set up an IAM role whose policy allows rebooting instances.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["ec2:RebootInstances"],
          "Resource": "*"
        }
      ]
    }
2. Create a CloudWatch alarm:
  - Metric: StatusCheckFailed_Instance
  - Action: Reboot the instance.
3. Test:
  - Manually introduce an issue (e.g., network misconfiguration) on a non-production instance.
  - Verify the alarm triggers and the instance reboots.
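When verifying that the alarm triggers, it helps to have a mental model of how consecutive failing datapoints turn into an alarm state. The snippet below is a deliberately simplified model, not CloudWatch's actual algorithm: it ignores missing-data handling and "M out of N" settings. The function name `evaluateAlarm` and the settings are illustrative assumptions.

```javascript
// Simplified model of alarm evaluation: ALARM when the most recent
// `evaluationPeriods` datapoints all meet or exceed the threshold.
// Real CloudWatch alarms also handle missing data and "M out of N" settings.
function evaluateAlarm(datapoints, { evaluationPeriods, threshold }) {
  if (datapoints.length < evaluationPeriods) return 'INSUFFICIENT_DATA';
  const recent = datapoints.slice(-evaluationPeriods);
  return recent.every((value) => value >= threshold) ? 'ALARM' : 'OK';
}

// One value per period of a status check metric: 0 = passed, 1 = failed.
const settings = { evaluationPeriods: 2, threshold: 1 };
console.log(evaluateAlarm([0, 0, 1], settings)); // OK: only one failing period so far
console.log(evaluateAlarm([0, 1, 1], settings)); // ALARM: two consecutive failing periods
```

With two evaluation periods, a single failed check does not fire the action. This is why the alarm may take a couple of minutes to trigger after the issue is introduced.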
Interview Questions and Answers
- What are the types of AWS EC2 status checks?
  - System Status Checks: Monitor AWS’s infrastructure hosting your instance.
  - Instance Status Checks: Monitor the instance’s OS, network, and processes.
- Can you change the configuration of system and instance status checks?
  - No, but you can create CloudWatch Alarms and define automated recovery actions.
- How do you implement custom health checks on EC2 instances?
  - Install the CloudWatch Agent to collect custom metrics.
  - Use third-party monitoring tools.
  - Implement application-specific health-check endpoints.
- How does AWS handle system status check failures?
  - AWS provides an automated recovery feature to migrate the instance to healthy hardware.
- What actions can you take when a status check fails?
  - Reboot or recover the instance.
  - Notify administrators via SNS.
  - Stop/start or replace the instance.