AWS EC2 status checks are automated health checks that monitor whether your EC2 instances are functioning and reachable. They report on the state of the underlying hardware, network connectivity, and the instance’s operating system, which makes them fundamental to keeping workloads on AWS available and reliable.

Types of EC2 Status Checks

  1. System Status Checks:
    • Monitor AWS infrastructure hosting your instance.
    • Detect issues like loss of network connectivity, host hardware failures, or software issues on the physical machine hosting the instance.
    • Example: If an EC2 instance’s underlying hardware encounters a failure, the system status check will fail.
  2. Instance Status Checks:
    • Monitor the software and network configuration of your individual EC2 instance.
    • Detect issues like failed system processes, exhausted memory, or misconfigured networking within the instance itself.
    • Example: If the instance runs out of memory, the instance status check will fail.

Default Configuration of Status Checks

By default, status checks are enabled for every EC2 instance upon launch. These checks are configured and managed by AWS automatically. The results of these checks are visible in the AWS Management Console under the “Status Checks” tab of an EC2 instance, or via the AWS CLI and SDKs.
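
As a rough illustration of reading these results programmatically, the sketch below interprets one entry from the InstanceStatuses array that `aws ec2 describe-instance-status` (or the SDK's DescribeInstanceStatus call) returns. The helper name `summarizeStatus` and the sample data are hypothetical; only the field names and the "ok" / "impaired" status values come from the API.

```javascript
// Sketch: decide which status check failed for one InstanceStatuses entry.
// `summarizeStatus` is a hypothetical helper, not part of any AWS SDK.
function summarizeStatus(entry) {
  const system = entry.SystemStatus.Status;     // e.g. "ok", "impaired", "initializing"
  const instance = entry.InstanceStatus.Status; // same set of values

  const failed = [];
  if (system === 'impaired') failed.push('system');
  if (instance === 'impaired') failed.push('instance');

  return {
    instanceId: entry.InstanceId,
    healthy: system === 'ok' && instance === 'ok',
    failed,
  };
}

// Hand-written sample in the shape of the API response (not real output).
const sample = {
  InstanceId: 'i-0123456789abcdef0',
  SystemStatus: { Status: 'ok' },
  InstanceStatus: { Status: 'impaired' },
};

console.log(summarizeStatus(sample));
```

A failed system check points at the AWS host, while a failed instance check points at something you need to fix inside the instance, so keeping the two apart in code helps pick the right response.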

Can We Modify Default Configuration?

AWS does not provide options to directly alter the predefined system and instance status checks. However, you can customize the handling of failed checks by configuring CloudWatch Alarms:

  1. Create CloudWatch Alarms for Status Checks:
    • Navigate to the CloudWatch console.
    • Create an alarm on the StatusCheckFailed_Instance or StatusCheckFailed_System metric.
    • Configure notifications or automated actions (e.g., reboot, stop, terminate, or recover the instance) based on the alarm.
  2. Automated Recovery Actions: For system status check failures, AWS offers a “Recover an Instance” alarm action, which moves the impaired instance to new hardware while keeping its instance ID, private IP addresses, Elastic IP addresses, and instance metadata.
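
To make the alarm settings concrete, here is a minimal sketch that builds parameters in the shape of CloudWatch's PutMetricAlarm request. The helper name `buildRecoveryAlarm`, the alarm naming scheme, and the period and evaluation values are assumptions chosen for illustration; the namespace, metric name, dimension, and recover action ARN format are the ones AWS documents.

```javascript
// Hypothetical helper: parameters for an alarm that recovers an instance
// when its system status check fails.
function buildRecoveryAlarm(instanceId, region) {
  return {
    AlarmName: `recover-${instanceId}`,        // naming scheme is an assumption
    Namespace: 'AWS/EC2',
    MetricName: 'StatusCheckFailed_System',    // reports 0 (passed) or 1 (failed)
    Dimensions: [{ Name: 'InstanceId', Value: instanceId }],
    Statistic: 'Maximum',
    Period: 60,                                // seconds; assumed value
    EvaluationPeriods: 2,                      // assumed: two failing periods in a row
    Threshold: 1,
    ComparisonOperator: 'GreaterThanOrEqualToThreshold',
    AlarmActions: [`arn:aws:automate:${region}:ec2:recover`],
  };
}

console.log(buildRecoveryAlarm('i-0123456789abcdef0', 'us-east-1'));
```

With the AWS SDK for JavaScript v3 installed, an object like this could be passed to `PutMetricAlarmCommand` from `@aws-sdk/client-cloudwatch`. Swapping `recover` for `reboot` in the ARN, together with the StatusCheckFailed_Instance metric, would give the instance-level variant.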

Defining Custom Health Checks

While AWS EC2 status checks focus on the infrastructure and OS-level health, you might need additional monitoring tailored to your application or workload. This is where custom health checks come in.

Here’s how to implement custom checks:

  1. Use CloudWatch Agent:
    • Install and configure the CloudWatch Agent on your EC2 instance.
    • Collect metrics that the built-in status checks do not cover, such as disk and memory usage, and ship application logs to CloudWatch Logs.

sudo yum install amazon-cloudwatch-agent
sudo vi /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json


Example configuration snippet:

{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"]
      },
      "mem": {
        "measurement": ["used_percent"]
      }
    }
  }
}

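Load the configuration and start the CloudWatch agent:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json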
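
If several instances need the same configuration with different mount points, the file can be generated rather than edited by hand. The following is a minimal sketch under that assumption; `buildAgentConfig` and the `/data` mount point are hypothetical, and the structure mirrors the snippet above.

```javascript
// Hypothetical helper: builds the agent's "metrics" section for a list of
// mount points, using the same keys as the example configuration.
function buildAgentConfig(mountPoints) {
  return {
    metrics: {
      // Single quotes keep ${aws:InstanceId} literal; the agent resolves it.
      append_dimensions: { InstanceId: '${aws:InstanceId}' },
      metrics_collected: {
        disk: { measurement: ['used_percent'], resources: mountPoints },
        mem: { measurement: ['used_percent'] },
      },
    },
  };
}

// '/data' is an assumed extra volume, used here only as an example.
const config = buildAgentConfig(['/', '/data']);
console.log(JSON.stringify(config, null, 2));
```

The printed JSON can be written to the agent's configuration path before loading it.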

  2. Use Third-Party Monitoring Tools: Integrate tools like Datadog, New Relic, or Prometheus to set up advanced custom health checks for your application and workloads.
  3. Custom Health Endpoint: For applications, create a health-check endpoint (e.g., /health) that returns the status of key services. Combine this with Elastic Load Balancing (ELB) or Route 53 health checks to route traffic based on application health.

Example Node.js health-check endpoint (assumes the Express package is installed; port 3000 is an arbitrary choice):

// Minimal Express app exposing a /health endpoint.
const express = require('express');
const app = express();

app.get('/health', (req, res) => {
  const health = { status: "UP", uptime: process.uptime() };
  res.json(health);
});

app.listen(3000);
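
The endpoint above always reports UP, so a load balancer would keep sending traffic even when a dependency is down. A possible refinement, sketched here with Node's built-in http module instead of Express so it runs without extra packages, is to evaluate a set of checks and return HTTP 503 when any of them fails. The names `evaluateHealth` and `createHealthServer`, the check names, and the assumption that checks are synchronous functions are all illustrative.

```javascript
const http = require('http');

// Runs each named check and aggregates the result. A check is a synchronous
// function that returns a truthy value when its dependency is usable.
function evaluateHealth(checks) {
  const results = {};
  let allUp = true;
  for (const [name, check] of Object.entries(checks)) {
    let ok = false;
    try {
      ok = Boolean(check());
    } catch (err) {
      ok = false; // a throwing check counts as a failure
    }
    results[name] = ok ? 'UP' : 'DOWN';
    if (!ok) allUp = false;
  }
  return {
    statusCode: allUp ? 200 : 503,
    body: { status: allUp ? 'UP' : 'DOWN', checks: results },
  };
}

// Builds (but does not start) a server that answers GET /health.
function createHealthServer(checks) {
  return http.createServer((req, res) => {
    if (req.url !== '/health') {
      res.writeHead(404);
      res.end();
      return;
    }
    const { statusCode, body } = evaluateHealth(checks);
    res.writeHead(statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}

// Usage (not started here):
// createHealthServer({ database: () => true }).listen(3000);
```

Returning a non-2xx status matters because health checkers generally judge a target by the response code rather than by the JSON body.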


Example: Status Check Handling

Scenario: Automate recovery when a system status check fails.

  1. Set up an IAM role with the EC2 permissions the alarm action needs. The example policy below allows rebooting instances:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:RebootInstances"],
      "Resource": "*"
    }
  ]
}

  2. Create a CloudWatch alarm on the StatusCheckFailed_System metric for the instance and attach the recovery action.

  3. Test: Confirm that the alarm changes state and that the configured action runs.
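
A real system status check failure cannot be triggered on demand, so one way to exercise the alarm is the `aws cloudwatch set-alarm-state` command, which temporarily forces an alarm into a chosen state. The sketch below only builds the argument list; the helper name `buildSetAlarmStateArgs` and the alarm name are hypothetical. Forcing the ALARM state can invoke the alarm's actions, so this is best tried against a non-production instance.

```javascript
// Hypothetical helper: arguments for `aws cloudwatch set-alarm-state`.
function buildSetAlarmStateArgs(alarmName) {
  return [
    'cloudwatch', 'set-alarm-state',
    '--alarm-name', alarmName,
    '--state-value', 'ALARM',
    '--state-reason', 'Manual test of status check handling',
  ];
}

// The alarm name is an assumption; use the name of your own alarm.
const args = buildSetAlarmStateArgs('recover-i-0123456789abcdef0');
console.log(JSON.stringify(['aws', ...args]));

// To actually run it (requires the AWS CLI and credentials):
// require('child_process').execFileSync('aws', args, { stdio: 'inherit' });
```

Passing the arguments as an array to `execFileSync` avoids shell quoting problems with the state reason, which contains spaces.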


Interview Questions and Answers

  1. What are the types of AWS EC2 status checks?
    • System Status Checks: Monitor AWS’s infrastructure hosting your instance.
    • Instance Status Checks: Monitor the instance’s OS, network, and processes.
  2. Can you change the configuration of system and instance status checks?
    • No, but you can create CloudWatch Alarms and define automated recovery actions.
  3. How do you implement custom health checks on EC2 instances?
    • Install the CloudWatch Agent to collect custom metrics.
    • Use third-party monitoring tools.
    • Implement application-specific health-check endpoints.
  4. How does AWS handle system status check failures?
    • AWS provides an automated recovery feature to migrate the instance to healthy hardware.
  5. What actions can you take when a status check fails?
    • Reboot or recover the instance.
    • Notify administrators via SNS.
    • Stop/start or replace the instance.

