AWS Services - CloudWatch

Overview

Amazon CloudWatch is a monitoring and observability service built into AWS.
It provides metrics, logs, and alarms that help developers, DevOps engineers, and system administrators monitor resources, applications, and services in near real time.

Use cases:

  • Collecting and analyzing metrics from AWS services (EC2, RDS, Lambda, etc.).
  • Centralized log collection with CloudWatch Logs.
  • Setting up alarms and automated actions.
  • Creating dashboards for visualization.
  • Detecting anomalies and performance issues.

Key Concepts

  • Metrics: Time-ordered data points (e.g., CPU utilization, network traffic).
  • Logs: Application and system logs collected and stored.
  • Alarms: Rules to trigger actions based on metric thresholds.
  • Events (now Amazon EventBridge): Rules that react to AWS resource changes in near real time.
  • Dashboards: Visual representation of metrics and logs.
  • Namespaces: Categories for metrics (e.g., AWS/EC2, AWS/RDS).
  • Dimensions: Key-value pairs to filter and identify metrics.

CloudWatch Metrics

Metrics are collected from:

  • AWS services (default, no setup required).
  • Custom metrics (via API or CloudWatch Agent).
  • Applications/On-premises servers (via CloudWatch Agent).

Examples:

  • EC2: CPUUtilization, DiskReadOps
  • RDS: DatabaseConnections, FreeStorageSpace
  • Lambda: Invocations, Duration, Errors
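
Custom metrics are published under a namespace of your choosing, with dimensions that identify the series. The sketch below builds a request in the shape the PutMetricData API expects. The namespace, metric name, and dimension values are made up for illustration, and the boto3 call is left as a comment because it needs the AWS SDK and credentials.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit, dimensions):
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        # Dimensions are name/value pairs that identify this particular series.
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


# Hypothetical namespace and dimension values, chosen only for illustration.
# Custom namespaces should not start with "AWS/", which is reserved for AWS services.
request = {
    "Namespace": "MyApp/Checkout",
    "MetricData": [
        build_metric_datum(
            "OrderLatency", 182, "Milliseconds",
            {"Environment": "prod", "Service": "checkout"},
        ),
    ],
}

# With the AWS SDK for Python installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
print(request["Namespace"], request["MetricData"][0]["MetricName"])
```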

CloudWatch Logs

CloudWatch Logs allows you to:

  • Collect logs from EC2, Lambda, or custom applications.
  • Search, filter, and analyze logs.
  • Set log retention policies.
  • Export logs to S3 or stream to Lambda for further processing.

Common use cases:

  • Application logs for debugging.
  • System logs for security auditing.
  • Custom logs for business metrics.
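
Retention policies accept only a fixed set of day counts, so a requested period usually has to be rounded up. The helper below is a minimal sketch of that rounding. The list of allowed values reflects the API reference at the time of writing and should be treated as an assumption, and the log group name is hypothetical.

```python
# Retention periods (in days) accepted by CloudWatch Logs at the time of writing.
# Treat this list as an assumption and confirm it against the current API reference.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def pick_retention_days(minimum_days):
    """Return the shortest allowed retention that keeps logs for at least minimum_days."""
    for days in ALLOWED_RETENTION_DAYS:
        if days >= minimum_days:
            return days
    raise ValueError(f"no retention setting covers {minimum_days} days")


# Hypothetical log group name.
policy = {"logGroupName": "/myapp/prod/api", "retentionInDays": pick_retention_days(45)}
print(policy)

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("logs").put_retention_policy(**policy)
```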

CloudWatch Agent Installation (on EC2 or On-Prem)

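OS-level metrics such as memory and disk-space usage are not published by default. To collect them, along with log files, install the CloudWatch Agent. The commands below target Amazon Linux on x86-64 (amd64); other platforms use a different package: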

wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
sudo rpm -U ./amazon-cloudwatch-agent.rpm
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
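  • The config wizard creates a JSON config that specifies which metrics and logs to collect.
  • Once configured, load the config into the agent (assuming the wizard saved it to its default location; use -m onPremise instead of -m ec2 on on-premises servers), then start the agent and enable it at boot:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json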
sudo systemctl start amazon-cloudwatch-agent
sudo systemctl enable amazon-cloudwatch-agent
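
For reference, a small agent config along the lines of what the wizard produces might look like the sketch below, built in Python and dumped as JSON. The namespace, log file path, and log group name are placeholders, and a real config will usually list more metrics.

```python
import json

# Hypothetical values: the namespace, log file path, and log group name are placeholders.
agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds between collections
    "metrics": {
        "namespace": "MyApp/Hosts",
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/messages",
                        "log_group_name": "/myapp/prod/system",
                        # The agent substitutes the instance ID for this placeholder.
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```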

CloudWatch Alarms

  • Alarms are created for specific metrics.
  • Actions:
    • Send an SNS notification (email, SMS, webhook).
    • Trigger Auto Scaling policies.
    • Execute EC2 actions (recover, reboot, stop, or terminate the instance).

Example:
Alarm on CPUUtilization > 80% for 5 minutes → send email via SNS.
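
The example above could be expressed as parameters for the PutMetricAlarm API, roughly as sketched below. The instance ID and SNS topic ARN are hypothetical, and would_alarm is only a simplified local illustration of the threshold rule, not how CloudWatch evaluates alarms internally.

```python
# Hypothetical instance ID and SNS topic ARN.
alarm = {
    "AlarmName": "high-cpu-i-0123456789abcdef0",
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "Statistic": "Average",
    "Period": 300,            # length of each evaluation period, in seconds
    "EvaluationPeriods": 1,   # number of consecutive periods that must breach
    "Threshold": 80.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
}


def would_alarm(period_averages, threshold, evaluation_periods):
    """Simplified local check: are the most recent N period averages all above threshold?"""
    recent = period_averages[-evaluation_periods:]
    return len(recent) == evaluation_periods and all(v > threshold for v in recent)


print(would_alarm([42.0, 91.5], alarm["Threshold"], alarm["EvaluationPeriods"]))

# With boto3 installed and credentials configured, the alarm could be created as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```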


CloudWatch Dashboards

  • Create custom dashboards in the console.
  • Add widgets (line charts, bar charts, numbers).
  • Combine multiple service metrics into one view.
  • Useful for NOC screens or real-time monitoring.
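
Dashboards can also be defined as a JSON body instead of being built by hand in the console. The sketch below assembles a two-widget body that combines an EC2 and an RDS metric. The resource identifiers, region, and dashboard name are hypothetical.

```python
import json


def metric_widget(title, metric, x, y, region="us-east-1"):
    """Build one line-chart widget; `metric` is [namespace, name, dim_name, dim_value]."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,  # position and size on the 24-column grid
        "properties": {
            "title": title,
            "view": "timeSeries",
            "metrics": [metric],
            "stat": "Average",
            "period": 300,
            "region": region,
        },
    }


# Hypothetical instance and database identifiers.
dashboard_body = json.dumps({
    "widgets": [
        metric_widget(
            "EC2 CPU",
            ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"], 0, 0,
        ),
        metric_widget(
            "RDS connections",
            ["AWS/RDS", "DatabaseConnections", "DBInstanceIdentifier", "mydb"], 12, 0,
        ),
    ]
})
print(dashboard_body)

# With boto3 installed and credentials configured, this could be published as:
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="ops-overview", DashboardBody=dashboard_body)
```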

CloudWatch Events (Amazon EventBridge)

  • Rules that match incoming events and route them to targets.
  • Example targets: Lambda, Step Functions, SNS, SQS, EC2 actions.
  • Example: On RDS failover event, trigger a Lambda function to send Slack notification.
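
A rule's event pattern is a JSON document that is compared against each incoming event. The sketch below shows a pattern that might select RDS failover events, plus a deliberately simplified matcher that illustrates the idea. The event shape is an assumption to verify against a real event, and the matcher covers only exact-value lists and nested objects, not the full EventBridge pattern syntax.

```python
# Assumed event shape for an RDS failover; check a real event before relying on it.
pattern = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Instance Event"],
    "detail": {"EventCategories": ["failover"]},
}


def matches(pattern, event):
    """Simplified matcher: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # A pattern list matches if any listed value equals the event value
            # (or any element of it, when the event value is itself a list).
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


# Made-up sample event with a hypothetical database identifier.
sample_event = {
    "source": "aws.rds",
    "detail-type": "RDS DB Instance Event",
    "detail": {"EventCategories": ["failover"], "SourceIdentifier": "mydb"},
}
print(matches(pattern, sample_event))
```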

CloudWatch Logs Insights

  • Interactive query capability for logs.
  • Purpose-built query language in which commands (fields, filter, stats, sort, limit) are chained with the pipe character.
  • Example:
  fields @timestamp, @message
  | filter @message like /ERROR/
  | sort @timestamp desc
  | limit 20
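
To make the query's semantics concrete, the sketch below applies the same three steps (filter, sort descending, limit) to a few made-up log events in plain Python. It does not call CloudWatch; it only mirrors what each pipeline stage does.

```python
import re

# Made-up log events; "@timestamp" and "@message" mirror the fields used in the query.
events = [
    {"@timestamp": "2024-05-01T10:00:00Z", "@message": "INFO request served"},
    {"@timestamp": "2024-05-01T10:00:05Z", "@message": "ERROR upstream timeout"},
    {"@timestamp": "2024-05-01T10:00:09Z", "@message": "ERROR connection reset"},
]


def run_query(events, pattern="ERROR", limit=20):
    """Local stand-in for the query: filter by regex, newest first, capped at `limit`."""
    # filter @message like /ERROR/
    hits = [e for e in events if re.search(pattern, e["@message"])]
    # sort @timestamp desc (ISO 8601 strings in one format sort chronologically)
    hits.sort(key=lambda e: e["@timestamp"], reverse=True)
    # limit 20
    return hits[:limit]


for event in run_query(events):
    print(event["@timestamp"], event["@message"])
```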

CloudWatch Contributor Insights

  • Analyzes log data to rank the top contributors behind a metric, showing who or what is responsible for most of the activity.
  • Example: Identify which IP is causing high API Gateway 5xx errors.
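
The kind of ranking Contributor Insights produces can be illustrated locally with a counter over log records. The records below are made up, and the sketch shows only the top-N idea, not the service's rule syntax.

```python
from collections import Counter

# Made-up access-log records: (client IP, HTTP status).
records = [
    ("203.0.113.7", 502), ("198.51.100.4", 200), ("203.0.113.7", 500),
    ("192.0.2.15", 503), ("203.0.113.7", 504), ("198.51.100.4", 500),
]


def top_contributors(records, n=3):
    """Rank client IPs by how many 5xx responses they received."""
    errors = Counter(ip for ip, status in records if 500 <= status <= 599)
    return errors.most_common(n)


for ip, count in top_contributors(records):
    print(ip, count)
```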

CloudWatch Synthetics

  • Canaries: scripted tests to simulate user actions.
  • Monitor endpoints and APIs proactively.
  • Detect issues before customers are impacted.
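
Conceptually, a canary is a scheduled probe that checks availability and latency. The sketch below shows that idea with the Python standard library only. It is not the Synthetics runtime or its library, and the URL, function names, and latency budget are hypothetical.

```python
import time
import urllib.request


def http_status(url, timeout=5.0):
    """Default fetcher: return the HTTP status code for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_check(url, fetch=http_status, max_seconds=2.0):
    """Run one probe and report whether it passed, plus the reason if it failed."""
    started = time.monotonic()
    try:
        status = fetch(url)
    except Exception as exc:  # network errors, timeouts, HTTP errors raised by urllib
        return {"passed": False, "reason": f"request failed: {exc}"}
    elapsed = time.monotonic() - started
    if status != 200:
        return {"passed": False, "reason": f"unexpected status {status}"}
    if elapsed > max_seconds:
        return {"passed": False, "reason": f"too slow: {elapsed:.2f}s"}
    return {"passed": True, "reason": ""}


# Stub fetcher so the sketch runs without network access.
print(run_check("https://example.com/health", fetch=lambda url: 200))
```

Passing the fetcher as a parameter keeps the check testable without network access; a real canary would run on a schedule and publish its result as a metric.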

CloudWatch Application Insights

  • Automatically sets up monitoring for common application stacks (e.g., Java, .NET, databases).
  • Detects and alerts on common problems.

Monitoring Examples

EC2 Instance Monitoring

  • By default (basic monitoring), EC2 publishes metrics to CloudWatch at 5-minute intervals.
  • With detailed monitoring enabled (billed separately): 1-minute granularity.
  • Metrics: CPUUtilization, NetworkIn, DiskReadOps.

RDS Monitoring

  • Metrics: CPUUtilization, FreeableMemory, DatabaseConnections.
  • Performance Insights integration for query-level analysis.

Best Practices

  • Use consistent namespaces and dimensions for metrics, plus tags on resources such as log groups and alarms, to separate environments (dev, test, prod).
  • Aggregate logs from multiple sources into CloudWatch Logs.
  • Enable detailed monitoring for critical workloads.
  • Set alarms with multiple actions (e.g., SNS + Auto Scaling).
  • Use dashboards for real-time visualization.
  • Optimize costs by setting log retention policies.

Summary

  • CloudWatch Metrics → monitor system and application performance.
  • CloudWatch Logs → centralize log management.
  • Alarms → automated responses to performance thresholds.
  • Dashboards → visualize performance data.
  • Events/EventBridge → automate workflows.
  • CloudWatch Agent → extend monitoring to OS-level metrics and custom logs.

CloudWatch is the central monitoring service for AWS, enabling observability across infrastructure, applications, and business KPIs.