Article summary
In my current project, we use AWS Chatbot to send alerts from CloudWatch alarms to Slack. With a large team and lots of deployed environments, these Slack notifications act as a hub for visibility into failures that our team cares about. In this post, I’ll walk you through how to create an AWS CloudWatch metric filter and alarm in Terraform.
CloudWatch Who?
AWS CloudWatch Alarms hook up to CloudWatch logs to provide custom alerts to developers. An alarm determines its state from one of two things: a metric published by a CloudWatch metric filter, or a math expression applied to one or more metrics. A metric filter watches CloudWatch logs and filters them based on patterns and expressions; in this way, CloudWatch logs are quantified into metrics that alarms can evaluate. For example, a metric filter could count the number of logs in an environment, surface 400 errors, find logs containing a certain string, and so on. When the resulting metric breaches the alarm’s conditions, the alarm executes a series of defined actions. On my project, the action performed is forwarding a notification to Slack.
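As a quick illustration of that second flavor, here is a minimal sketch of an alarm driven by a metric math expression instead of a single metric. The resource and argument names come from the Terraform AWS provider, but the metric name and the expression here are hypothetical placeholders:

```hcl
# Illustrative only: an alarm evaluated from a metric math expression.
# "SomeErrorMetric" is a hypothetical metric; FILL() replaces missing datapoints with 0.
resource "aws_cloudwatch_metric_alarm" "hypothetical-expression-alarm" {
  alarm_name          = "hypothetical-expression-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  threshold           = "0"

  # The query with return_data = true is the one the alarm actually evaluates.
  metric_query {
    id          = "e1"
    expression  = "FILL(errors, 0)"
    label       = "Errors (gaps filled with 0)"
    return_data = true
  }

  metric_query {
    id = "errors"
    metric {
      metric_name = "SomeErrorMetric"
      namespace   = "ImportantMetrics"
      period      = "60"
      stat        = "Sum"
    }
  }
}
```

The rest of this post sticks to the simpler case of an alarm watching a single metric.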
In this tutorial, we’ll create a CloudWatch metric filter that finds CloudWatch logs containing a specific string and attach it to an alarm. These steps assume that you’re already managing your infrastructure with Terraform.
Creating the Filter
First, we’ll want to create the CloudWatch metric filter. We’ll use the aws_cloudwatch_log_metric_filter Terraform resource and define several fields on it. The filter needs a name as well as a log_group_name, which tells the filter which group of logs to evaluate. The pattern is a term or filter pattern expression that we want the filter to match on; in our case, we’ll use the string “ERROR_WE_CARE_ABOUT” to find errors containing that string. The metric_transformation block describes the metric that the filter publishes to. The metric is the numerical representation of the filter’s data. It needs a name, a namespace to live under, and a value to publish to the metric when the filter matches. In our case, the value will be 1, since we want to count the occurrences of logs containing our pattern.
```hcl
# main.tf
resource "aws_cloudwatch_log_metric_filter" "error-we-care-about-metric-filter" {
  name           = "OurMetricFilter"
  log_group_name = aws_cloudwatch_log_group.job-runner-cloudwatch-log-group.name
  pattern        = "ERROR_WE_CARE_ABOUT"

  metric_transformation {
    name      = "ErrorWeCareAboutMetric"
    namespace = "ImportantMetrics"
    value     = "1"
  }
}
```
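As an aside, the pattern above is a plain string match, but CloudWatch filter patterns can also target structured logs. As a hypothetical variant (the filter and metric names below are made up), a filter that counts JSON logs whose statusCode field is 400 might look something like this:

```hcl
# Hypothetical variant: match structured (JSON) logs where statusCode is 400.
resource "aws_cloudwatch_log_metric_filter" "client-error-metric-filter" {
  name           = "ClientErrorFilter"
  log_group_name = aws_cloudwatch_log_group.job-runner-cloudwatch-log-group.name
  pattern        = "{ $.statusCode = 400 }"

  metric_transformation {
    name      = "ClientErrorMetric"
    namespace = "ImportantMetrics"
    value     = "1"
  }
}
```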
Creating the Alarm
Now that we have a metric filter, we can hook it up to an alarm. We’ll define the alarm resource in the same file where we created the metric filter. The alarm needs a name (alarm_name) and the name of the metric that our filter publishes to, which we can read from the filter’s metric_transformation block. We’ll give it a threshold, a statistic, and a comparison operator to determine when a datapoint counts as breaching. In this case, we’re telling the alarm that a datapoint breaches whenever the sum of the filtered logs is greater than the threshold of zero. The datapoints_to_alarm field defines how many breaching datapoints are needed to sound the alarm. Here, we’ll tell the alarm to trigger on a single breaching datapoint.
The period is the number of seconds that the statistic is applied over, and evaluation_periods tells the alarm how many recent periods to compare against the threshold. The namespace is the same one that we used for our metric filter. Finally, alarm_actions is a list of actions to perform when the alarm is in the alarm state. These are listed as ARNs, or Amazon Resource Names. In this example, the SNS topic ARN that AWS Chatbot listens on is passed as the resulting action.
```hcl
# main.tf
resource "aws_cloudwatch_metric_alarm" "error-we-care-about-alarm" {
  alarm_name          = "error-we-care-about"
  metric_name         = aws_cloudwatch_log_metric_filter.error-we-care-about-metric-filter.metric_transformation[0].name
  threshold           = "0"
  statistic           = "Sum"
  comparison_operator = "GreaterThanThreshold"
  datapoints_to_alarm = "1"
  evaluation_periods  = "1"
  period              = "60"
  namespace           = "ImportantMetrics"
  alarm_actions       = [var.chatbot_sns_topic_arn]
}
```
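One note on the last line: the snippet references var.chatbot_sns_topic_arn without showing where it comes from. In our setup it points at the SNS topic that AWS Chatbot subscribes to; a minimal declaration might look something like this (the description text is only a suggestion):

```hcl
# variables.tf (hypothetical declaration): the SNS topic that AWS Chatbot subscribes to.
variable "chatbot_sns_topic_arn" {
  type        = string
  description = "ARN of the SNS topic that forwards alarm notifications to Slack via AWS Chatbot"
}
```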
Next Steps
Now that we know how to build a CloudWatch metric filter and alarm, we can extend this pattern in ways that suit our project. Of course, there are lots of other ways to build both CloudWatch metric filters and alarms. Alarms, for example, can be grouped into composite alarms, and filters have many other abilities beyond searching for a term. We know that alarms can kick off a workflow to send a message to Slack, but they can also do things like starting or stopping EC2 instances and sending notifications to other systems. Hopefully, this post has helped you get started with CloudWatch alarms in Terraform.
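As one sketch of the composite-alarm idea, assuming a second alarm named another-alarm-we-care-about already exists (that name is a placeholder), the Terraform might look roughly like this:

```hcl
# Illustrative sketch: a composite alarm that fires when either child alarm is in alarm.
# "another-alarm-we-care-about" is a hypothetical second alarm.
resource "aws_cloudwatch_composite_alarm" "errors-composite-alarm" {
  alarm_name    = "errors-we-care-about-composite"
  alarm_rule    = "ALARM(${aws_cloudwatch_metric_alarm.error-we-care-about-alarm.alarm_name}) OR ALARM(another-alarm-we-care-about)"
  alarm_actions = [var.chatbot_sns_topic_arn]
}
```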
Thanks Grace. This was helpful.
I have a question –
I have the log_group_name already existing and am trying to create the log_metric_filter.
Can I pass the log_group_name directly?
Having some trouble here.
Hi Grace
This was very useful. One of the few resources available on the topic of creating a terraform alarm from cloudwatch logs.
I think I have spotted a problem with your solution though :(
You are using the filter name as `metric_name` in the alarm, which I am afraid is not going to work. Instead you should be extracting the metric name from the metric created inside the filter, so the `metric_name` should be as follows:
```hcl
metric_name = lookup(aws_cloudwatch_log_metric_filter.error-we-care-about-metric-filter.name.metric_transformation[0], "name")
```
sorry, typo
```hcl
metric_name = lookup(aws_cloudwatch_log_metric_filter.error-we-care-about-metric-filter.metric_transformation[0], "name")
```
Thanks Grace, I found this more helpful (particularly having the metric_filter and the metric_alarm together in a single example) than the official AWS docs.
Extremely helpful, THANK YOU!
Thanks Grace, this was spot-on helpful. Cheers!