As we've discussed in the previous part of this series, observability as code means automating the configuration of your observability tools. You learned the basics of automating dashboards, and now we’re going to expand to ways you can apply observability as code approaches to automate both alerts and synthetic monitoring.

This three-part blog series is your guide to observability as code, providing tips, examples, and advice. In this series, we'll walk through examples of automating the configuration of your observability tools, starting with dashboards in part one. Here in part two, we’ll show examples with alerts and synthetic monitoring.

By the end of the series, you'll have worked with a total of five examples of observability as code using New Relic and HashiCorp's Terraform: dashboards as code, alerts as code, synthetic monitoring as code, tags as code, and workloads as code. You'll be working with data from the sample FoodMe restaurant ordering app.

Before you begin working on this part two, make sure you have completed Automate configurations with observability as code, part one.

Alerts as code

Engineers often ask me about alerts as code, because setting up alerts can take a lot of time. Of course you can set up alerts in the New Relic UI. But if you need to set up large volumes of alerts, you can consider automating the configuration.

In New Relic, you can set up powerful, customizable alert policies for any data that you can monitor, such as APM, infrastructure monitoring, browser monitoring, mobile monitoring, and NRQL queries. You can configure alerts to send notifications when key performance metrics change.

As seen in part one, dashboards as code is the easiest configuration to apply as code when you're getting started in New Relic. With dashboards as code, you don’t need a target entity (or a valid object) to attach to. You can reuse a working NRQL query in nrql_query as defined in your main.tf file.

For alerts as code, you'll need a valid data source so that Terraform can use information that is defined outside of your configuration. A data source lets Terraform look up an object that already exists, here an entity in New Relic, so that other parts of the configuration can reference its attributes. Each data source must have a unique combination of type and name within a module. Check out this sample code based on Data Source: newrelic_entity in the Terraform registry.

data "newrelic_entity" "app" {
  # The unique name of the entity in New Relic
  name = "my-app"
  # Valid values are APM, BROWSER, INFRA, MOBILE, SYNTH, and VIZ.
  # If not specified, all domains are searched.
  domain = "APM"
  # Valid values are APPLICATION, DASHBOARD, HOST, MONITOR, and WORKLOAD.
  type = "APPLICATION"
}

For more details, see Data Source: newrelic_entity.
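Once the lookup succeeds, the rest of your configuration can refer to the entity through the data source's exported attributes. As a minimal sketch, assuming the data "newrelic_entity" "app" block above sits in the same module, the outputs below print the entity's GUID and APM application ID after terraform apply. The output names are made up for illustration.

```hcl
# Assumes the data "newrelic_entity" "app" block shown above is in the same module.
# The output names are hypothetical.
output "app_guid" {
  description = "GUID of the entity that the data source found"
  value       = data.newrelic_entity.app.guid
}

output "app_application_id" {
  description = "APM application ID of the entity"
  value       = data.newrelic_entity.app.application_id
}
```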

In this alerts as code tutorial, we'll continue to use our sample FoodMe app. After you have a target entity, you'll need to add an alert policy, alert conditions, an alert workflow, destination, and notification channel. Here's an illustration of how everything will work together.

Create alerts as code

To see everything we're covering in this section, watch this video. If you need a refresher on working with Terraform, go to Getting started with New Relic and Terraform. You can also work along with these steps with code samples in GitHub and the hands-on workshop in Instruqt.

Example newrelic_alert_policy resource

A policy is a group of one or more alert conditions. You must first create a policy before you can add additional conditions to it. Here's an example from Resources: newrelic_alert_policy in the Terraform registry:

resource "newrelic_alert_policy" "foo" {
  # The name of the policy
  name = "example"
  # The rollup strategy for the policy.
  # Options include: PER_POLICY, PER_CONDITION, or PER_CONDITION_AND_TARGET.
  incident_preference = "PER_POLICY" # PER_POLICY is default
}

For more details on this alert policy, see Resources: newrelic_alert_policy.
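If several teams reuse the same policy definition, it can help to pass the rollup strategy in as a variable and reject typos before Terraform calls the New Relic API. The following is a sketch using a standard Terraform variable validation block. The variable name incident_preference and the resource name team_policy are assumptions for this example, not part of the registry sample.

```hcl
# Hypothetical variable; only the three values listed in the policy example are accepted.
variable "incident_preference" {
  description = "Rollup strategy for the alert policy"
  type        = string
  default     = "PER_POLICY"

  validation {
    condition = contains(
      ["PER_POLICY", "PER_CONDITION", "PER_CONDITION_AND_TARGET"],
      var.incident_preference
    )
    error_message = "The incident_preference value must be PER_POLICY, PER_CONDITION, or PER_CONDITION_AND_TARGET."
  }
}

resource "newrelic_alert_policy" "team_policy" {
  name                = "example"
  incident_preference = var.incident_preference
}
```

With this in place, terraform plan fails with the error message above if someone passes a value such as PER_TEAM.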

Example newrelic_nrql_alert_condition resource

A condition can include a data source and thresholds that define behavior that's considered a violation. For example, you might describe a condition like this: "If the response time for any page load in my app goes above 8 seconds and lasts for more than 5 minutes, that's a violation." Here's a minimal example of that condition, using the same arguments as the conditions in the complete main.tf file later in this post:

resource "newrelic_nrql_alert_condition" "foo" {
  # The ID of the policy this condition belongs to.
  policy_id = newrelic_alert_policy.foo.id
  # The type of the condition: static or baseline.
  type = "static"
  # The title of the condition.
  name = "foo"
  description = "Alert when transactions are taking too long"
  enabled = true
  aggregation_method = "event_flow"
  aggregation_delay = 120

  nrql {
    # The NRQL query that produces the signal to evaluate.
    query = "SELECT average(duration) FROM Transaction WHERE appName = 'Your App'"
  }

  critical {
    # Open an incident when the value stays above 8 seconds for 300 seconds (5 minutes).
    operator = "above"
    threshold = 8
    threshold_duration = 300
    threshold_occurrences = "all"
  }
}

For more details, see Resources: newrelic_nrql_alert_condition.

Example newrelic_workflow resource

You use workflows in New Relic to manage when and where notifications are sent about issues, so the relevant info is sent to the right individual or team who needs it. Here's an example from Resource newrelic_workflow in the Terraform registry:

resource "newrelic_workflow" "foo" {
  # The name of the workflow.
  name = "workflow-example"
  # How to handle muted issues.
  muting_rules_handling = "NOTIFY_ALL_ISSUES"

  issues_filter {
    # The name of the filter.
    name = "filter-name"
    # Type of the filter.
    type = "FILTER"

    predicate {
      # Issue event attribute to check.
      attribute = "accumulations.tag.team"
      # An operator to use to compare the attribute.
      operator = "EXACTLY_MATCHES"
      # The attribute must match any of the values in this list
      values = [ "growth" ]
    }
  }

  destination {
    # id of a notification_channel to use for notifications.
    # This refers to the newrelic_notification_channel "foo" example in this post.
    channel_id = newrelic_notification_channel.foo.id
  }
}

For more details, see our documentation on New Relic workflows and Resource newrelic_workflow in the Terraform registry.
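Filtering on a tag works when your issues carry that tag. Another option is to tie the workflow directly to a policy that Terraform manages, so the filter keeps working if the policy is renamed. This is a sketch under two assumptions you should verify against the workflow documentation: that labels.policyIds is the issue attribute holding policy IDs, and that the newrelic_alert_policy "foo" and newrelic_notification_channel "foo" examples from this post are defined in the same module.

```hcl
resource "newrelic_workflow" "policy_workflow" {
  name                  = "workflow-by-policy-id"
  muting_rules_handling = "NOTIFY_ALL_ISSUES"

  issues_filter {
    name = "filter-by-policy"
    type = "FILTER"

    predicate {
      # Assumption: labels.policyIds is the attribute that carries the policy ID.
      attribute = "labels.policyIds"
      operator  = "EXACTLY_MATCHES"
      # Reference the policy resource instead of hard-coding its name or ID.
      values = [newrelic_alert_policy.foo.id]
    }
  }

  destination {
    channel_id = newrelic_notification_channel.foo.id
  }
}
```

Because the filter references the policy resource, Terraform also creates the policy before the workflow without any explicit depends_on.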

So far so good! You understand newrelic_alert_policy, newrelic_nrql_alert_condition, and newrelic_workflow resources in Terraform.

To complete our automation of configuring alerts in New Relic, we can't forget the notifications. We'll also need two additional Terraform resources. The workflow's destination block points at a notification channel, and the channel points at a notification destination:

  • newrelic_notification_destination
  • newrelic_notification_channel

Example newrelic_notification_destination resource

You'll need a newrelic_notification_destination in Terraform that defines reusable credentials or settings for a notification provider. Examples of destinations include webhook basic credentials, Slack OAuth credentials, and PagerDuty API keys.

Destinations are where we send notifications about your New Relic data. A destination is a unique identifier for a third-party system that you use. Destination settings contain the connection details to integrate with third-party systems and can be used across a variety of tools in New Relic. Here's an example from Resource: newrelic_notification_destination in the Terraform registry.

resource "newrelic_notification_destination" "foo" {
  # Determines the New Relic account where the notification destination will be created.
  account_id = 12345678
  # The name of the destination
  name = "foo"
  # The type of destination.
  type = "WEBHOOK"

  property {
    # The notification property key.
    key = "url"
    # The notification property value.
    value = "https://webhook.mywebhook.com"
  }

  auth_basic {
    # The username of the basic auth.
    user = "username"
    # Specifies an authentication password for use with a destination.
    password = "password"
  }
}

For more details, see our New Relic docs on New Relic destinations and Resource: newrelic_notification_destination in the Terraform registry.
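The registry example hard-codes the basic auth password, which is fine for a demo but not something you'd want to commit. Here's a sketch of the same destination that reads the password from a Terraform variable marked sensitive. The names webhook_password and webhook are assumptions for this example.

```hcl
# Hypothetical variable; it has no default, so the value must be supplied at run time.
variable "webhook_password" {
  description = "Basic auth password for the webhook destination"
  type        = string
  sensitive   = true
}

resource "newrelic_notification_destination" "webhook" {
  account_id = 12345678
  name       = "webhook-destination"
  type       = "WEBHOOK"

  property {
    key   = "url"
    value = "https://webhook.mywebhook.com"
  }

  auth_basic {
    user     = "username"
    password = var.webhook_password
  }
}
```

You can supply the value through the TF_VAR_webhook_password environment variable. Marking the variable as sensitive hides it in plan and apply output, but the value is still written to the Terraform state, so protect the state file as well.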

Example newrelic_notification_channel resource

For your specific workflow, you'll also need a notification_channel that describes additional notification parameters. There are different configuration options for the notification channel depending on destination type. For example, an email channel could include an email subject and details for the email body, and a webhook channel could define a payload template.

Alert policies in New Relic are where you designate which team members are notified when an incident occurs, and how they're notified. Options for notification channels include webhooks, Slack, and email. You might want to include charts about the incident to provide context and share them in a notification to your team. Here's an example from Resource newrelic_notification_channel in the Terraform registry.

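resource "newrelic_notification_channel" "foo" {
  # Determines the New Relic account where the notification channel will be created.
  account_id = 12345678
  # The name of the channel.
  name = "webhook-example"
  # The type of channel.
  type = "WEBHOOK"
  # The id of the destination.
  destination_id = "00b6bd1d-ac06-4d3d-bd72-49551e70f7a8"
  # The type of product.
  product = "IINT" // (Workflows)

  // must be valid json
  property {
    # The notification property key.
    key = "payload"
    # The notification property value.
    value = "name: {{ foo }}"
    # The notification property label.
    label = "Payload Template"
  }
}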

For more details, see our New Relic docs on notification channels and Resource newrelic_notification_channel in the Terraform registry.

Now let's combine what we covered in the earlier sections with our new example, the Terraform data sources for alerts as code. The next two example files, main.tf and variables.tf, use the four golden signals described in Google's Site Reliability Engineering book: latency, traffic, errors, and saturation. These examples are based on code samples in the Getting Started with the New Relic Provider documentation.

Example main.tf file complete code

# get the New Relic terraform provider
terraform {
  required_version = "~> 1.0"
  required_providers {
    newrelic = {
      source  = "newrelic/newrelic"
      version = "3.6.1"
    }
  }
}

# configure the New Relic provider
provider "newrelic" {
  account_id = (var.nr_account_id)
  api_key    = (var.nr_api_key) # usually prefixed with 'NRAK'
  region     = (var.nr_region)  # Valid regions are US and EU
}

# data source to get information about a specific entity in New Relic that already exists. 
data "newrelic_entity" "app_name" {
  name   = (var.nr_appname) # Note: This must be an exact match of your app name in New Relic (Case sensitive)
  type   = "APPLICATION"
  domain = "APM"
}

# resource to create, update, and delete alerts in New Relic
resource "newrelic_alert_policy" "alert_policy_name" {
  name                = "O11y_asCode-FoodMe-Alerts-TF"
  incident_preference = "PER_CONDITION"
}

# NRQL alert condition - Latency (static)
resource "newrelic_nrql_alert_condition" "GoldenSignals-Latency" {
  policy_id          = newrelic_alert_policy.alert_policy_name.id
  type               = "static"
  name               = "GoldenSignals-Latency"
  description        = "Alert when Latency transactions are taking too long"
  runbook_url        = "https://www.example.com"
  enabled            = true
  aggregation_method = "event_flow"
  aggregation_delay  = 60

  nrql {
    query = "SELECT average(apm.service.overview.web) * 1000 FROM Metric WHERE appName like '%FoodMe%'"
  }

  critical {
    operator              = "above"
    threshold             = 80
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }

  warning {
    operator              = "above"
    threshold             = 40
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }
}

# NRQL alert condition - Errors (static)
resource "newrelic_nrql_alert_condition" "GoldenSignals-Errors" {
  policy_id          = newrelic_alert_policy.alert_policy_name.id
  type               = "static"
  name               = "GoldenSignals-Errors"
  description        = "Alert when Errors are too high"
  runbook_url        = "https://www.example.com"
  enabled            = true
  aggregation_method = "event_flow"
  aggregation_delay  = 60

  nrql {
    query = "SELECT (count(apm.service.error.count) / count(apm.service.transaction.duration))*100 FROM Metric WHERE (appName like '%FoodMe%') AND (transactionType = 'Web')"
  }

  critical {
    operator              = "above"
    threshold             = 2
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }

  warning {
    operator              = "above"
    threshold             = 1
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }
}

# NRQL alert condition - Traffic (baseline)
resource "newrelic_nrql_alert_condition" "GoldenSignals-Traffic" {
  policy_id          = newrelic_alert_policy.alert_policy_name.id
  type               = "baseline"
  name               = "GoldenSignals-Traffic"
  description        = "Alert when Traffic transactions are odd"
  runbook_url        = "https://www.example.com"
  enabled            = true
  aggregation_method = "event_flow"
  aggregation_delay  = 60

  # baseline type only
  baseline_direction = "upper_only"

  nrql {
    query = "SELECT rate(count(apm.service.transaction.duration), 1 minute) FROM Metric WHERE (appName LIKE '%FoodMe%') AND (transactionType = 'Web')"
  }

  critical {
    operator              = "above"
    threshold             = 4
    threshold_duration    = 180
    threshold_occurrences = "at_least_once"
  }

  warning {
    operator              = "above"
    threshold             = 3
    threshold_duration    = 120
    threshold_occurrences = "at_least_once"
  }
}

# NRQL alert condition - Saturation (static)
resource "newrelic_nrql_alert_condition" "GoldenSignals-Saturation" {
  policy_id          = newrelic_alert_policy.alert_policy_name.id
  type               = "static"
  name               = "GoldenSignals-Saturation"
  description        = "Alert when Saturation is high"
  runbook_url        = "https://www.example.com"
  enabled            = true
  aggregation_method = "event_flow"
  aggregation_delay  = 60

  nrql {
    query = "SELECT average(apm.service.memory.physical) * rate(count(apm.service.instance.count), 1 minute) / 1000 FROM Metric WHERE appName LIKE '%FoodMe%'"
  }

  critical {
    operator              = "above"
    threshold             = 20
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }

  warning {
    operator              = "above"
    threshold             = 10
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }
}

#  resource to create and manage New Relic workflows
resource "newrelic_workflow" "workflow-example" {
  name                  = "workflow-example"
  account_id            = var.nr_account_id
  muting_rules_handling = "NOTIFY_ALL_ISSUES"
  destinations_enabled  = true
  enabled               = true
  issues_filter {
    name = "Filter-name"
    type = "FILTER"

    predicate {
      attribute = "accumulations.policyName"
      operator  = "CONTAINS"
      values    = ["O11y_asCode-FoodMe-Alerts-TF"]
    }
  }
  destination {
    channel_id = newrelic_notification_channel.alert_notification_email.id
  }
}

# resource to create and manage New Relic notification destinations
resource "newrelic_notification_destination" "alert_email_destination" {
  name = "email-example"
  type = "EMAIL"

  property {
    key   = "email"
    value = var.nr_email
  }
}

# resource to create and manage New Relic notification channels
resource "newrelic_notification_channel" "alert_notification_email" {
  account_id     = var.nr_account_id
  name           = "email example"
  type           = "EMAIL"
  destination_id = newrelic_notification_destination.alert_email_destination.id
  product        = "IINT"

  property {
    key   = "subject"
    value = "name: {{ alert_notification_email }}"
  }
}
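The three static conditions in the main.tf above (latency, errors, and saturation) differ only in their query and thresholds. As a sketch of one way to cut the repetition, the block below generates them from a single resource with for_each. It's an alternative to the three separate static blocks, not an addition to them. The local name static_conditions and the resource name golden_signal are assumptions for this example, and the baseline traffic condition stays as its own resource because it needs baseline_direction.

```hcl
locals {
  # One entry per static condition; queries and thresholds are copied from main.tf above.
  static_conditions = {
    Latency = {
      query    = "SELECT average(apm.service.overview.web) * 1000 FROM Metric WHERE appName LIKE '%FoodMe%'"
      critical = 80
      warning  = 40
    }
    Errors = {
      query    = "SELECT (count(apm.service.error.count) / count(apm.service.transaction.duration))*100 FROM Metric WHERE (appName LIKE '%FoodMe%') AND (transactionType = 'Web')"
      critical = 2
      warning  = 1
    }
    Saturation = {
      query    = "SELECT average(apm.service.memory.physical) * rate(count(apm.service.instance.count), 1 minute) / 1000 FROM Metric WHERE appName LIKE '%FoodMe%'"
      critical = 20
      warning  = 10
    }
  }
}

resource "newrelic_nrql_alert_condition" "golden_signal" {
  for_each = local.static_conditions

  policy_id          = newrelic_alert_policy.alert_policy_name.id
  type               = "static"
  name               = "GoldenSignals-${each.key}"
  enabled            = true
  aggregation_method = "event_flow"
  aggregation_delay  = 60

  nrql {
    query = each.value.query
  }

  critical {
    operator              = "above"
    threshold             = each.value.critical
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }

  warning {
    operator              = "above"
    threshold             = each.value.warning
    threshold_duration    = 60
    threshold_occurrences = "at_least_once"
  }
}
```

Adding another static condition then means adding one more entry to the map instead of copying a 25-line block.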

Example variables.tf file complete code

# your unique New Relic account ID 
variable "nr_account_id" {
  default = "XXXXX"
}

# your User API key
variable "nr_api_key" {
  default = "XXXXX"
}

# valid regions are US and EU
variable "nr_region" {
  default = "US"
}

# your New Relic app name (must match the entity name exactly)
variable "nr_appname" {
  default = "FoodMe-XXXXX"
}

# your email address to send notification 
variable "nr_email" {
  default = "XXXXX"
}
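The placeholders above are easy to follow in a tutorial, but a default value for the API key tends to end up in version control. As a hedged alternative, here's a sketch of the first three variables with types, no default for the secret, and a check on the region. It uses only standard Terraform variable features and would replace the matching declarations above, not sit next to them.

```hcl
variable "nr_account_id" {
  description = "Your unique New Relic account ID"
  type        = number
}

variable "nr_api_key" {
  description = "Your User API key (usually prefixed with NRAK)"
  type        = string
  sensitive   = true # hides the value in plan and apply output
}

variable "nr_region" {
  description = "New Relic region"
  type        = string
  default     = "US"

  validation {
    condition     = contains(["US", "EU"], var.nr_region)
    error_message = "The nr_region value must be US or EU."
  }
}
```

Variables without a default can be supplied through environment variables such as TF_VAR_nr_api_key, or through a terraform.tfvars file that you keep out of your repository.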

What the final result looks like

Now that you have deployed alerts as code, your final result should look like this:

Synthetic monitoring as code 

Synthetic monitoring is an easy way to get started with New Relic because there's not an agent for you to configure. You can start configuring synthetic monitoring with a URL for a website. Synthetic monitoring uses virtual browsers to measure how your website is performing by capturing aggregate numbers for load time, uptime, and average download size. 

You can choose the types of synthetic monitor based on what you need. Ping monitors are a simple type of monitor to start with because they check to see if apps are online. The ping synthetic monitor uses a simple HTTP client to make requests to your website. A powerful part of New Relic synthetic monitoring is that you can use aggregated metrics to look for patterns and identify why an app has performance issues. Each monitor result is stored, so you can see specifics.

Most New Relic customers start out by configuring synthetic monitoring in the UI. (See the getting started guide in our documentation.) But as you increase the volume of your synthetic monitoring throughout your environment, you’ll want to consider the advantages of provisioning your synthetics configurations via code. This tutorial shows how you can use Terraform with New Relic to automate the creation of your synthetic monitors.

Related reading: Smoke testing in production with synthetic monitors.

Create synthetic monitoring as code

To see everything we are covering in this section, watch this video. If you need a refresher on working with Terraform, go to Getting started with New Relic and Terraform. You can also work along with these steps with code samples in GitHub and the hands-on workshop in Instruqt.

Observability as code example: Automate your synthetic monitoring.

Example newrelic_synthetics_monitor resource

You can use Terraform with New Relic to automate the creation of your synthetic monitors. Let's start with this simple example from Resource: newrelic_synthetics_monitor in the Terraform registry.

resource "newrelic_synthetics_monitor" "monitor" {

  # The human-readable identifier for the monitor.
  name             = "Simple Monitor"

  # The monitor type. Valid values are SIMPLE and BROWSER.
  type             = "SIMPLE"

  # The interval at which this monitor should run.
  # Valid values are EVERY_MINUTE, EVERY_5_MINUTES, EVERY_10_MINUTES,
  # EVERY_15_MINUTES, EVERY_30_MINUTES, EVERY_HOUR, EVERY_6_HOURS,
  # EVERY_12_HOURS, or EVERY_DAY.
  period           = "EVERY_15_MINUTES"

  # The run state of the monitor.
  status           = "ENABLED"

  # The location the monitor will run from.
  # Valid public locations are https://docs.newrelic.com/docs/synthetics/synthetic-monitoring/administration/synthetic-public-minion-ips
  locations_public = ["AP_SOUTH_1"]

  # The URI the monitor runs against.
  uri              = "https://www.one.newrelic.com"
}

For more details, see our documentation on public minion locations and location labels and the Terraform documentation about the newrelic_synthetics_monitor resource.
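The payoff of defining monitors as code shows up when you have more than one URL to watch. As a sketch, the block below creates one ping monitor per entry in a map by using for_each with the same arguments as the example above. The variable monitored_urls, its example.com URLs, and the resource name ping are placeholders for illustration.

```hcl
# Hypothetical map of monitor name to the URL it should check.
variable "monitored_urls" {
  description = "Monitor name mapped to the URL to check"
  type        = map(string)
  default = {
    "Home page" = "https://www.example.com"
    "Docs"      = "https://www.example.com/docs"
  }
}

resource "newrelic_synthetics_monitor" "ping" {
  # One monitor is created per map entry.
  for_each = var.monitored_urls

  name             = each.key
  type             = "SIMPLE"
  period           = "EVERY_15_MINUTES"
  status           = "ENABLED"
  locations_public = ["AP_SOUTH_1"]
  uri              = each.value
}
```

Adding or removing a URL is then a one-line change to the map, and terraform plan shows exactly which monitors will be created or destroyed.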

The next two example files, main.tf and variables.tf, show the complete code for automating synthetic monitoring with New Relic and Terraform. These examples are based on code samples in the Getting Started with the New Relic Provider documentation.

Example main.tf full code

# get the New Relic terraform provider
terraform {
  required_version = "~> 1.0"
  required_providers {
    newrelic = {
      source  = "newrelic/newrelic"
    }
  }
}

# configure the New Relic provider
provider "newrelic" {
  account_id = (var.nr_account_id)
  api_key    = (var.nr_api_key) # usually prefixed with 'NRAK'
  region     = (var.nr_region)  # Valid regions are US and EU
}

# resource to create, update, and delete a synthetics monitor in New Relic.
resource "newrelic_synthetics_monitor" "O11y_asCode-SimpleBrowser-TF" {
  # The human-readable identifier for the monitor.
  name = "O11y_asCode-SimpleBrowser-TF"
  # The monitor type. Valid values are SIMPLE and BROWSER.
  type = "BROWSER"

  # The interval at which this monitor should run, given as a named value such as EVERY_30_MINUTES.
  period = "EVERY_30_MINUTES"
  # The run state of the monitor.	
  status = "ENABLED"

  # Public minion location
  # https://docs.newrelic.com/docs/synthetics/synthetic-monitoring/administration/synthetic-public-minion-ips/#location
  locations_public = ["AP_SOUTHEAST_1"]

  # The URI the monitor runs against.
  uri = (var.nr_uri)
}

Example variables.tf full code

# your unique New Relic account ID 
variable "nr_account_id" {
  default = "XXXXX"
}

# your User API key
variable "nr_api_key" {
  default = "XXXXX"
}

# valid regions are US and EU
variable "nr_region" {
  default = "US"
}

# the URI the monitor runs against.
variable "nr_uri" {
  default = "XXXXX"
}

What the final result looks like

Now that you have deployed synthetic monitoring as code, your final result should look like this: