What is observability as code?
Observability as code (also known as o11y as code) is the process of automating the configuration of observability tools. You already manage your infrastructure with code, so why not manage your observability, including your dashboards, the same way?
Why is observability as code important?
Observability is crucial for understanding and maintaining complex systems, and managing it as code brings several benefits. Here are some reasons why observability as code is important:
Consistency and reproducibility
Defining observability configurations in code ensures consistency across different environments. This reduces the chances of errors caused by manual configuration discrepancies between development, testing, and production environments. It enables the recreation of observability setups in a predictable and reproducible manner. This is essential for troubleshooting and debugging issues that may arise in different environments.
Version control
Observability configurations are treated as code and can be stored in version control systems (e.g., Git), allowing teams to track changes, collaborate effectively, and roll back to previous configurations if needed. Version control also provides an audit trail, helping to understand when and why specific changes were made.
Automation
Observability as code facilitates the automation of configuration tasks. Automated processes ensure that configurations are applied consistently and quickly across the entire infrastructure. This is particularly valuable in dynamic and rapidly changing environments.
Infrastructure as Code (IaC) integration
Many organizations adopt Infrastructure as Code (IaC) practices to manage their infrastructure. Observability as code can seamlessly integrate with IaC tools, allowing observability configurations to be included in the same codebase as infrastructure definitions. This enhances the overall manageability and maintainability of the system.
Collaboration and documentation
Code serves as a form of documentation. Describing observability configurations in code helps teams have a clear and centralized source of information about the monitoring and logging setup. This aids in onboarding new team members and provides a shared understanding of how the system is observed.
Scalability
As systems grow in complexity, the manual management of observability configurations becomes increasingly challenging. Observability as code supports the scalability of observability practices by allowing teams to efficiently manage configurations for large and intricate systems.
Flexibility and agility
Code-based observability allows for more flexible and agile development processes. Changes to observability configurations can be made alongside code changes, ensuring that monitoring and logging are aligned with the evolving requirements of the application.
This three-part blog series is your guide to o11y as code, providing tips, examples, and guidance. In this series, we'll walk through examples of how you can automate the configuration of your observability tools, starting with dashboards here in part one. Part one covers the basics of Terraform, how to provision a sample app, and how to create dashboards as code.
By the end of the series, you'll have worked with a total of five examples of observability as code using New Relic and HashiCorp's Terraform: dashboards as code, alerts as code, synthetic monitoring as code, tags as code, and workloads as code. You'll be working with data from the sample FoodMe restaurant ordering app.
Typically, the content in this blog series would be presented as a workshop. We've made Implement automated configurations with observability as code a blog series so that anyone can follow along hands-on at any time, using the sample code, the Instruqt online workshop, and the videos in the blog posts. For the best experience, use all three:
- See and run the full code that we'll be walking through in this three-part series.
- Follow the hands-on workshop in Instruqt.
- Read along with this blog post, including the embedded videos.
If you haven't used Instruqt before, these two videos give an overview and some instructions:
How did we get here? Infrastructure as code
Since infrastructure as code (also known as IaC) appeared on the scene more than a decade ago, it’s become a core requirement in the modern cloud era. The terminology “as code” means treating infrastructure configuration just like we treat code, pushing configuration into source control, then carefully pushing out changes again to the infrastructure layer.
With the rise of modern distributed systems, we also see more outages, and finding the root cause when something goes wrong can be challenging. Observability fits into this new paradigm because we need to determine the internal states of our systems from their outputs. Observability uses different system outputs, such as traces, logs, and metrics, to understand the internal state of distributed components, diagnose where problems are, and get to the root cause.
Unfortunately, the operational practices we rely on haven't changed much, and developers and operations engineers might find they still look at hundreds of alerts or dashboards. This approach leads to non-repeatable, non-standardized dashboard configurations, to alerts being adjusted ad hoc to avoid signal fatigue, and to drift from organizational best practices.
But we can use what we know about infrastructure as code to automate observability. Meet the new approach: observability as code, which treats observability configurations as code. As explained in Observability as code simplifies your life, observability as code represents a shift of intention to an auditable code-managed solution that reduces the work needed to maintain and develop a configuration.
Understand the basics of Terraform
Terraform by HashiCorp is an infrastructure as code tool that you can use to define and manage infrastructure resources in human-readable configuration files. You can declaratively manage services and automate your changes to those services.
In most examples, a Terraform module is a set of Terraform configuration files in one directory. When you run Terraform commands directly from that single directory, it is considered the root module. Here's what it looks like, as shown in the Terraform docs:
Terraform files used in this blog series
The examples in the tutorial exercises in this blog series focus on two important files:
- The main.tf file contains the main set of configurations for your module. You can also create other configuration files and organize them in a way that makes sense for your project.
- The variables.tf file contains the variable definitions for your module. If you want others to use your module, configure the variables as arguments in the module block.
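As a minimal sketch of that pattern, a root module can call a child module and pass values for the child's input variables as arguments. The ./dashboards directory, the module name, and the app_name variable are hypothetical and are not taken from the series' sample code.

```hcl
# main.tf in the root module (hypothetical layout)
module "foodme_dashboards" {
  # Local path to a child module that has its own main.tf and variables.tf
  source = "./dashboards"

  # Each argument sets an input variable declared in ./dashboards/variables.tf
  account_id = 1234567      # placeholder account ID
  app_name   = "FoodMe-Jan" # hypothetical variable name
}
```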
Example of a New Relic Terraform provider
Here’s an example of a New Relic provider in Terraform from Configuring the New Relic Terraform Provider.
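The exact example is on that documentation page. This minimal sketch follows the same shape and uses placeholder values, so neither the account ID nor the key is real. It hard-codes the user key only for illustration. In practice you'd pass the key through a variable or an environment variable rather than committing it.

```hcl
# Declare where Terraform should download the New Relic provider from
terraform {
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

# Configure the provider with placeholder credentials
provider "newrelic" {
  account_id = 1234567          # your New Relic account ID
  api_key    = "NRAK-XXXXXXXX" # your New Relic user key (placeholder)
  region     = "US"             # "US" or "EU"
}
```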
You can also use environment variables to configure the provider, which can simplify your provider block. The provider's key schema attributes include account_id, api_key, and region.
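For example, when the provider's environment variables are set, the provider block itself can be empty. This sketch assumes NEW_RELIC_ACCOUNT_ID, NEW_RELIC_API_KEY, and NEW_RELIC_REGION are exported in the shell that runs Terraform. It also assumes a required_providers entry for newrelic/newrelic is still present elsewhere in the configuration (not shown).

```hcl
# Assumes these environment variables are exported before running Terraform:
#   NEW_RELIC_ACCOUNT_ID, NEW_RELIC_API_KEY, NEW_RELIC_REGION
# The provider reads them, so no attributes need to be set here.
provider "newrelic" {}
```

Keeping credentials out of the .tf files this way also keeps them out of version control.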
Terraform commands to remember
To initialize and run Terraform effectively, remember these four commands:
- The terraform init command performs initialization steps to prepare the current working directory for use with Terraform. This command is safe to run multiple times to update the working directory with configuration changes.
- The terraform plan command creates an execution plan, which lets you preview the changes that Terraform will make to your infrastructure. Use it to check whether the proposed changes match what you expect before you apply them.
- The terraform apply command automatically creates an execution plan, prompts you to approve that plan, and then takes the indicated actions. Follow the prompts and answer yes to apply the changes. Terraform will then provision the resources.
- The terraform destroy command is a convenient way to remove all the remote objects managed by a particular Terraform configuration. Follow the prompts, and Terraform will delete all the resources.
For more information on Terraform commands, see Provisioning infrastructure with Terraform.
The examples in the next sections show key concepts in Terraform such as providers, data sources, and resources. You'll be automating configuration of New Relic dashboards to view data from the sample FoodMe restaurant app.
This blog post demo uses the newrelic_one_dashboard resource. As an alternative, if you want to use the newrelic_one_dashboard_json resource, see the Creating dashboards with Terraform and JSON templates tutorial.
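For comparison, here is a sketch of the JSON-based alternative. It manages a dashboard from a JSON definition, such as one exported from the New Relic UI. The resource name and file path are hypothetical.

```hcl
# Alternative approach: manage a dashboard from a JSON definition
resource "newrelic_one_dashboard_json" "foodme_json" {
  # Hypothetical path; point this at your own exported dashboard JSON file
  json = file("${path.module}/dashboards/foodme.json")
}
```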
Before you begin provisioning your first Terraform module
For this tutorial, we’re going to provision a sample app. But before you provision your first Terraform module, you’ll need your account ID and user key, and you’ll need to point to the correct data center:
- Getting your unique account ID (account_id)
- Getting your user key (api_key)
- Pointing to the right data center (region)
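One common way to supply these three values without hard-coding them in main.tf is a terraform.tfvars file, which Terraform loads automatically. This sketch assumes your configuration declares input variables named account_id, api_key, and region. Those names are assumptions chosen to mirror the provider attributes. Keep this file out of version control, because it contains a secret.

```hcl
# terraform.tfvars (placeholder values; do not commit real keys)
account_id = 1234567         # your New Relic account ID
api_key    = "NRAK-XXXXXXXX" # your New Relic user key
region     = "US"            # "US" or "EU", depending on your data center
```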
This video walkthrough covers prerequisite work.
Provision the sample app
Before we work on implementing observability as code, let’s start by provisioning our sample app!
1. Generate your unique URL for the FoodMe example app with this Glitch link: glitch.com/edit/#!/remix/nr-devrel-o11yascode
2. Set the environment variables. Go to .env and insert these values:
- LICENSE_KEY: Insert your New Relic ingest API key.
- APP_NAME: Add your name or initials to the app name FoodMe-XXX (for example, FoodMe-Jan).
3. Preview your URL.
Go to Tools (bottom of the panel), and select Preview in a new window.
4. Record your URL.
Note your newly generated URL. You’ll use this later on in part two of the series for synthetic monitoring as code.
5. Generate some workloads. Now that you’re in the sample app, enter an example name and delivery address, and then select Find Restaurants! Once you're on the main page, click around to generate some workloads for the sample app. We'll need some data to look at in the dashboards.
Create dashboards as code
Now we're ready for our first observability as code example: dashboards as code. With New Relic custom dashboards, you can collect and visualize the specific data that you want to see and display in New Relic. You'll learn how to automate configuring dashboards in New Relic using Terraform.
There are three main steps. To see everything we are covering in this section, watch this video. For more details, go to Getting started with New Relic and Terraform. You can also work along with these steps with code samples in GitHub and the hands-on workshop in Instruqt.
In Terraform, each resource block describes one or more observability objects, such as dashboards, alerts, notification workflows, or workloads. We'll use examples from Resource: newrelic_one_dashboard:
1. Create a resource block and declare a type (newrelic_one_dashboard) with a given name (exampledash). The type and the name of the resource are the identifier for the resource, so they must be unique within a module. Here's a simple example for deploying dashboards as code in New Relic, based on Resource: newrelic_one_dashboard.
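A minimal sketch of such a resource follows: one dashboard with one page and one billboard widget. The dashboard and page titles are illustrative, and the app name in the NRQL query is hypothetical. Replace it with your own FoodMe-XXX name.

```hcl
# Resource type "newrelic_one_dashboard", resource name "exampledash"
resource "newrelic_one_dashboard" "exampledash" {
  name        = "FoodMe dashboard (example)"
  permissions = "public_read_only"

  page {
    name = "Overview"

    # Billboard widget showing the request rate for the sample app
    widget_billboard {
      title  = "Requests per minute"
      row    = 1
      column = 1
      width  = 6
      height = 3

      nrql_query {
        # Hypothetical app name; use the APP_NAME you set in .env
        query = "FROM Transaction SELECT rate(count(*), 1 minute) WHERE appName = 'FoodMe-Jan'"
      }
    }
  }
}
```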
For more details on attribute reference, see the attribute reference for the newrelic provider in Terraform.
For more details on New Relic Query Language (NRQL), see syntax, clauses, and functions.
2. Next, you'll include a variables.tf file in Terraform. You can customize Terraform modules with input variables instead of modifying the source code of the module. Then it's easy to share and reuse modules across other configurations in Terraform. At the end of this section, you'll see an example variables.tf file.
3. Finally, you'll combine what we covered about the New Relic provider, the resources, the main.tf file, and the corresponding variables.tf file to deploy dashboards as code.
The next two example main.tf and variables.tf files use concepts described in Google Site Reliability Engineering, The Four Golden Signals: latency, traffic, errors, and saturation. These examples are based on code samples in the Getting Started with the New Relic Provider documentation.
Example main.tf file complete code
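The complete file is in the GitHub sample code for this series. The sketch below is only a rough stand-in, not the repository's actual main.tf. It wires the provider to input variables and builds a golden-signals page whose widgets filter on the app name. It assumes input variables named account_id, api_key, region, and app_name are declared in variables.tf. The widget titles and NRQL queries are also assumptions. It covers latency, traffic, and errors. A saturation widget depends on what resource data your app reports, so it's left out.

```hcl
terraform {
  required_providers {
    newrelic = {
      source = "newrelic/newrelic"
    }
  }
}

# Provider settings come from input variables (see variables.tf)
provider "newrelic" {
  account_id = var.account_id
  api_key    = var.api_key
  region     = var.region
}

resource "newrelic_one_dashboard" "golden_signals" {
  name        = "${var.app_name} golden signals"
  permissions = "public_read_only"

  page {
    name = "Golden signals"

    # Latency: average transaction duration over time
    widget_line {
      title  = "Latency (average duration, s)"
      row    = 1
      column = 1
      width  = 6
      height = 3

      nrql_query {
        account_id = var.account_id
        query      = "FROM Transaction SELECT average(duration) WHERE appName = '${var.app_name}' TIMESERIES"
      }
    }

    # Traffic: request rate over time
    widget_line {
      title  = "Traffic (requests per minute)"
      row    = 1
      column = 7
      width  = 6
      height = 3

      nrql_query {
        account_id = var.account_id
        query      = "FROM Transaction SELECT rate(count(*), 1 minute) WHERE appName = '${var.app_name}' TIMESERIES"
      }
    }

    # Errors: percentage of transactions that resulted in an error
    widget_billboard {
      title  = "Error rate (%)"
      row    = 4
      column = 1
      width  = 6
      height = 3

      nrql_query {
        account_id = var.account_id
        query      = "FROM Transaction SELECT percentage(count(*), WHERE error IS true) WHERE appName = '${var.app_name}'"
      }
    }
  }
}
```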
Example variables.tf file complete code
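The following variables.tf sketch declares the inputs a dashboard configuration like the one above would need. These variable names are assumptions and may differ from the GitHub sample. You can supply values through a terraform.tfvars file, -var flags on the command line, or TF_VAR_ environment variables.

```hcl
# variables.tf: input variables used by main.tf (sketch)
variable "account_id" {
  description = "New Relic account ID"
  type        = number
}

variable "api_key" {
  description = "New Relic user key (starts with NRAK-)"
  type        = string
  sensitive   = true # hide the value in plan and apply output
}

variable "region" {
  description = "New Relic data center region: US or EU"
  type        = string
  default     = "US"
}

variable "app_name" {
  description = "Name of the FoodMe app reporting to New Relic, for example FoodMe-Jan"
  type        = string
}
```

Marking api_key as sensitive keeps Terraform from printing it in plan and apply output. Giving region a default means most users only need to set the other three values.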
What the final result looks like
Now that you have deployed dashboards as code, your final result should look like this in New Relic:
Next steps
For more in-depth information on observability as code, see Observability as code: automating your observability configuration to drive value and Creating dashboards with Terraform and JSON templates.
Ready to move on to part two of this series? You'll apply what you learned here in part one to alerts as code and synthetic monitoring as code! You’ll learn how to set up robust, customizable alert policies for anything that you can monitor. Then you'll learn how to automate creating synthetic monitors: virtual browsers that measure the performance of your website and capture aggregate numbers for load time, uptime, and average download size.
The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.