
Tools such as Prometheus and OpenTelemetry help us monitor the health, performance, and availability of our complex distributed systems. Both are open-source projects under the Cloud Native Computing Foundation (CNCF) umbrella. But what role does each play in observability?

OpenTelemetry (OTel for short) is a vendor-neutral open standard for instrumenting, generating, collecting, and exporting telemetry data. Prometheus is a fixture of the observability landscape, widely relied upon for monitoring and alerting within organizations.

While both Prometheus and OTel emit metrics, a full comparison of their differences and similarities is out of scope for this blog post. Instead, we want to show you how OTel supports Prometheus, specifically in a Kubernetes (K8s) environment. You'll learn:

  • How the OTel Collector's Prometheus Receiver can be used to ingest Prometheus metrics.
  • Alternative methods for Prometheus metric collection through OTel-native options such as the K8s cluster receiver and Kubelet stats receiver.
  • How the Target Allocator (TA) can be used for Prometheus service discovery, and how it ensures an even distribution of Prometheus targets.

Finally, learn about some New Relic options for your Prometheus and OTel data.

OTel and Prometheus

Since OTel is primarily focused on the instrumentation part of observability, it doesn't provide a backend for storing telemetry; you have to forward the data to a backend vendor for storage, alerting, and querying.

Prometheus, on the other hand, provides a time-series data store you can use for your metrics, in addition to instrumentation clients. You can view graphs and charts, set up alerts, and query your data via its web user interface. It also encompasses a data format known as the Prometheus text-based exposition format.

Prometheus data is stored as dimensional time series, meaning that each sample has attributes (labels, also called dimensions) and a timestamp.
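To make the idea of a dimensional sample concrete, here is a minimal sketch that splits one line of the text-based exposition format into its metric name, labels, value, and timestamp. The parse_sample helper is a name we made up for illustration, and it handles only simple sample lines (no comments, and no unescaping of label values), so treat it as a reading aid rather than a full parser.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp in ms.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# One label pair inside the braces, for example method="post".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse a single exposition-format sample line into its parts."""
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    timestamp = match.group("ts")
    return {
        "name": match.group("name"),
        "labels": labels,  # the dimensions of the time series
        "value": float(match.group("value")),
        "timestamp_ms": int(timestamp) if timestamp is not None else None,
    }


sample = parse_sample(
    'http_requests_total{method="post",code="200"} 1027 1395066363000'
)
print(sample)
```

The name plus the full set of labels identifies one time series; the value and timestamp are a single point on it.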

The Prometheus server collects Prometheus metrics data from targets defined in a configuration file. A target is an endpoint that exposes metrics for the Prometheus server to store.

Prometheus is so ubiquitous in the monitoring space that many tools natively emit metrics in Prometheus format, including Kubernetes and HashiCorp's Nomad. For those that don’t, there are a number of vendor- and community-built Prometheus exporters to aggregate and import data into Prometheus.

While you can use Prometheus to monitor a variety of infrastructure and application metrics, one of its most popular use cases is to monitor Kubernetes. This is the aspect of Prometheus monitoring that we will focus on in this article.

Prometheus metrics with OpenTelemetry

In this section, you’ll learn about a few OTel Collector components that demonstrate the interoperability between OTel and Prometheus.

The OTel Collector

First, let’s do a quick refresher on the Collector; it’s an OTel component that can be used to collect telemetry from multiple sources and export data to multiple destinations. The Collector also handles telemetry processing, such as modifying data attributes and scrubbing personally identifiable information. For example, you can use Prometheus SDKs to generate metrics, ingest them with the Collector, do some processing (if desired), and then forward them to your chosen backend.

Prometheus receiver

The Prometheus receiver allows you to collect metrics from any software that exposes Prometheus metrics. It serves as a drop-in replacement for Prometheus to scrape your services, and supports the full set of configurations in scrape_config.
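As a rough illustration, here is a minimal Collector configuration with a Prometheus receiver, written as a Python dictionary and printed as JSON (which is also valid YAML). The job name, target address, and OTLP endpoint are placeholders we made up for this sketch; the scrape_configs entry follows the same shape you would use in a Prometheus configuration file.

```python
import json

# Hypothetical job, target, and backend endpoint; replace them with your own.
collector_config = {
    "receivers": {
        "prometheus": {
            "config": {
                # Same shape as the scrape_configs section of a Prometheus config.
                "scrape_configs": [
                    {
                        "job_name": "example-app",
                        "scrape_interval": "30s",
                        "static_configs": [{"targets": ["example-app:8080"]}],
                    }
                ]
            }
        }
    },
    "exporters": {"otlp": {"endpoint": "otel-backend.example.com:4317"}},
    "service": {
        "pipelines": {
            "metrics": {"receivers": ["prometheus"], "exporters": ["otlp"]}
        }
    },
}

# Print the config so it can be saved to a file and passed to the Collector.
print(json.dumps(collector_config, indent=2))
```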

If you’re interested in exemplars, which are recorded values that associate OTel context with a metric event, you can also use the Prometheus receiver to ingest them in the Prometheus format and convert them to OpenTelemetry Protocol (OTLP) format. This enables you to correlate traces with metrics.

Something to consider with this component is that it’s under active development; as such, it has several limitations, including that it’s a stateful component. Additionally, the OTel community recommends against using this component when multiple replicas of the Collector are running, because in this state:

  • The Collector is unable to auto-scale the scraping.
  • Replicas running with the same config will scrape the targets multiple times.
  • Each replica will need to have a different scraping config to manually shard the scraping.

Exporters

For exporting metrics from the OTel Collector to Prometheus, you have two options: the Prometheus exporter and the Prometheus remote write exporter.

The Prometheus exporter allows you to ship data in the Prometheus format, which is then scraped by a Prometheus server. It's used to report metrics via the Prometheus scrape HTTP endpoint. You can learn more by trying out this example. However, the scraping won't really scale because all the metrics are sent in a single scrape.

To get around the scaling concern, you can alternatively use the Prometheus remote write exporter, which allows you to push data to Prometheus from multiple Collector instances with no issues. Since Prometheus supports remote write ingestion, you can also use this exporter to generate OTel metrics and ship them to a backend that is compatible with Prometheus remote write. Learn more about the architecture of both exporters here.
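The difference between the two exporters shows up in configuration: one exposes an endpoint to be scraped, and the other pushes to a remote write URL. The sketch below expresses both as a Python dictionary, assuming an OTLP receiver feeds the pipeline. Both endpoint values are placeholders, and metrics_pipeline is a helper name we invented for this example.

```python
import json

# Placeholder endpoints; neither address comes from a real deployment.
exporters = {
    # Pull model: the Collector exposes an endpoint for a Prometheus server to scrape.
    "prometheus": {"endpoint": "0.0.0.0:8889"},
    # Push model: each Collector instance writes to a remote write endpoint.
    "prometheusremotewrite": {
        "endpoint": "https://prometheus.example.com/api/v1/write"
    },
}


def metrics_pipeline(exporter_name):
    """Build a service section that sends OTLP metrics to the chosen exporter."""
    if exporter_name not in exporters:
        raise KeyError(f"unknown exporter: {exporter_name}")
    return {
        "pipelines": {
            "metrics": {"receivers": ["otlp"], "exporters": [exporter_name]}
        }
    }


print(json.dumps(metrics_pipeline("prometheusremotewrite"), indent=2))
```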

The target allocator 

Scalability, which is the ability to effectively maintain performance and resource allocation while managing an increasing number of monitored targets and metrics, is a common challenge with Prometheus. One option to help with this is sharding the workload based on labels or dimensions, which means using multiple Prometheus instances to handle your metrics according to specific parameters. This could help decrease the burden on individual instances. However, there are two things to consider with this approach.

The first is that to avoid querying each sharded instance separately, you need a management instance. This means that you need N+1 Prometheus instances, where the +1’s memory is equal to the combined memory of the N shards, thereby doubling your memory requests. Secondly, Prometheus sharding requires that each instance scrape the target, even if it’s going to be dropped.

Something to note is that if you can have a Prometheus instance with the combined amount of memory of individual instances, there’s not much benefit to sharding, since you can scrape everything directly using the larger instance. A reason that people shard is usually for some amount of fault tolerance. For example, if one Prometheus instance is out of memory (OOM), then your entire alerting pipeline won't be offline.

Luckily, the OTel Operator’s TA is able to help with some of this. For instance, it can automatically drop any targets it knows won’t be scraped. By contrast, if you shard with hashmod, you'll need to update your config based on the number of replicas you have. Plus, if you’re already collecting Prometheus metrics about your Kubernetes infrastructure, using the TA is a great option.
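To see why hashmod-style sharding is tied to the replica count, consider this small sketch. It hashes each target address and takes the result modulo the number of replicas, which is modeled loosely on how a hashmod relabel rule picks a shard. The shard_for function and the target addresses are hypothetical, and the exact hashing in your Prometheus version may differ.

```python
import hashlib


def shard_for(target, modulus):
    """Return the shard index for a target address, hashmod style."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Treat the last 8 bytes of the digest as an unsigned integer, then take the modulus.
    return int.from_bytes(digest[8:], "big") % modulus


targets = [f"10.0.0.{i}:9100" for i in range(1, 9)]  # hypothetical addresses

# Compare the assignment for two different replica counts.
for replicas in (2, 3):
    assignment = {target: shard_for(target, replicas) for target in targets}
    print(replicas, "replicas:", assignment)
```

Because the modulus is part of each replica's configuration, adding or removing a replica means editing every config and reshuffling which instance owns which target.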

The TA is part of the OTel Operator. The OTel Operator is a Kubernetes Operator that manages OTel Collectors and injects and configures auto-instrumentation for your workloads.

In fact, the Operator creates two new custom resource (CR) types in Kubernetes to support this functionality: the OpenTelemetry Collector CR and the Auto-Instrumentation CR.

The TA is an optional component of the Operator’s OTel Collector management capabilities. It serves as a mechanism for decoupling the service discovery and metric collection functions of Prometheus so that they can be scaled independently. The OTel Collector manages Prometheus metrics without needing to install Prometheus, and the TA manages the configuration of the Collector’s Prometheus receiver. The TA has two main functions:

  • Even distribution of Prometheus targets among a pool of OTel Collectors.
  • Discovery of Prometheus custom resources.

Let’s dig into each of these.

Even distribution of Prometheus targets

The TA’s first job is to discover targets to scrape and OTel Collectors to allocate targets to. It does so as follows:

  1. The TA finds all of the metrics targets to scrape.
  2. The TA finds all of the available Collectors.
  3. The TA determines which Collectors scrape which metrics.
  4. The Collectors query the TA to find out what metrics to scrape.
  5. The Collectors scrape their assigned targets.

This means that the OTel Collectors, not a Prometheus scraper, collect the metrics.
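The allocation steps above can be sketched in a few lines. This is a simplified illustration of a "least loaded first" approach, not the TA's actual implementation: the allocate function, target addresses, and Collector names are all made up for the example.

```python
def allocate(targets, collectors):
    """Assign each target to the collector that currently has the fewest targets."""
    if not collectors:
        raise ValueError("at least one collector is required")
    assignments = {name: [] for name in collectors}
    for target in targets:
        # Ties go to the collector listed first.
        least_loaded = min(collectors, key=lambda name: len(assignments[name]))
        assignments[least_loaded].append(target)
    return assignments


targets = [f"app-{i}:8080" for i in range(7)]  # step 1: discovered targets
collectors = ["collector-0", "collector-1", "collector-2"]  # step 2: available Collectors
assignments = allocate(targets, collectors)  # step 3: decide who scrapes what

# Steps 4 and 5: each Collector would look up its own list and scrape it.
for name, assigned in assignments.items():
    print(name, assigned)
```

With this approach, no Collector ends up with more than one target above any other, and every target is scraped by exactly one Collector.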

A target is an endpoint that supplies metrics for Prometheus to store. A scrape is the action of collecting metrics through an HTTP request from a targeted instance, parsing the response, and ingesting the collected samples to storage.

Discovery of Prometheus custom resources

The TA’s second job is to provide the discovery of Prometheus Operator CRs, namely the ServiceMonitor and PodMonitor.

In the past, all Prometheus scrape configurations had to be done via the Prometheus receiver. When the TA’s service discovery feature is enabled, the TA simplifies the configuration of the Prometheus receiver by creating scrape configurations in it from the PodMonitor and ServiceMonitor instances deployed in your cluster.

Even though Prometheus isn’t required to be installed in your Kubernetes cluster to use the TA for Prometheus CR discovery, the TA does require that the ServiceMonitor and PodMonitor custom resource definitions (CRDs) be installed. These CRDs are bundled with Prometheus Operator; however, they can be installed standalone as well. The easiest way to do this is to grab a copy of the individual PodMonitor YAML and ServiceMonitor YAML CRDs.

OTel supports the PodMonitor and ServiceMonitor Prometheus resources because these are widely used in Kubernetes infrastructure monitoring. As a result, the OTel Operator developers wanted to make it easy to add them to the OTel ecosystem.

Note that the PodMonitor and ServiceMonitor are not useful for cluster-wide metrics collection, such as Kubelet metrics collection. In that case, you still have to rely on Prometheus scrape configs in the Collector’s Prometheus receiver. See this documentation to learn about configuring the TA.
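For orientation, here is a sketch of what an OpenTelemetry Collector CR with the TA and Prometheus CR discovery enabled might look like, built as a Python dictionary and printed as JSON (which is also valid YAML). The resource name and the self-scrape job are placeholders, and we're assuming the v1beta1 API layout; depending on your Operator version, the apiVersion and field layout may differ, so check the TA documentation before applying anything like this.

```python
import json

collector_cr = {
    "apiVersion": "opentelemetry.io/v1beta1",  # assumed API version
    "kind": "OpenTelemetryCollector",
    "metadata": {"name": "otel-prom-example"},  # hypothetical name
    "spec": {
        "mode": "statefulset",
        "targetAllocator": {
            "enabled": True,
            # Turns on discovery of ServiceMonitor and PodMonitor resources.
            "prometheusCR": {"enabled": True},
        },
        "config": {
            "receivers": {
                "prometheus": {
                    "config": {
                        "scrape_configs": [
                            {
                                # Placeholder job: the Collector scraping itself.
                                "job_name": "otel-collector",
                                "scrape_interval": "30s",
                                "static_configs": [{"targets": ["0.0.0.0:8888"]}],
                            }
                        ]
                    }
                }
            },
            "exporters": {"debug": {}},
            "service": {
                "pipelines": {
                    "metrics": {"receivers": ["prometheus"], "exporters": ["debug"]}
                }
            },
        },
    },
}

print(json.dumps(collector_cr, indent=2))
```

The TA also needs permissions to read the monitored resources in your cluster, which this sketch leaves out.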

Additional OTel components for Kubernetes

There are additional OTel Collector components you can use to capture Kubernetes metrics:

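Receiving data:

  • Kubernetes Cluster Receiver: Collects cluster-level metrics and entity events from the Kubernetes API server.
  • Kubelet Stats Receiver: Pulls node, pod, container, and volume metrics from the API server on a kubelet.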

Processing data:

  • Kubernetes Attributes Processor: Adds Kubernetes context, thereby enabling you to correlate application telemetry with your Kubernetes telemetry. This is one of the most important components for monitoring Kubernetes with OpenTelemetry. 

You can also use the Kubernetes Attributes Processor to set custom resource attributes for traces, metrics, and logs using the Kubernetes labels and Kubernetes annotations you’ve added to your pods and namespaces.
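As a minimal sketch of that second use, the following builds a Kubernetes Attributes Processor configuration as a Python dictionary and prints it as JSON (which is also valid YAML). The tag names and the label and annotation keys are examples we made up; substitute the labels and annotations you actually use on your pods and namespaces.

```python
import json

k8sattributes_config = {
    "processors": {
        "k8sattributes": {
            "extract": {
                # Standard Kubernetes metadata to attach to telemetry.
                "metadata": [
                    "k8s.namespace.name",
                    "k8s.pod.name",
                    "k8s.deployment.name",
                ],
                # Copy a pod label into a custom resource attribute.
                "labels": [
                    {
                        "tag_name": "app.component",  # hypothetical attribute name
                        "key": "app.kubernetes.io/component",
                        "from": "pod",
                    }
                ],
                # Copy a namespace annotation into a custom resource attribute.
                "annotations": [
                    {
                        "tag_name": "team.owner",  # hypothetical attribute name
                        "key": "example.com/owner",  # hypothetical annotation key
                        "from": "namespace",
                    }
                ],
            }
        }
    }
}

print(json.dumps(k8sattributes_config, indent=2))
```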

There are a few more Collector components you can implement to monitor Kubernetes, including Kubernetes-specific ones as well as general-use processors, such as the batch, memory limiter, and resource processors. You can read more about them here.

New Relic integrations for Prometheus and Kubernetes

New Relic offers a few Prometheus integrations.

New Relic recommends getting started with the remote write integration if you already have a Prometheus server. Learn more about which option is best for you here. Regardless of which solution you choose, you get the following benefits:

  • New Relic provides more nuanced security and user management options.
  • You can use the New Relic database (NRDB) as the centralized long-term data store for your Prometheus metrics, and consolidate your observability tooling.
  • You can use query tools such as Grafana through New Relic’s Prometheus API.
  • New Relic supports executing queries at scale.

New Relic also offers a Kubernetes integration solution for monitoring your Kubernetes cluster. If you’re already using New Relic APM agents to monitor your services, you can surface Kubernetes metadata and link it to your APM agents. This enables you to learn more about any issues and troubleshoot transaction errors. 

Alternatively, if you have apps instrumented with OTel and running in Kubernetes, you can use the New Relic UI to correlate app-level metrics from OTel with Kubernetes infrastructure metrics. This will enable you to have a holistic view of your telemetry data, leading to more effective work across teams and, ultimately, faster mean time to resolution for any issues in your Kubernetes environment. Learn how to link your OTel apps to Kubernetes here.

Conclusion

Prometheus maintainers have also been developing the interoperability between the two projects from the Prometheus side, to make it easier for Prometheus to serve as the backend for OTLP metrics. For instance, Prometheus can now accept OTLP, and soon, you’ll be able to use Prometheus exporters to export OTLP. So if you have a service instrumented with a Prometheus SDK, you’ll be able to push OTLP and take advantage of the rich Prometheus exporter ecosystem for OTel users. The maintainers are also working on adding support for delta temporality, through a component that will aggregate delta samples into their respective cumulative counterparts. Read more about Prometheus' commitment to OTel here.
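Conceptually, converting delta samples to cumulative ones means keeping a running total per time series. The sketch below shows that idea only; the DeltaToCumulative class is a name we invented, and it says nothing about how the Prometheus maintainers will actually implement the feature (for example, how resets or out-of-order samples are handled).

```python
from collections import defaultdict


class DeltaToCumulative:
    """Keep a running total for each time series, identified by name and labels."""

    def __init__(self):
        self._totals = defaultdict(float)

    def add(self, series, delta):
        """Add one delta sample and return the new cumulative value."""
        self._totals[series] += delta
        return self._totals[series]


converter = DeltaToCumulative()
# A series key is the metric name plus its sorted label pairs.
series = ("http_requests_total", (("method", "post"),))

# Each delta reports only the change since the previous interval.
for delta in (5, 3, 0, 7):
    print(delta, "->", converter.add(series, delta))
```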

However you decide to use OTel to gather Prometheus metrics, ultimately what’s right for your organization depends on your business needs. Using the OTel components discussed previously, you could convert all your metrics into the Prometheus format, or you could convert your Prometheus metrics into OTLP. Since New Relic also provides out-of-the-box solutions for your Prometheus data, as well as a platform UI that supports OTLP natively, you also have this option.