Why OpenTelemetry? Why bother entering yet another new realm in the observability universe? What even is OpenTelemetry, anyway?

Let’s see how the community defines it: OpenTelemetry is an Observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs. Crucially, OpenTelemetry is vendor- and tool-agnostic, meaning that it can be used with a broad variety of Observability backends, including open-source tools like Jaeger and Prometheus, as well as commercial offerings. OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project.

Sounds cool. Reading the community mission, vision, and values, it sounds even cooler. But still, we can probably find a ton of CNCF sandbox projects with very similar goals. What makes OpenTelemetry (also referred to as OTel) so special?

Check out how the project has been evolving over time and which companies are actively contributing to it; you can see how it’s growing and what it will be capable of in the coming years.

So, the actual question is: Why not be a part of it right now?

Understanding Kubernetes

The purpose of this blog isn’t to discuss what most of us already know about Kubernetes, but a bit of context might be helpful.

Kubernetes is meant to be a highly available platform. To achieve that, it’s designed to offload various responsibilities to multiple scalable components. A quick look at the component summary in the official docs shows a control plane made up of the kube-apiserver, etcd, kube-scheduler, and kube-controller-manager, plus the kubelet, kube-proxy, and a container runtime on every node.

There are quite a few moving parts, and all of them need to operate properly for your workloads to run smoothly. Fortunately, these components expose detailed performance insights, giving you a broader perspective of what’s happening in your cluster. Even better, they all speak a common language: Prometheus.

Collecting cluster-wide metrics

Since Kubernetes talks to us in Prometheus, why wouldn’t we want to talk back in the same language? Although it sounds straightforward, it definitely isn’t.

For starters, Prometheus operates on a pull mechanism, meaning you’re responsible for defining which endpoints Prometheus should find and scrape. That’s quite a challenge considering that Kubernetes is by nature extremely dynamic, creating and removing containers rapidly.

Fortunately, Prometheus has built-in service discovery for Kubernetes: you can configure different scrape jobs so that your Prometheus instances figure out on their own which targets to scrape.

What about scalability?

As our cluster grows larger and larger, a single Prometheus instance might require a ton of resources and therefore a huge machine to run on. We would rather have multiple smaller instances running on smaller machines. However, since we’re pulling the metrics, having multiple instances only means that we would scrape the same endpoints multiple times.

All that being said, now is a proper time to introduce the OpenTelemetry collector. Briefly summarized, the OTel collector consists of three major components: receivers, processors, and exporters. By combining these three, you can build various pipelines to handle your telemetry data and send it to different backends according to your needs.
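
To make that concrete, here’s a minimal sketch of a collector configuration wiring one receiver, one processor, and one exporter into a metrics pipeline. The OTLP endpoint and api-key header are assumptions based on New Relic’s OTLP ingest; substitute the values for your own account and region.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  otlphttp:
    # Assumption: New Relic US OTLP endpoint; adjust for your region or backend
    endpoint: https://otlp.nr-data.net:4318
    headers:
      api-key: <NEW_RELIC_LICENSE_KEY>

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]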

How can we leverage the OTel collector to talk to Kubernetes?

It comes with a Prometheus receiver, which we can configure just like a Prometheus server, meaning we can simply reuse the same scrape job definitions. The nice thing about it is that it doesn’t store the fetched metrics locally (other than in memory). You can think of it as Prometheus in agent mode.
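
As a rough sketch, the scrape jobs shown later in this post drop into the collector configuration under the Prometheus receiver, exactly as they would appear in a standalone prometheus.yml. One caveat: literal $ characters in relabel rules have to be escaped as $$ inside the collector config, which is why the examples below use $$1:$$2. The self-scrape job here is only an illustration; the collector exposes its own metrics on port 8888 by default.

receivers:
  prometheus:
    config:
      scrape_configs:
        # The Kubernetes scrape jobs from the following sections go here, unchanged.
        - job_name: 'otel-collector-self'
          scrape_interval: 60s
          static_configs:
            - targets: ['localhost:8888']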

So far, so good. Yet the sharding problem still remains: we still cannot scale our OTel collectors horizontally. That’s where the Target Allocator comes to the rescue. The Target Allocator sits in front of your stateful OTel collectors, discovers the targets to be scraped, and distributes them across the instances by rewriting each collector’s Prometheus configuration. Every endpoint gets scraped by exactly one OTel collector instance at a time. To learn more about how scraper scaling works, refer to the official documentation.
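
If you deploy your collectors through the OpenTelemetry Operator, enabling the Target Allocator comes down to a few fields on the OpenTelemetryCollector resource. The following is a rough sketch; the exact apiVersion and field names depend on your operator version, so treat it as illustrative rather than definitive:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: metrics-collector
spec:
  mode: statefulset          # the Target Allocator requires stateful collectors
  replicas: 3
  targetAllocator:
    enabled: true
    allocationStrategy: consistent-hashing
  config:
    receivers:
      prometheus:
        config:
          scrape_configs: []   # your scrape jobs go here; the allocator distributes their targets
    exporters:
      otlphttp:
        endpoint: https://otlp.nr-data.net:4318
        headers:
          api-key: <NEW_RELIC_LICENSE_KEY>
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [otlphttp]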

We talked a lot about how Prometheus works, how to leverage OTel collectors, and how to scale them, though we haven’t said what we actually need to collect. Let’s see what Kubernetes has to offer.

First, every control plane component exposes itself on a dedicated port and provides its metrics under the /metrics endpoint. For example, let’s take a look at a scrape job config for the kube-apiserver:

scrape_configs:
- job_name: 'kubernetes-apiservers'
  scrape_interval: 60s
  kubernetes_sd_configs:
    - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https

Second, we need visibility on the machines (nodes). A popular and mature product for collecting and exposing node metrics is Node Exporter. We can run this as a daemonset on every (Linux) node and scrape what it collects from the underlying machine as follows:

scrape_configs:
- job_name: 'kubernetes-node-exporter'
  scrape_interval: 60s
  honor_labels: true
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
      action: drop
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+?)(?::\d+)?;(\d+)
      replacement: $$1:$$2
    - source_labels: [__meta_kubernetes_service_name]
      action: keep
      regex: <NODE_EXPORTER_SERVICE_NAME>
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: service
    - source_labels: [__meta_kubernetes_pod_node_name]
      action: replace
      target_label: node

Now that we have the machine-level metrics, we can go another level up to monitor our containers. Container metrics are collected by cAdvisor (Container Advisor), which is built into the kubelet and runs on every node. We can scrape them as follows:

scrape_configs:
- job_name: 'kubernetes-nodes-cadvisor'
  scrape_interval: 60s
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor

Last but not least, we need to know the state of the objects in our cluster. The tool kube-state-metrics listens to the Kubernetes API server and generates metrics about the state of those objects, which we can again scrape with the Prometheus receiver of our OTel collectors:

scrape_configs:
- job_name: 'kubernetes-kube-state-metrics'
  scrape_interval: 60s
  honor_labels: true
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
      action: drop
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+?)(?::\d+)?;(\d+)
      replacement: $$1:$$2
    - source_labels: [__meta_kubernetes_service_name]
      action: keep
      regex: <KUBE_STATE_METRICS_SERVICE_NAME>
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: service
    - source_labels: [__meta_kubernetes_pod_node_name]
      action: replace
      target_label: node

At this point, it goes without saying that we can create a dedicated scrape job for every single component we want to monitor.
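
For instance, the kubelets themselves can be scraped with the same API-server-proxy pattern used for cAdvisor above. This is a sketch analogous to the earlier jobs, not an exhaustive config:

scrape_configs:
- job_name: 'kubernetes-nodes-kubelet'
  scrape_interval: 60s
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/$$1/proxy/metrics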

What else can we get from the cluster?

On top of the metrics that we’ve successfully collected with the Prometheus receiver, we can also collect events and logs from our clusters. (Note: Traces will not be covered in this blog.)

Let’s start with events. Kubernetes fires events whenever something is created, deleted, or mutated in the cluster. Those events can be fetched by querying the kube-api-server. Fortunately, the OTel collector has a receiver for that: the k8sevents receiver. (Note: There is an ongoing discussion about deprecating this receiver in favor of the k8sobjects receiver.)

You can run a single collector instance to fetch the Kubernetes events and forward them to New Relic as logs. Beware: Multiple instances of the k8sevents receiver would each fetch and send the same events to New Relic, duplicating them.
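
A minimal sketch of such a singleton collector configuration might look like the following. It assumes the contrib distribution of the collector (which ships the k8s_events receiver) and the same New Relic OTLP exporter as before; events flow through a logs pipeline:

receivers:
  k8s_events:
    auth_type: serviceAccount    # use the pod's service account to talk to the kube-api-server
    # namespaces: [default, kube-system]   # optional: restrict to specific namespaces; all by default

processors:
  batch: {}

exporters:
  otlphttp:
    endpoint: https://otlp.nr-data.net:4318
    headers:
      api-key: <NEW_RELIC_LICENSE_KEY>

service:
  pipelines:
    logs:
      receivers: [k8s_events]
      processors: [batch]
      exporters: [otlphttp]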

What about logs? The OTel collector includes a filelog receiver. You can run it in a daemonset collector, mount the node’s log directory into it, and then use various regular expressions (regex) to collect, parse, filter, and enrich the logs of any component in your cluster.
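
As a rough sketch, assuming a containerd-based node with /var/log/pods mounted into the daemonset collector, the filelog receiver could be configured as follows. The regex, timestamp layout, and the excluded container name are illustrative and need to match your runtime’s log format and your own naming:

receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      # Don't ingest the collector's own logs to avoid a feedback loop (hypothetical container name)
      - /var/log/pods/*/otel-collector/*.log
    start_at: end
    operators:
      # Parse the containerd/CRI log line format: <timestamp> <stream> <flags> <message>
      - type: regex_parser
        regex: '^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'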

Deploying OTel collectors and monitoring with New Relic

We know where to find the required telemetry data and how to collect it. Now we need to:

  1. Create a strategy to deploy our OTel collector instances with necessary configurations.
  2. Build proper dashboards and alerts with extensive New Relic Query Language (NRQL) queries to monitor our clusters.

Good news! There’s a repository that contains everything we’ve talked about so far:

  • A public Helm chart that you can use to roll out your OTel collectors.
  • A Terraform deployment that you can use to bootstrap a pre-built monitoring solution into your New Relic account.

For more information, check out the following resources:

  • Read this documentation to understand why we need various types (daemonset, deployment, statefulset, singleton) of collectors.
  • Refer to this documentation to deploy the monitoring solution onto your cluster.
  • Learn how to properly query and make sense of the collected Prometheus data per this documentation.

Refer to this documentation to create dashboards and alerts in your New Relic account.