
Deploying the OpenTelemetry Collector to ensure full coverage for telemetry in your Kubernetes cluster can be a daunting task. Determining the right configuration for the Collector in your Kubernetes environment requires not just technical know-how, but also a deep understanding of your observability needs and the data flow within your system. 

A good way to simplify this process is to familiarize yourself with "Collector deployment modes"—the various methods for setting up and managing the Collector to gather, process, and export application and system data within Kubernetes. It’s important to note that “deployment modes” differ from “deployment patterns,” a distinction that can be confusing. 

This blog post guides you through these key concepts so you’ll have the foundational knowledge you need to choose the right deployment mode for your observability strategy. Here's what we’ll cover: 

  • The difference between Collector deployment patterns and modes.
  • Why Collector deployment modes matter.
  • The most common Collector deployment modes used in Kubernetes environments.

Collector deployment patterns versus modes

Collector deployment pattern refers to the high-level architectural approach for how you structure your overall telemetry collection strategy using the Collector (or not). 

Generally, the main patterns are:

  • No Collector: You send your telemetry directly from your application to your backend. This approach is often best for smaller applications that don’t require data processing, or for testing purposes. 
  • Agent: The simplest way to deploy a Collector is to route your application telemetry to a Collector agent, which then handles any processing as needed before exporting the data to one or more backends. This pattern typically refers to running the Collector on the same node as the application, or as a sidecar container in the application pod. 
  • Gateway: This pattern creates a centralized collection point. Applications or other Collectors send their telemetry to a single OpenTelemetry Protocol (OTLP) endpoint backed by one or more Collector instances running as a standalone, yet scalable, service. 
  • Layered: This pattern uses two layers of Collectors: the first layer is configured with the trace ID- or service-name-aware load-balancing exporter, and the second layer handles scaling out. When combined with the tail sampling processor, the load-balancing exporter ensures that all spans with the same trace ID reach the same second-layer Collector instance, preventing fragmented traces. 

Collector deployment mode refers to the specific Kubernetes mechanisms you use to implement these patterns in your Kubernetes environment. They're the "how" of running your collectors in Kubernetes: 

  • Deployments/StatefulSets run standalone Collector pods.
  • DaemonSets ensure one Collector pod per node.
  • Sidecars run a Collector in a container alongside the application container, within the same pod.

To illustrate the distinction between patterns and modes, let’s say you want to combine the agent and gateway patterns as your strategy. To implement this in Kubernetes, you might configure the following modes for the Collectors being deployed in your cluster: 

  • DaemonSet for your agent-level Collectors.
  • Deployment for your gateway Collectors.

Note: Yes, there is a deployment mode named “Deployment.” See this configuration example for further information. 
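
If you manage Collectors with the OpenTelemetry Operator, the mode is a single field on the OpenTelemetryCollector resource. The following is a minimal sketch of the agent-plus-gateway combination described above; the resource names, endpoints, and backend URL are placeholders, not a definitive setup:

```yaml
# Agent layer: one Collector per node (DaemonSet mode), forwarding to the gateway.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-agent                # hypothetical name
spec:
  mode: daemonset
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    exporters:
      otlp:
        endpoint: otel-gateway-collector:4317   # service the Operator creates for the gateway below
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
---
# Gateway layer: centralized, scalable collection point (Deployment mode).
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway              # hypothetical name
spec:
  mode: deployment
  replicas: 2
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    exporters:
      otlphttp:
        endpoint: https://otlp.example.com      # placeholder backend endpoint
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlphttp]
```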

Why is it important to consider deployment modes? 

Each mode has its own benefits and trade-offs, depending on the specific needs and architecture of your system. Additionally, as we just discussed, modes can be combined to fit different use cases. 

Consider the following: 

  • Scalability needs: How much data do you expect to handle? This will influence the number of Collector instances you need. 
  • Performance requirements: How important is low latency data collection and processing to you? 
  • Fault tolerance and high availability: How critical are resiliency and reliable data collection for you? For most teams, these are top priorities.
  • Operational complexity: What does your team’s capacity for managing multiple instances and configurations look like? Do you require granular configuration options? 
  • Resource utilization: What computational and network resources are available to you? Can the capacity and structure of your network support scaling your Collector instances? 

Another aspect to consider is that running the Collector as an agent, whether as a sidecar or a DaemonSet, can be expensive from a resource standpoint, since there are many Collector instances. However, because the Collector runs on the same node as the application, telemetry exports from the app are much faster than if the Collector were on a different node or external to the cluster. It also makes it easier to append contextual information, such as the host ID attribute, to the telemetry data before it arrives at the backend. 

Collector deployment modes in Kubernetes

In a Kubernetes environment, the primary deployment modes you can apply to the Collector are Deployment for a centralized collector, and DaemonSet for a distributed collector where an instance runs on each Kubernetes node, allowing for collection from individual hosts. You can also use StatefulSet mode for a controlled number of replicas with predictable names, depending on your specific needs. 

Deployment mode

A Collector installed in Deployment mode is commonly used to collect cluster-level data, such as:

  • Kubernetes events, which are useful for troubleshooting issues within the cluster.
  • Control plane metrics to gain visibility into the performance of the control plane components.
  • kube-state-metrics, which provides metrics about the state of various Kubernetes objects in the cluster.

Because these data sources represent the entire cluster, DaemonSet mode is not recommended here: you would end up with multiple Collector instances all trying to collect the same metrics from the same endpoints.
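
As a rough sketch, a Deployment-mode Collector config for this cluster-level data might look like the following (assuming the contrib distribution, which ships these receivers; the kube-state-metrics address and the downstream endpoint are placeholders):

```yaml
receivers:
  k8s_cluster:          # cluster-wide metrics about nodes and workload objects
    collection_interval: 30s
  k8s_events: {}        # Kubernetes events, emitted as logs
  prometheus:           # scrape kube-state-metrics
    config:
      scrape_configs:
        - job_name: kube-state-metrics
          static_configs:
            - targets: ["kube-state-metrics.kube-system.svc:8080"]
exporters:
  otlp:
    endpoint: otel-gateway-collector:4317   # placeholder downstream endpoint
    tls:
      insecure: true
service:
  pipelines:
    metrics:
      receivers: [k8s_cluster, prometheus]
      exporters: [otlp]
    logs:
      receivers: [k8s_events]
      exporters: [otlp]
```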

However, there are a few things to keep in mind with Deployment mode. It can increase network overhead and latency, since data must travel to the centralized Collectors, potentially causing bottlenecks during periods of high traffic. To scale a Deployment-mode Collector, you'll also need to consider load balancing, which may require complex configuration. And if you don't scale it, the Collector becomes a single point of failure, risking data loss if it becomes unavailable or overwhelmed. 

These considerations are why teams often implement multiple collectors running in different modes. Each collector serves a specific use case where these limitations are particularly impactful. 

DaemonSet mode

In DaemonSet mode, the Collector runs as a pod on every node in your Kubernetes cluster, which makes it ideal for collecting infrastructure performance metrics, such as system metrics, local log collection, or network monitoring. In the context of applications, it provides a node-local endpoint for your telemetry, which reduces latency and the possibility of network errors.

The following image shows the architecture of the Collector deployed as a DaemonSet:

Diagram that shows the architecture of the OpenTelemetry Collector deployed as a DaemonSet.

You have a couple options for collecting metrics from your Kubernetes nodes. The first one is the Host Metrics receiver, which collects essential system metrics directly from the node, such as CPU usage, memory, and disk input/output (I/O). Use this component if you want basic system metrics; plus, it’s easy to set up and is supported out of the box in OpenTelemetry. We recommend checking out the examples in this repository, which demonstrate how to use this receiver. 
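
As a minimal sketch, the Host Metrics receiver on a DaemonSet Collector might be configured like this (the /hostfs path assumes you mount the node's root filesystem into the Collector pod, for example via a hostPath volume):

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    root_path: /hostfs      # where the node's root filesystem is mounted in the pod
    scrapers:
      cpu: {}
      memory: {}
      load: {}
      disk: {}
      filesystem: {}
      network: {}
```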

If you want a broader set of metrics from your nodes, another option is to pair the Prometheus receiver and Prometheus Node exporter. This combination offers over 300 unique metric names, including more advanced system metrics not available through the Host Metrics receiver. However, a large number of metrics can lead to significant costs:

  • High cardinality and data storage: Managing a large volume of metrics can increase costs for storage and processing.
  • Operational complexity: Requires fine-tuning alerts and queries to avoid generating excessive, non-actionable data.
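
If you take this route, the Prometheus receiver can scrape a node-exporter DaemonSet directly. A minimal sketch, assuming node-exporter listens on its default port 9100 and that K8S_NODE_IP is injected into the Collector pod via the downward API, so each DaemonSet Collector scrapes only its own node:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: node-exporter
          scrape_interval: 30s
          static_configs:
            - targets: ["${env:K8S_NODE_IP}:9100"]   # the local node's exporter only
```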

To gather workload metrics, you can use the Kubelet Stats receiver, which fetches metrics directly from the kubelet on each node. It's an efficient way to monitor Kubernetes containers directly without needing additional tools, providing useful metrics such as container CPU and memory usage and pod-level resource allocation. 
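
A minimal sketch of the Kubelet Stats receiver on a DaemonSet Collector, assuming K8S_NODE_NAME is injected via the downward API and the Collector's service account is allowed to read the kubelet's stats endpoint:

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: https://${env:K8S_NODE_NAME}:10250   # the local node's kubelet
    insecure_skip_verify: true
    metric_groups: [node, pod, container]
```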

Alternatively, you can rely on the Prometheus receiver to scrape metrics from sources like Kubernetes cAdvisor (container-level metrics aggregation). These metrics are commonly used in existing observability setups.
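
For cAdvisor, a common approach is to scrape the kubelet's /metrics/cadvisor endpoint with the Prometheus receiver. A sketch, assuming suitable RBAC and the same K8S_NODE_NAME environment variable so each agent keeps only its own node's targets:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: cadvisor
          scheme: https
          metrics_path: /metrics/cadvisor
          kubernetes_sd_configs:
            - role: node
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          tls_config:
            insecure_skip_verify: true
          relabel_configs:
            - source_labels: [__meta_kubernetes_node_name]
              regex: ${env:K8S_NODE_NAME}   # keep only the local node
              action: keep
```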

When deploying the Collector as a DaemonSet, you can configure the File Log receiver to capture container logs written to a node’s file system. However, you should consult with your application developers as they could already be shipping logs directly from the app to the Collector. 
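
A minimal File Log receiver sketch, assuming /var/log/pods is mounted from the node into the Collector pod via a hostPath volume (the exclude path for the Collector's own logs is just an example):

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    exclude:
      - /var/log/pods/*/otel-collector/*.log   # skip the Collector's own container logs
    start_at: end
    include_file_path: true
    operators:
      - type: container    # parses container log formats and extracts k8s metadata from the file path
```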

The key difference between Deployment and DaemonSet is in their deployment strategy: a standard OpenTelemetry Collector Deployment provides a centralized, scalable collection point, while a DaemonSet Collector ensures ubiquitous, node-level data collection. You'll often see organizations use a combination of these deployment modes to create a comprehensive observability strategy that captures both service-level and infrastructure-level telemetry data.

StatefulSet mode

Deploying the Collector in StatefulSet mode supports use cases where persistent storage or a stable, unique network identity is needed. Two of the most common use cases for running the Collector in StatefulSet mode are: 

  • Large Kubernetes clusters with many Prometheus endpoints: A StatefulSet can be used in combination with the Target Allocator to evenly distribute Prometheus targets across a pool of collector instances.
  • Load balancing and tail sampling: StatefulSets are ideal when you need data persistence and stable host names for components such as the load-balancing exporter. For tail sampling to work, all spans for a given trace must be sent to the same Collector instance. 
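
For the first of these use cases, a rough sketch using the OpenTelemetry Operator might look like the following; the resource name, replica count, and downstream endpoint are placeholders:

```yaml
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-prom-scraper                    # hypothetical name
spec:
  mode: statefulset
  replicas: 3
  targetAllocator:
    enabled: true
    allocationStrategy: consistent-hashing   # spreads Prometheus targets evenly across replicas
    prometheusCR:
      enabled: true                          # discover targets from ServiceMonitor/PodMonitor resources
  config:
    receivers:
      prometheus:
        config:
          scrape_configs: []                 # the Target Allocator supplies the targets
    exporters:
      otlp:
        endpoint: otel-gateway-collector:4317   # placeholder downstream endpoint
        tls:
          insecure: true
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [otlp]
```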

Putting it all together

While there are numerous ways to architect your OpenTelemetry Collectors in Kubernetes, we've found this example to be one of the more universal “patterns.” When implementing OpenTelemetry in Kubernetes, you'll likely need to layer or chain collectors to satisfy all use cases. As with most complex architectures, there are various caveats and considerations to think about, but hopefully this will at least provide food for thought and a solid starting point. 

Diagram showing the architecture of multiple layers of Collectors, including a Collector gateway behind a load balancer.

Deployment 

The Deployment collector runs standalone and is responsible for collecting "cluster-wide" data, including Kube State Metrics, Kubernetes Events, and API Server Metrics. By collecting this data with a Deployment, you can safeguard against data duplication that might occur if you tried to run these configurations as part of a DaemonSet.

DaemonSet

The DaemonSet collector runs on each node as an agent. It's responsible for collecting Kubernetes node metrics, workload metrics from Kubelet and cAdvisor, as well as logs from the node file system. Applications can also send their traces to the agent co-located on the same node.

Gateway

A Collector gateway is used for final processing and data transformations, and it can fan data out to multiple backends. It can run as a Deployment or a StatefulSet. A load balancer, either off the shelf or a Collector running the load-balancing exporter (for example, for tail-sampling use cases), sits in front of the Collector gateway to ensure telemetry is evenly distributed across the gateway Collector instances. 
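
As a sketch of that load-balancing layer for the tail-sampling case, the fronting Collector can use the load-balancing exporter with the Kubernetes resolver so that every span for a given trace ID lands on the same gateway instance; the headless service name below is a placeholder:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: otel-gateway-collector-headless.observability   # placeholder headless service
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```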

Additional considerations

While each collector deployment mode has its own advantages and disadvantages to consider, here are some additional items to take into account when architecting your collectors:

  • Tenancy: Are your clusters single-tenant or multi-tenant? How and where will data be routed based on this?
  • Cluster architecture: Do you plan to have a small number of large nodes with many pods per node or many smaller nodes with fewer pods per node?
  • Data egress and management: Do you want to route all telemetry through a single gateway collector within each cluster for easier data management or process data in an external gateway collector?

As you build and test your collector architecture, we strongly recommend enabling the OpenTelemetry Collector's internal telemetry, also known as "self metrics," in your collector config. These metrics can be extremely helpful in identifying collector bottlenecks, performance issues, or situations where you may be unknowingly dropping telemetry within your pipelines.
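
How internal telemetry is configured depends on your Collector version: older versions use a simple metrics address, while newer ones use an OpenTelemetry SDK-style readers block. A minimal sketch of the newer form, exposing self metrics in Prometheus format on the conventional port 8888:

```yaml
service:
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: 0.0.0.0
                port: 8888
      # On older Collector versions this was configured as:
      # address: 0.0.0.0:8888
```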

The following image shows an example of an OpenTelemetry Collector Data Flow dashboard in New Relic that focuses on some of the key metrics, such as:

  • Accepted and refused spans, metrics, and logs
  • Failed exports for spans, metrics, and logs
  • Export ratio for spans, metrics, and logs
  • Exporter queue capacity and size

Screenshot of a Collector data flow dashboard in New Relic.

Summary

The OpenTelemetry Collector has become a go-to tool for collecting observability telemetry from applications and Kubernetes environments. Considering it’s a versatile, vendor-neutral solution that can handle a wide range of tasks, that’s not too surprising. However, there’s definitely a bit of a learning curve. 

Setting up and managing collectors at scale takes some planning. By getting a good grasp on configuration basics, exploring the different deployment modes, and understanding how to chain collectors together when needed, you'll be ready to take your Kubernetes observability to the next level with the Collector. 

You can also check out the OpenTelemetry Collector Kubernetes distribution, which bundles the components you’d commonly use to monitor Kubernetes with OpenTelemetry.