Monitoring Application Performance in Kubernetes: How New Relic Sets You Up for Success

9 min read

This post was updated from a previous version published in April 2019.

Kubernetes has emerged as the de facto standard for orchestrating and managing application containers. Virtually all the significant players in the cloud-native application space—many of them fierce competitors—have thrown their support behind Kubernetes as an industry standard.

Here at New Relic, the Kubernetes story boils down to a couple of very strategic points. First, we believe that for all of its success to date, Kubernetes is actually—and amazingly—still in its early journey towards achieving its full potential. Tomorrow’s Kubernetes environments will operate at a scale that will dwarf what we see today. As Kubernetes environments continue to scale, they will also continue to get more complex—and to present bigger challenges for efforts to monitor their performance and health.

Second, many of our customers tell us they get tremendous value by leveraging New Relic’s observability platform as they transition to running their application workloads in Kubernetes. That’s why we’re thrilled to provide instant Kubernetes observability delivered by Auto-telemetry with Pixie. You no longer need to manually instrument your applications, update your code, redeploy your apps, or navigate long organizational standardization processes. Now, you can monitor your Kubernetes clusters and workloads in minutes and debug your applications faster than ever.

New Relic and Kubernetes: A powerful pairing

Most organizations that deploy Kubernetes environments today share the same basic goal: To maximize the benefits of running containerized applications, at virtually any scale, through the use of automation, workload orchestration, and other enabling technologies. It’s difficult to overstate the importance of orchestration in general, and Kubernetes in particular, in achieving this goal. Without these capabilities, the sheer complexity of managing and optimizing containerized workloads, microservices, and related innovations—especially when deployed at scale—would easily outweigh their benefits.

While Kubernetes excels at abstracting away much of the toil associated with managing containerized workloads, it also introduces some complexities of its own. The Kubernetes environment includes an additional layer that comes into play between the application and the underlying infrastructure. The challenge of monitoring and maintaining the performance and health of these Kubernetes environments, or of troubleshooting issues when they occur, can be daunting—especially as organizations deploy these environments at massive scale.

To help you succeed, New Relic is all in on Kubernetes, which is why we have been steadily expanding and upgrading our ability to instrument, monitor, and deliver observability across your Kubernetes clusters. That begins with setting up instrumentation quickly and easily. Then you can analyze the performance and health of your clusters (nodes, containers, application, and more) to identify and troubleshoot performance issues.

Using New Relic to achieve Kubernetes observability

By design, applications typically aren’t aware of whether they’re running in a container or on an orchestration platform. A Java web application, for example, doesn’t have any special code paths that get executed only when running inside a Docker container on a Kubernetes cluster. This is a key benefit of containers (and container orchestration engines): your application and its business logic are decoupled from the specific details of its runtime environment. If you ever have to shift the underlying infrastructure to a new container runtime or Linux version, you won’t have to completely rewrite the application code.

Similarly, neither Auto-telemetry with Pixie nor our Application Performance Monitoring (APM) language agents care where an application is running. It could be running on an ancient Linux server in a forgotten rack or on the latest Amazon Elastic Compute Cloud (Amazon EC2) instance. However, when monitoring applications managed by an orchestration layer, it’s very useful for debugging or troubleshooting to be able to relate an application error trace, for example, to the container, pod, or host that it’s running on.

All of these are must-have capabilities for a Kubernetes application monitoring solution because they help you answer an all-too-common question: Does a performance problem really reside in the code, or is it actually tied to the underlying infrastructure?

Linking Kubernetes metadata to application instrumentation

Traditionally, for application transaction traces or distributed traces, New Relic APM agents tell you exactly which server the code was running on. However, this gets more complex in container environments: the worker nodes (hosts) where the application runs (in the containers/pods) are often ephemeral—they come and go. It’s fairly common to configure policies for applications running in Kubernetes to automatically scale their host count in response to traffic fluctuations, so you may investigate a slow transaction trace, but the container or host where that application was running no longer exists. Knowing the containers or hosts where the application is currently running is not necessarily an indication of where it was running 5, 15, or 30 minutes ago—when the issue occurred.

But with New Relic, you can link Kubernetes metadata to your application performance data as transaction traces in order to debug application transaction errors. And Auto-telemetry with Pixie means that you don’t even have to install an APM language agent, making this the easiest way to get started. If you’re already using language agents, continue reading for how to link this data.

There are a few compatibilities and requirements you’ll want to review before getting started. For example, your cluster needs to have the MutatingAdmissionWebhook controller enabled (it might not be enabled by default). And be sure to review which APM agents can be linked to Kubernetes metadata.

To inject Kubernetes metadata into your agent:

  1. Download the following yaml file:

    curl -O http://download.newrelic.com/infrastructure_agent/integrations/kubernetes/k8s-metadata-injection-latest.yaml

  2. Edit this file, replacing <YOUR_CLUSTER_NAME> with the name of your cluster.
  3. Apply the yaml file to your Kubernetes cluster:

    kubectl apply -f k8s-metadata-injection-latest.yaml

After you have the injection file in place, you can optionally limit the injection of metadata to specific namespaces in your cluster, or configure the metadata injection to work with custom certificates, if you’re using them. For more information, including steps for validating and troubleshooting metadata injection, see our documentation.

Note: Our Kubernetes metadata injection project is open source. Here's the code to link APM and Infrastructure  and the code to automatically manage certificates.

Exploring application performance in Kubernetes

After deploying the metadata injection file to our Kubernetes cluster, custom parameters from Kubernetes begin to appear in the New Relic UI.

In the following screen capture, you can see that  the transaction attributes show the Kubernetes hostname and the IP address where the error occurred, among other details:

Kubernetes pod metadata exposed

The same metadata shows up under Distributed Trace attributes.

Next, we queried our data to see the performance of application transactions based on pod names by writing the following custom New Relic Query Language (NRQL) query:

SELECT percentile(duration, 95) from Transaction where
appName='newrelic-k8s-node-redis' and
name='WebTransaction/Expressjs/GET//'FACET K8S_POD_NAME TIMESERIES
auto

And here’s the result:

Custom transaction query

Kubernetes exposes a good deal of useful metadata you can use to gain helpful information about performance outliers and track down individual errors.

In the following example, notice how the APM error profile automatically identifies that nearly 57% of errors come from the same pods and pod IP addresses:

APM error profile

APM error profiles automatically incorporate custom parameters and use different statistical measures to determine if an unusual number of errors is coming from a certain pod, IP, or host within the container cluster. From there, you can zero in on infrastructure- or cluster-specific root causes of the errors (or maybe you’ll just discover some bad code).

APM and the Kubernetes cluster explorer

If you start investigating issues beginning with your infrastructure, Kubernetes cluster explorer shows you application performance metrics in a single, connected experience. If you identify issues within a node or a pod that aren’t driven by the underlying infrastructure, the root cause might lie within your application, impacting throughput, response time, error rate, and violations.

Kubernetes Cluster Explorer Applications in a Pod

Use Kubernetes cluster explorer to get visibility into applications and infrastructure metrics.

Effectively monitoring applications running in Kubernetes requires not just code-level visibility into applications, but also the ability to correlate applications with the containers, pods, or hosts in which they’re running. By identifying pod-specific performance issues for an application’s workload, you can troubleshoot performance issues faster and have confidence that your application is always available, running fast, and doing what it is supposed to do.

Kubernetes cluster explorer APM overview page

Kubernetes cluster explorer gives you an understanding of how your application is performing. Clicking on the application (in this case, Order-Packaging) opens the APM overview page where you can troubleshoot and debug that specific application.

Your customers may not care about Kubernetes … but you must

Your customers likely don’t care if you’re using traditional virtual machines; a bleeding-edge, multi-cloud federated Kubernetes cluster; or a home-grown artisanal orchestration layer. They just want the applications and services you provide to be available, reliable, and fast.

But with the rise of Kubernetes, you need the ability to understand and explore your cluster and application performance quickly so you can deliver the best experiences to your customers.