This post was originally published on November 27, 2017. It’s since been updated with information on the New Relic cluster explorer and steps for injecting Kubernetes metadata into APM agents.
Over the last few years, Kubernetes has emerged as the de facto standard for orchestrating and managing application containers. Virtually all the significant players in the cloud-native application space—many of them fierce competitors—have thrown their support behind Kubernetes as an industry standard.
Here at New Relic, the Kubernetes story boils down to a couple of very strategic points. First, we believe that for all of its success to date, Kubernetes is actually—and amazingly—still in its early journey towards achieving its full potential. Tomorrow’s Kubernetes environments will operate at a scale that will dwarf what we see today. As Kubernetes environments continue to scale, they will also continue to get more complex—and to present bigger challenges for efforts to monitor their performance and health.
Second, many of our customers tell us they see tremendous value in leveraging the New Relic platform to transition to and run their application workloads in Kubernetes. So we have been working hard to expand and upgrade the New Relic platform with a new generation of monitoring tools designed for Kubernetes environments. These capabilities will help our customers to keep moving quickly and to act with confidence when they adopt Kubernetes to manage and orchestrate their containerized workloads.
New Relic and Kubernetes: A powerful pairing
Most organizations that deploy Kubernetes environments today share the same basic goal: To maximize the benefits of running containerized applications, at virtually any scale, through the use of automation, workload orchestration, and other enabling technologies. It’s difficult to overstate the importance of orchestration in general, and Kubernetes in particular, in achieving this goal. Without these capabilities, the sheer complexity of managing and optimizing containerized workloads, microservices, and related innovations—especially when deployed at scale—might easily outweigh their benefits.
While Kubernetes excels at abstracting away much of the toil associated with managing containerized workloads, it also introduces some complexities of its own. The Kubernetes environment includes an additional layer that comes into play between the application and the underlying infrastructure. The challenge of monitoring and maintaining the performance and health of these Kubernetes environments, or of troubleshooting issues when they occur, can be daunting—especially as organizations deploy these environments at massive scale.
This is where the New Relic platform enters the picture as a powerful and uniquely capable complement for today’s Kubernetes environments. New Relic is all in on Kubernetes, which is why we have been steadily expanding and upgrading our ability to instrument, monitor, and support full observability across Kubernetes environments. This includes giving you the ability to move quickly and easily from high-level, holistic views of the performance and health of Kubernetes clusters, down to the node, container, and application-level visibility required to identify and troubleshoot performance issues. It also includes the ability to instrument and monitor Kubernetes infrastructure alongside applications, which is critical for understanding dependencies and isolating potential performance issues.
Using New Relic to achieve end-to-end Kubernetes monitoring
With that in mind, we’d like to introduce you to some of the most powerful capabilities currently being introduced in the New Relic platform for monitoring Kubernetes applications and resolving performance issues in these environments. In particular, we’ll demonstrate how to surface Kubernetes-specific metadata by injecting it into APM agents, so you can explore performance issues and troubleshoot application transaction errors.
The importance of an end-to-end view
By design, applications typically aren’t aware of whether they’re running in a container or on an orchestration platform. A Java web application, for example, doesn’t have any special code paths that get executed only when running inside a Docker container on a Kubernetes cluster. This is a key benefit of containers (and container orchestration engines): your application and its business logic are decoupled from the specific details of its runtime environment. If you ever have to shift the underlying infrastructure to a new container runtime or Linux version, you won’t have to completely rewrite the application code.
Similarly, New Relic APM’s language agent—which instruments the application code to track rich events, metrics, and traces—doesn’t care where the application is running. It could be running on an ancient Linux server in a forgotten rack or on the latest 72-CPU Amazon EC2 instance. When monitoring applications managed by an orchestration layer, however, being able to relate an application error trace, for instance, to the container, pod, or host it’s running in can be very useful for debugging or troubleshooting.
All of these are must-have capabilities for a Kubernetes application monitoring solution because they help our customers answer an all-too-common question: Does a performance problem really reside in the code, or is it actually tied to the underlying infrastructure?
Linking Kubernetes metadata to application instrumentation
Traditionally, for application transaction traces or distributed traces you collect using New Relic APM, our agent can tell you exactly which server the code was running on. In many container environments, though, this gets more complex: the worker nodes (hosts) where the application runs (in the containers/pods) are often ephemeral—they come and go. It’s fairly common to configure policies for applications running in Kubernetes to automatically scale their host count in response to traffic fluctuations, so you may find yourself investigating a slow transaction trace only to discover that the container or host where the application was running no longer exists. Knowing where the application is running now is not necessarily an indication of where it was running 5, 15, or 30 minutes ago—when the issue occurred.
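An autoscaling policy of the kind described above is typically expressed as a HorizontalPodAutoscaler. Here is a minimal sketch, assuming a Deployment named web-app; the name, replica bounds, and CPU threshold are illustrative, not taken from the example application in this post:

```yaml
# Minimal HorizontalPodAutoscaler sketch (autoscaling/v1).
# Scales the web-app Deployment between 2 and 10 replicas
# based on average CPU utilization across its pods.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```

Under a policy like this, the pods serving a given transaction five minutes ago may already have been scaled away, which is exactly why linking metadata to traces matters.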
Fortunately, you can now inject Kubernetes metadata into your APM agents and surface it in transaction traces, making it possible to explore performance issues and troubleshoot application transaction errors.
There are a few compatibility requirements you’ll want to review before getting started. For example, your cluster needs to have the MutatingAdmissionWebhook controller enabled, which (as of this writing) requires Kubernetes 1.9 or later, and might not be enabled by default. And be sure to review which APM agents can be linked to Kubernetes metadata.
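A quick way to sanity-check those prerequisites is to ask the cluster directly. This is a sketch that assumes you have kubectl access to the cluster; the grep simply confirms that the admission registration API group (which MutatingAdmissionWebhook depends on) is being served:

```
# Confirm the admissionregistration API group is served
kubectl api-versions | grep admissionregistration

# Confirm the cluster is running Kubernetes 1.9 or later
kubectl version --short
```

If the grep returns nothing, check your API server configuration before proceeding.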
To inject Kubernetes metadata into your agent:
- Download the following YAML file:
curl -O http://download.newrelic.com/infrastructure_agent/integrations/kubernetes/k8s-metadata-injection-latest.yaml
- Edit this file, replacing <YOUR_CLUSTER_NAME> with the name of your cluster.
- Apply the YAML file to your Kubernetes cluster:
kubectl apply -f k8s-metadata-injection-latest.yaml
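If you prefer to script the edit step, the placeholder can be swapped in place with sed. This is a self-contained sketch: the cluster name my-cluster is illustrative, and the printf line fabricates a one-line stand-in for the manifest so the example runs on its own (in practice the file comes from the curl command above):

```shell
# Stand-in for the downloaded manifest so this sketch is self-contained;
# the real file comes from the curl command above
printf 'value: <YOUR_CLUSTER_NAME>\n' > k8s-metadata-injection-latest.yaml

# Replace the placeholder with your actual cluster name (keeps a .bak copy)
sed -i.bak 's/<YOUR_CLUSTER_NAME>/my-cluster/g' k8s-metadata-injection-latest.yaml

cat k8s-metadata-injection-latest.yaml
# -> value: my-cluster
```

The -i.bak form works on both GNU and BSD sed, so the same line runs on Linux and macOS.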
After you have the injection file in place, you can optionally limit the injection of metadata to specific namespaces in your cluster, or configure the metadata injection to work with custom certificates, if you’re using them. For more information, including steps for validating and troubleshooting metadata injection, see our documentation.
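Limiting injection by namespace is usually done through the webhook’s namespaceSelector. The fragment below is a hedged sketch: the label key newrelic-metadata-injection is an assumption, so check the documentation linked above for the exact label your version expects:

```yaml
# Hypothetical fragment of the MutatingWebhookConfiguration:
# only inject metadata into namespaces carrying this label
namespaceSelector:
  matchLabels:
    newrelic-metadata-injection: enabled
```

You would then opt namespaces in by labeling them, for example with kubectl label namespace my-namespace newrelic-metadata-injection=enabled (namespace name hypothetical).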
Exploring application performance in Kubernetes with New Relic
After we deploy the metadata injection file to our Kubernetes cluster, custom parameters from Kubernetes begin to appear in the New Relic UI.
In the following screenshot, we see some error details; the transaction attributes show us, among other details, the Kubernetes hostname and IP address where the error occurred:
We also see the same metadata show up under Distributed Trace attributes.
Next, we used New Relic Insights to look at the same instrumentation to see the performance of application transactions based on pod names. To do this in Insights, we simply wrote the following custom New Relic Query Language (NRQL) query:
SELECT percentile(duration, 95) FROM Transaction WHERE appName = 'newrelic-k8s-node-redis' AND name = 'WebTransaction/Expressjs/GET//' FACET K8S_POD_NAME TIMESERIES auto
And here’s the result:
Kubernetes exposes a good deal of metadata you can use to gain insight into performance outliers and track down individual errors.
For instance, in this example, the APM error profiles feature automatically notices that nearly 57% of errors come from the same pods and pod IP addresses:
Error profiles automatically incorporate the custom parameters and use different statistical measures to determine whether an unusual number of errors is coming from a certain pod, IP, or host within the container cluster. From there, you can zero in on infrastructure- or cluster-specific root causes of the errors (or maybe you’ll just discover some bad code).
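You can run a similar cut yourself in Insights. The NRQL query below is a sketch that reuses the app name from the earlier example and facets error counts by pod; TransactionError is the standard APM error event type, and the time window is illustrative:

```
SELECT count(*) FROM TransactionError WHERE appName = 'newrelic-k8s-node-redis' FACET K8S_POD_NAME SINCE 30 minutes ago
```

A pod that dominates this facet is a strong hint that the problem is environmental rather than in the code itself.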
APM and the Kubernetes cluster explorer
The discussion above shows how to leverage Kubernetes metadata within APM to help troubleshoot issues. Conversely, if you are starting your investigation into an issue from the infrastructure side of the house, you can also visualize application performance metrics within the newly announced New Relic Kubernetes cluster explorer. If you identify issues within a node or a pod but don't see any issues with the infrastructure itself, you can quickly check application metrics (throughput, response time, error rate and violations) for the application instance running on that specific node/pod.
Effectively monitoring applications running in Kubernetes requires not just code-level visibility into the applications but also the ability to correlate applications with the containers, pods, or hosts in which they’re running. By identifying pod-specific performance issues for an application’s workload, you can troubleshoot performance issues faster and have confidence that your application is always available, running fast, and doing what it is supposed to do.
Your customers may not care about Kubernetes … but you must
Your customers likely don’t care if you’re using traditional virtual machines; a bleeding-edge, multi-cloud federated Kubernetes cluster; or a home-grown artisanal orchestration layer. They just want the applications and services you provide to be available, reliable, and fast.
With the rise of Kubernetes, however, along with new technical concepts like orchestration layers, teams need new contexts for understanding and exploring performance in their applications and code. We believe that to efficiently build and run modern software, you need visibility into the application layer—especially when those applications are running inside Kubernetes clusters. We look forward to continuing our work with customers and the broader Kubernetes community, and to creating and providing the solutions they need to succeed.