A Complete Introduction to Monitoring Kubernetes with New Relic

Monitoring Kubernetes 101: Discover the fundamentals of what you need to know to effectively monitor Kubernetes deployments.

Introduction

Before Kubernetes took over the world, cluster administrators, DevOps engineers, application developers, and operations teams had to perform many manual tasks in order to schedule, deploy, and manage their containerized applications. The rise of the Kubernetes container orchestration platform has altered many of these responsibilities, but these same teams now have new things to worry about.

In effect, while Kubernetes solves old problems, it can also create new ones. Specifically, adopting containers and container orchestration requires teams to rethink and adapt their monitoring strategies to account for the new infrastructure layers introduced in a distributed Kubernetes environment. You need to monitor:

  • Clusters: Track the capacity and resource utilization of your cluster, and be able to drill into specific parts of the cluster, including:

    • Nodes: Monitor the CPU, memory, and disk utilization for Kubernetes workers and masters to ensure all nodes are healthy.

    • Deployments/pods: Ensure all desired pods in a deployment are running and healthy.

    • Containers: Monitor CPU and memory consumption and how close it is to the limits you’ve configured. Check for containers that can’t start because they are stuck in a “crash loop backoff.”

  • Applications: Monitor the performance and availability of applications running inside your Kubernetes cluster. Measure request rate, throughput, error rate, and more.

  • End-user experience: Track and monitor mobile application and browser performance to gain insight into things like response time and errors. Monitor load time and availability to ensure customer satisfaction.

This guide is designed to highlight the fundamentals of what you need to know to effectively monitor Kubernetes deployments. Think of it as Monitoring Kubernetes 101.

Monitoring Kubernetes clusters

In addition to providing visibility into operational data such as the number of resources used and namespaces per cluster and per pod, the New Relic Infrastructure Kubernetes integration also lets you see the relationships between objects in the cluster by taking advantage of Kubernetes’ built-in labeling system.
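
To make the labeling idea concrete, here is a minimal sketch of equality-based label matching, the mechanism Kubernetes selectors use to relate objects such as a service and its pods. The object records and label values are hypothetical, and this is not a description of how the integration itself is implemented.

```python
def matches_selector(labels, selector):
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_names(objects, selector):
    """Return the names of objects whose labels satisfy the selector.

    objects: iterable of dicts with hypothetical 'name' and 'labels' keys.
    An object without labels only matches an empty selector.
    """
    return [
        obj["name"]
        for obj in objects
        if matches_selector(obj.get("labels", {}), selector)
    ]
```

Calling `select_names(pods, {"app": "web"})` would list only the pods labeled `app=web`, which is the kind of grouping a label-aware dashboard relies on.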

Get started monitoring your Kubernetes cluster with New Relic

The New Relic Infrastructure Kubernetes integration brings in system-level metrics, allowing you to view, troubleshoot, and alert on the most important parts of your cluster. Use the integration’s out-of-the-box dashboard to inspect a single container, or scale up to see large, multi-cluster deployments across different Kubernetes entities, including nodes, pods, namespaces, and containers.

To view the Kubernetes integration dashboard:

  1. Go to infrastructure.newrelic.com > Integrations > On Host Integrations, then select the Kubernetes dashboard link.
  2. Install kube-state-metrics.
  3. Follow the New Relic integration documentation to complete the installation and configuration.

This opens the default Kubernetes dashboard:  

View your data in pre-built dashboards for immediate insight into your Kubernetes environment

Get a full cluster overview

Track and monitor the capacity and resource utilization of your cluster.

Because your Kubernetes deployment is likely managed by various teams including site reliability engineers (SREs), operations engineers, and developers, you may find it difficult to keep track of the current state of a cluster. Key questions include:

  • How big is my Kubernetes deployment?
  • How many nodes, namespaces, deployments, pods, and containers do I have running in a cluster?

How New Relic helps

Part of the default dashboard for the New Relic Infrastructure Kubernetes integration, the cluster overview widget provides a count of each object category that matters to you. Think of it as a snapshot, or a curated list, of the objects managed by Kubernetes in your cluster.

The New Relic Infrastructure default dashboard provides an overview of the objects in each cluster.

Suggested alerting

There’s no immediate need to set alerts on the cluster overview. But we recommend you keep this dashboard visible in your workspace so everyone on your team can see the current state of your Kubernetes deployment.

 

Track cluster resource usage

When you administer clusters, you need enough usable resources in your cluster to avoid running into issues when scheduling pods or deploying containers. If you don’t have enough capacity to meet the minimum resource requirements of all your containers, scale up your nodes’ capacity or add more nodes to distribute the workload.

You should know:

  • What percentage of cluster resources you’re using at any given time
  • If your clusters are over- or under-provisioned
  • How much demand you’ve placed on your systems

 

How New Relic helps

Our Kubernetes integration monitors and tracks aggregated core and memory usage across all nodes in your cluster. This allows you to meet resource requirements for optimal application performance.
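
As a rough illustration of the arithmetic behind a cluster-level usage view, the sketch below aggregates per-node figures into cluster-wide utilization percentages. The `NodeUsage` record and its field names are assumptions made for this example, not the integration's data model.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical per-node sample; field names are illustrative only."""
    name: str
    allocatable_cores: float
    used_cores: float
    allocatable_memory_bytes: int
    used_memory_bytes: int


def cluster_utilization(nodes):
    """Return (cpu_percent, memory_percent) aggregated across all nodes."""
    total_cores = sum(n.allocatable_cores for n in nodes)
    total_memory = sum(n.allocatable_memory_bytes for n in nodes)
    if total_cores == 0 or total_memory == 0:
        raise ValueError("cluster reports no allocatable capacity")
    cpu_percent = 100.0 * sum(n.used_cores for n in nodes) / total_cores
    memory_percent = 100.0 * sum(n.used_memory_bytes for n in nodes) / total_memory
    return cpu_percent, memory_percent
```

Summing before dividing weights each node by its capacity, so one small, busy node does not distort the cluster-wide figure the way an average of per-node percentages would.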

The New Relic Infrastructure default dashboard for core and memory usage.

Suggested alerting
Set alerts on the core and memory usage of the hosts in your cluster.

Monitor node resource consumption

Beyond simply keeping track of nodes in your cluster, you need to monitor the CPU, memory, and disk usage for Kubernetes nodes (workers and masters) to ensure all nodes in your cluster are healthy.

Use this data to ensure:

  • You have enough nodes in your cluster
  • The resource allocations to existing nodes are sufficient for deployed applications
  • You’re not hitting any resource limits
  • etcd is healthy

How New Relic helps 

New Relic tracks resource consumption (used cores and memory) for each Kubernetes node. That lets you track the number of network requests sent across containers on different nodes within a distributed service. You can also track resource metrics for all containers on a specific node, regardless of which service they belong to:

The New Relic Infrastructure default dashboard to monitor Node Resource Consumption.

Always ensure your current deployment has sufficient resources to scale. You don’t want new node deployments blocked by lack of resources.

Suggested alerting
Set alerts so you’ll be notified if hosts stop reporting or if a node’s CPU or memory usage drops below a desired threshold.
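
The "hosts stop reporting" condition boils down to a staleness check on each node's most recent sample. A minimal sketch, assuming you hold a mapping from host name to the Unix timestamp of its last report (the mapping and the function name are hypothetical):

```python
def silent_hosts(last_report, now, max_silence_seconds):
    """Return the hosts whose latest sample is older than the allowed silence.

    last_report: dict mapping host name -> Unix timestamp of its newest sample.
    now: current Unix timestamp, passed in so the check is easy to test.
    """
    return sorted(
        host
        for host, seen in last_report.items()
        if now - seen > max_silence_seconds
    )
```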

Monitor for missing pods

From time to time, you may find your cluster is missing a pod. A pod can go missing if the engineers did not provide sufficient resources when they scheduled it. The pod may never have started, it may be stuck in a restart loop, or it may have gone missing because of an error in its configuration.

To make sure Kubernetes does its job properly, you need to confirm the health and availability of pod deployments. A pod deployment defines the number of instances that need to be present for each pod, including backup instances. (In Kubernetes, this is referred to as a ReplicaSet.) Sometimes the number of active pods is not specified in the Replicas field on each deployment. Even if it is, Kubernetes may decide whether it can run another instance based on the resources the administrator has defined.

How New Relic helps

New Relic makes it easier to avoid this issue by showing you the resource limitations of the cluster.

If you don’t have enough resources to schedule a pod, add more container instances to the cluster or exchange a container instance for one with the appropriate amount of resources. In general, you can use the New Relic Kubernetes integration to monitor for missing pods and immediately identify deployments that require attention. This often creates an opportunity to resolve resource or configuration issues before they affect application availability or performance.

The New Relic Infrastructure default dashboard to monitor missing pods by deployment.

Suggested alerting

Set an alert for when a deployment’s missing pods value rises above a certain threshold for a certain period. If the number of available pods for a deployment falls below the number of pods you specified when you created the deployment, the alert will trigger. The alert will be applied to each deployment that matches the filters you set.
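
The alert condition described above can be sketched as follows. This is an illustration of the logic only: the sample format (one `(desired, available)` pair per evaluation interval) and the function names are assumptions, not New Relic's alerting API.

```python
def missing_pods(desired, available):
    """Pods a deployment wants but does not currently have available."""
    return max(desired - available, 0)


def should_alert(samples, threshold, min_consecutive):
    """Decide whether a deployment has been missing pods for long enough.

    samples: list of (desired, available) tuples, oldest first, one per
    evaluation interval. The alert fires when each of the last
    `min_consecutive` samples has more than `threshold` missing pods.
    """
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return all(missing_pods(d, a) > threshold for d, a in recent)
```

Requiring several consecutive samples keeps a routine rolling update, where pods are briefly unavailable, from triggering a notification.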

Find pods that aren’t running

Kubernetes dynamically schedules pods into the cluster; if you have resource issues or configuration errors, scheduling will likely fail. If a pod isn’t running or even scheduled, there’s an issue with either the pod or the cluster, or with your entire Kubernetes deployment.

When you see that pods aren’t running, you’ll want to know:

  • If there are any pods in a restart loop
  • How often requests are failing
  • If there are resource issues or configuration errors

How New Relic helps

As noted, if you have resource issues or configuration errors, Kubernetes may not be able to schedule the pods. In such cases, you want to check the health of your deployments, and identify configuration errors or resource issues.

With the New Relic Infrastructure Kubernetes integration, you can use default deployment data to discover and track pods that may not be running and sort them by cluster and namespace.

Suggested alerting

Set alerts on the status of your pods; alerts should trigger when a pod has a status of “Failed,” “Pending,” or “Unknown” for the period of time you specify.
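
A minimal sketch of that status check, assuming each pod is represented as a `(name, phase, phase_since)` tuple where `phase_since` is the Unix timestamp at which the pod entered its current phase. The tuple layout is hypothetical; the phase names are the ones Kubernetes reports.

```python
UNHEALTHY_PHASES = {"Failed", "Pending", "Unknown"}


def pods_to_alert(pods, now, max_seconds):
    """Return names of pods stuck in an unhealthy phase for too long.

    pods: iterable of (name, phase, phase_since) tuples.
    now: current Unix timestamp.
    max_seconds: how long a pod may stay unhealthy before it is reported.
    """
    return sorted(
        name
        for name, phase, since in pods
        if phase in UNHEALTHY_PHASES and now - since >= max_seconds
    )
```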

Troubleshoot container restarts

In normal conditions, containers should not restart. Container restarts are a sign that you’re likely hitting a memory limit in your containers. Restarts can also indicate an issue with either the container itself or its host. Additionally, because of the way Kubernetes schedules containers, it can be difficult to troubleshoot container resource issues since Kubernetes will restart—if not kill—containers when they hit their limits.

Monitoring container restarts helps you understand:

  • If any containers are in a restart loop
  • How many container restarts occurred in X amount of time
  • Why containers are restarting

How New Relic helps

A running count of container restarts is part of the default container data New Relic gathers with the Kubernetes integration.
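
Because the restart figure is a running count, answering "how many restarts in X amount of time" means differencing samples rather than reading a single value. The sketch below assumes a list of `(timestamp, cumulative_restart_count)` samples sorted by time. Treating a decrease as a counter reset (for example, after a pod is recreated) is a design choice of this example.

```python
def restarts_in_window(samples, window_start, window_end):
    """Count restarts between window_start and window_end (inclusive).

    samples: list of (timestamp, cumulative_restart_count), oldest first.
    A drop in the counter is treated as a reset, and the new value is
    counted as restarts that happened since that reset.
    """
    in_window = [(t, c) for t, c in samples if window_start <= t <= window_end]
    total = 0
    for (_, previous), (_, current) in zip(in_window, in_window[1:]):
        total += current - previous if current >= previous else current
    return total
```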

The New Relic Infrastructure default dashboard to monitor container restarts.

Suggested alerting

This is an optional alerting scenario. Kubernetes restarts containers automatically, so although an alert gives you immediate, useful notification, don’t let container restarts interrupt your sleep.

 

Track container resource usage

Monitoring container resource usage helps you ensure that your containers and applications remain healthy. For example, if a container hits its limit for memory usage, the kubelet agent might kill it.

When monitoring container resource usage, you’ll want to know:

  • If your containers are hitting resource limits and affecting the performance of their applications
  • If there are spikes in resource consumption
  • If there is a pattern to the distribution of errors per container

How New Relic helps

First, identify the minimum amount of CPU and memory a container requires to run, which the cluster needs to guarantee, and monitor those resources with New Relic.
Second, monitor container resource limits. These are the maximum amount of resources that the container will be allowed to consume. In Kubernetes, resource limits are unbounded by default.

The New Relic Infrastructure default dashboard to monitor container memory usage.

The blue line represents the maximum amount of the resource that the container will be allowed to consume. The yellow line represents the minimum amount of CPU or memory the container needs to run, which needs to be guaranteed by the system.
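
One way to read the two lines together is to classify a container's current usage against its guaranteed minimum (the request) and its maximum (the limit). The sketch below is illustrative only: the function name, the return labels, and the 90 percent warning ratio are assumptions, and `limit_bytes=None` stands for a container with no limit configured.

```python
def classify_memory_usage(used_bytes, request_bytes, limit_bytes, warn_ratio=0.9):
    """Return 'ok', 'above_request', or 'near_limit' for one container.

    limit_bytes may be None, meaning the container has no configured limit,
    in which case 'near_limit' can never be returned.
    """
    if limit_bytes is not None and used_bytes >= warn_ratio * limit_bytes:
        return "near_limit"
    if used_bytes > request_bytes:
        return "above_request"
    return "ok"
```

A container that sits above its request for long periods is a candidate for a higher request, while one that is repeatedly near its limit is at risk of being killed.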

This type of monitoring can help proactively resolve resource usage issues before they affect your application.

Suggested alerting

Set alerts on container CPU and memory usage and on limits for those metrics.

Monitoring applications running in Kubernetes

A key benefit of Kubernetes is that it decouples your application and its business logic from the specific details of its runtime environment. That means if you ever have to shift the underlying infrastructure to a new Linux version, for example, you won’t have to completely rewrite the application code.

When monitoring applications managed by an orchestration layer, being able to relate an application error trace, for instance, to the container, pod, or host it’s running in can be very useful for debugging or troubleshooting.

At the application layer, you need to monitor the performance and availability of applications running inside your Kubernetes cluster. You do that by tracking such metrics as request rate, throughput, and error rate.

Get started monitoring applications in Kubernetes

New Relic APM lets you add custom attributes, and that metadata is available in transaction traces gathered from your application. You can create custom attributes to collect information about the exact Kubernetes node, pod, or namespace where a transaction occurred.

To get started monitoring applications running in Kubernetes you will need to instrument your application with the Kubernetes Downward API. We’ve created a sample app to demonstrate how this works in a Node.js application—fork this repo for your own use. (Our Monitoring Application Performance in Kubernetes blog post explains how to add this type of Kubernetes metadata to APM-monitored application transactions.)
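
The sample app mentioned above is written in Node.js. As a rough Python sketch of the same idea, the snippet below reads Kubernetes metadata from environment variables that a pod spec would populate through the Downward API (for example from `spec.nodeName`, `metadata.name`, and `metadata.namespace`). The environment variable names and attribute keys are hypothetical and must match whatever your own pod spec defines.

```python
import os

# Hypothetical mapping: environment variable name -> custom attribute key.
# The variable names must match the names your pod spec assigns to its
# Downward API fieldRef entries.
DOWNWARD_API_ENV = {
    "K8S_NODE_NAME": "k8s.nodeName",
    "K8S_POD_NAME": "k8s.podName",
    "K8S_POD_NAMESPACE": "k8s.namespace",
}


def kubernetes_attributes(environ=None):
    """Collect the Kubernetes metadata that is present in the environment.

    Unset or empty variables are skipped so that an app running outside
    Kubernetes simply reports no Kubernetes attributes.
    """
    environ = os.environ if environ is None else environ
    return {
        attribute: environ[variable]
        for variable, attribute in DOWNWARD_API_ENV.items()
        if environ.get(variable)
    }
```

Each key/value pair returned here would then be passed to the custom-attribute call your APM agent provides, once per transaction.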

Track transaction traces

When you run applications in Kubernetes, the containers the apps run in often move around throughout your cluster as instances scale up or down. This scheduling happens automatically in Kubernetes, but could affect your application’s performance or availability. If you’re an application developer, being able to correlate Kubernetes objects to applications is important for debugging and troubleshooting.

You’ll want to know:

  • What applications are associated with which cluster
  • How many transactions are happening within a given pod

How New Relic helps

To monitor transaction traces in Kubernetes, you need a code-centric view of your applications. You need to correlate applications with the container, pod, or host they’re running in. You also need to identify pod-specific performance issues for any application’s workload.

Transaction attributes show the Kubernetes hostname and IP address where the error occurred.

Knowing the names of the pod and node where the error occurred can speed up your troubleshooting. Visibility into transaction traces quickly highlights any abnormalities in your Kubernetes-hosted application.

To learn about how to gain visibility into transaction traces in Kubernetes, see the blog post, Monitoring Application Performance in Kubernetes.

Suggested alerting
You’ll want to set up alerts for all applications running in production. Specifically, you’ll want to alert on API service requests, transactions, service latencies, uptime, and throughput, sending alerts when any of these metrics crosses the thresholds you define.

Prevent errors

If a single pod or particular pod IP starts failing or throwing errors, you need to troubleshoot before those errors harm your cluster or application. When something goes wrong, zero in on root causes as quickly as possible.

You’ll want to know:

  • In which namespace, host, or pod a transaction failed
  • If your app is performing as expected in all pods
  • The performance of application X running on pod Y

How New Relic helps

With New Relic, you can get a code-centric view of the applications running inside your cluster, monitor your Kubernetes-hosted applications for performance outliers, and track down any individual errors.

APM Error Profiles automatically notices if errors are occurring within the same pods and from the same pod IP addresses.
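
The underlying idea, checking whether errors cluster on one pod, can be sketched in a few lines. This is a simplified illustration rather than how Error Profiles works internally; the `podName` attribute key and the 50 percent share are assumptions.

```python
from collections import Counter


def errors_by_pod(error_events):
    """Count error events per pod.

    error_events: iterable of dicts carrying a hypothetical 'podName' key.
    Events without that key are grouped under 'unknown'.
    """
    return Counter(event.get("podName", "unknown") for event in error_events)


def dominant_pod(error_events, min_share=0.5):
    """Return the pod responsible for at least min_share of errors, or None."""
    counts = errors_by_pod(error_events)
    total = sum(counts.values())
    if total == 0:
        return None
    pod, count = counts.most_common(1)[0]
    return pod if count / total >= min_share else None
```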

The Monitoring Application Performance in Kubernetes blog post can help you pinpoint applications that might be causing performance issues in your cluster.

Suggested alerting

Set alerts to track error rates for any applications running in production environments in Kubernetes.

Monitoring end-user experience when running Kubernetes

If you order an item from a delivery service, and it arrives at your house broken or late, do you really care what part of the delivery process broke? Whether it was the fault of the manufacturer, the distributor, or the delivery service, the end result is equally annoying.

The same logic applies to companies hosting apps in Kubernetes: if a customer navigates to their site and a page doesn't load or takes too long, the customer isn’t interested in the technical reasons why. That’s why it’s not enough to track your own systems’ performance; it’s also essential to monitor the front-end performance of your applications to understand what your customers are actually experiencing.

Even though your application is running in Kubernetes, you can still identify and track the key indicators of customer experience, and clarify how the mobile or browser performance of your application is affecting business.

How New Relic helps
For example, when developers first migrate to Kubernetes, they can set up a pre-migration baseline to compare their front-end application’s average load time before and after the migration. Developers can use the same strategies detailed in the Monitoring Application Performance in Kubernetes blog post to gain insight into key indicators such as response time and errors for mobile application and browser performance. It’s also imperative to monitor load time and availability to ensure customer satisfaction. New Relic Browser and New Relic Mobile are built to give you that crucial view into your users’ experiences.
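
The pre-migration baseline comparison mentioned above is simple arithmetic. A minimal sketch, assuming you have exported two lists of page load times in milliseconds (the function name and inputs are hypothetical):

```python
from statistics import mean


def load_time_change_percent(before_ms, after_ms):
    """Percentage change in average load time relative to the baseline.

    A positive result means pages load more slowly after the migration.
    """
    baseline = mean(before_ms)
    if baseline == 0:
        raise ValueError("baseline load time must be greater than zero")
    return 100.0 * (mean(after_ms) - baseline) / baseline
```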

 

The New Relic Browser overview page shows a summary of browser performance for that app
Quickly view crash occurrences, app launches, and more with New Relic Mobile

Additionally, developers and operators both need to understand the availability of any Kubernetes-hosted service, often from locations all around the world. New Relic Synthetics is designed to track application availability and performance from a wide variety of locations.


New Relic brings together business-level information and performance data in a single pane of glass. That helps customers from development, operations, product, and customer support teams identify potential areas of improvement in the product and find better ways to debug errors that may be affecting specific customers.

Suggested alerting

New Relic Mobile Alerts:

  • Mobile network error rate and response time to assure you’re notified on the most critical endpoints

New Relic Browser Alerts:

  • Browser session count drop to indicate availability issues
  • Browser JavaScript error rate or error count
  • Browser interaction duration

New Relic Synthetics Alerts:

  • Synthetics error rate or error count
  • Synthetics response times
  • Synthetics ping checks

In short, alert on such metrics as page load time, mobile crashes, JavaScript errors, ping checks, and scripted checks through a key user process.

Monitoring Kubernetes’ supporting infrastructure

The major cloud providers (AWS, Azure, GCP, and others) now offer Kubernetes-as-a-Service platforms. Unfortunately, due to Kubernetes’ dynamic scheduling, without proper monitoring in place, it can be difficult to diagnose points of failure or track down other issues in these cloud platforms.

When running Kubernetes in a cloud environment, you’ll want to know:

  • If your cluster is distributed across multiple regions 
  • If a container was running on a particular vendor’s platform or on-premise at the time of an outage

How New Relic helps

New Relic lets you monitor, query, and alert on usage metrics and errors within your cloud environment, including services based in multiple regions and availability zones. Look for metrics to ingest from all the different layers of your cloud-based Kubernetes infrastructure, even if your clusters are distributed across multiple data centers or cloud providers.

Suggested alerting

Set alerts to track service latency, uptimes, and throughput for applications running on any Kubernetes-as-a-Service platform.

New Relic’s Kubernetes integrations

Amazon Elastic Container Service for Kubernetes (Amazon EKS)

Amazon EKS provides Kubernetes as a managed service on AWS. It helps make it easier to deploy, manage and scale containerized applications on Kubernetes.

How New Relic helps

EKS makes it easier to launch and run Kubernetes, but because of the ephemeral nature of Kubernetes-based workflows, customers need advanced monitoring at the cluster, node, pod, container, and application level. New Relic provides that crucial visibility into your clusters.

Your Amazon EKS environment includes thousands of labels and tags exposed by your infrastructure, containers, and microservices. New Relic automatically collects these labels and tags so you can more easily view all your Kubernetes entities, no matter how you arrange them.

Read the Amazon EKS integration documentation.

Learn More: Amazon EKS Is Here—and You Can Monitor It With New Relic

Google Kubernetes Engine (GKE)

GKE provides an environment for deploying, managing, and scaling your containerized applications using Google-supplied infrastructure.

How New Relic helps

While GKE manages your Kubernetes infrastructure, New Relic makes it easier to troubleshoot, scale, and manage your dynamic environment. New Relic helps ensure your clusters are running as expected and can help you quickly detect performance issues within your cluster before they affect your customers.

Read the GKE integration documentation.

Learn More: Monitor Google Kubernetes Engine with New Relic

Microsoft Azure Kubernetes Service (AKS)

AKS manages your hosted Kubernetes environment, making it easier to deploy and manage containerized applications without container orchestration expertise. It also eliminates the burden of ongoing operations and maintenance by provisioning, upgrading, and scaling resources on demand, without taking your applications offline.

How New Relic helps

If you’re running containers in AKS, use New Relic’s application and infrastructure-centric views to quickly see into all your deployments across AKS.

Read the Kubernetes integration installation documentation.

Red Hat OpenShift

OpenShift provides developers with an integrated development environment (IDE) for building and deploying Docker-formatted containers, and then managing them with the Kubernetes container orchestration platform.

How New Relic helps

You always want to know which apps are running on which nodes in your OpenShift environment. New Relic helps OpenShift admins see which apps are getting stopped and why. New Relic also makes it easy to map applications and their resources to namespaces and then monitor those namespaces.

Read the OpenShift integration documentation.

 

 
