Allocating resources for an application running on Kubernetes can be very challenging. How many pods do you need? How much CPU and memory do you need to allocate to each of those pods to maintain uptime and optimal performance? How do you account for spikes in traffic during heavy usage when you provision your infrastructure?

Fortunately, Kubernetes includes Horizontal Pod Autoscaling (HPA), which automatically adds pods when load increases and removes them when load falls again, based on key metrics like CPU and memory consumption as well as external metrics. This helps maintain uptime for your application without using unnecessary resources.

What is Kubernetes autoscaling?

Kubernetes autoscaling refers to the ability of the Kubernetes container orchestration platform to automatically adjust the number of running instances of an application or service, or the resources allocated to them, based on observed resource utilization or custom metrics. The goal of Kubernetes autoscaling is to let deployed applications handle varying workloads efficiently without manual intervention.

There are two main types of pod autoscaling in Kubernetes:

Horizontal pod autoscaler (HPA): HPA automatically adjusts the number of pods in a workload such as a Deployment or ReplicaSet based on observed metrics such as CPU utilization or custom metrics. When the observed value rises above or falls below the target you configure, HPA scales the workload up by adding replicas or down by removing unneeded ones. This lets the application handle varying levels of demand without over-provisioning or under-provisioning resources.

Vertical pod autoscaler (VPA): VPA adjusts the CPU and memory requests of individual pods based on their historical and current resource usage. It tunes resource requirements per pod so that each pod gets the resources it needs without over-allocating. Unlike HPA, VPA is not built into Kubernetes; it is an add-on that you install separately.
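The Kubernetes documentation describes the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change made when the ratio is within a default tolerance of 10%. The shell function below is a minimal sketch of that rule for an External metric with a Value target, assuming integer inputs only. The function name desired_replicas is hypothetical, and the real controller also handles floating-point values and missing or unready pods, which this sketch ignores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: desired_replicas CURRENT METRIC TARGET MIN MAX
# Approximates the HPA rule for an External metric with a Value target:
#   desired = ceil(current * metric / target), clamped to [MIN, MAX].
# Integer inputs only; not the actual controller implementation.
desired_replicas() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5
  local diff=$(( metric - target ))
  local desired

  # Absolute difference between the observed metric and the target.
  if (( diff < 0 )); then
    diff=$(( -diff ))
  fi

  if (( diff * 10 <= target )); then
    # Within the default 10% tolerance (|metric/target - 1| <= 0.1): no change.
    desired=$current
  else
    # Integer ceiling of current * metric / target.
    desired=$(( (current * metric + target - 1) / target ))
  fi

  # Respect the HPA's minReplicas and maxReplicas bounds.
  if (( desired < min )); then
    desired=$min
  fi
  if (( desired > max )); then
    desired=$max
  fi

  echo "$desired"
}

# Example: 2 replicas, metric at 150 against a target of 100.
desired_replicas 2 150 100 1 10   # prints 3
```

The clamp at the end mirrors how minReplicas and maxReplicas bound whatever the ratio suggests, so even a very large metric value cannot push the replica count past the configured maximum.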


Kubernetes autoscaling with HPA and New Relic

You can use metrics from New Relic to determine how and when your deployment autoscales. This post demonstrates how to autoscale a Kubernetes deployment with New Relic metrics by walking through an example using metrics reported by Pixie. This solution has the following advantages:

  • No need to manually add or remove pods based on your real-time demands.
  • Improved availability by scaling resources when load increases or performance degrades.
  • Lower cost than provisioning a fixed number of pods for peak load.

The New Relic metrics adapter lets you autoscale your Kubernetes deployments based on metrics gathered from your applications and infrastructure services, such as the number of requests per second received by a web server.

The metrics adapter runs a NRQL query through the New Relic NerdGraph API and serves the resulting value through the Kubernetes external metrics API. The Horizontal Pod Autoscaler can then scale your deployments based on that external metric. The next graphic shows where the metrics adapter fits in.

Prerequisites

1. Deploy a minikube cluster

Clone the following repo from GitHub, then run the setup script:

$ git clone https://github.com/newrelic-experimental/pixie-lab-materials

$ cd pixie-lab-materials/main

$ ./setup.sh

The setup.sh script spins up a new minikube cluster using the Pixie-supported hyperkit driver. It also configures the cluster's network, memory, and CPU settings for optimal performance with Pixie. Finally, it creates all the pods and services that make up the demo application.

In a new terminal window, open a minikube tunnel:

$ minikube tunnel -p minikube-pixie-lab

You should now have two terminal windows: 

  • One that contains your tunnel. This needs to remain open to access your demo application.
  • One for running the other commands in this tutorial.

2. Install the Kubernetes integration with Pixie and the metrics adapter

Next, set up New Relic’s Kubernetes integration with this guided install. Make sure to enable Pixie by selecting Instant service-level insights, full-body requests, and application profiles through Pixie. The other items selected in the next image (Kube state metrics, Prometheus metrics, Kubernetes events, and Log data) are selected by default.

After you select Continue, New Relic generates a command. Copy and paste the command into your dev environment.

To install the New Relic Metrics Adapter, use the newrelic-k8s-metrics-adapter Helm chart, which is also included in the nri-bundle chart used to deploy all New Relic Kubernetes components.

helm upgrade --install newrelic newrelic/nri-bundle \
--namespace newrelic --create-namespace --reuse-values \
--set metrics-adapter.enabled=true \
--set newrelic-k8s-metrics-adapter.personalAPIKey=YOUR_NEW_RELIC_PERSONAL_API_KEY \
--set newrelic-k8s-metrics-adapter.config.accountID=YOUR_NEW_RELIC_ACCOUNT_ID \
--set newrelic-k8s-metrics-adapter.config.externalMetrics.manipulate_average_requests.query="FROM Metric SELECT average(http.server.duration) WHERE instrumentation.provider='pixie'"

Here is additional context on the flags in the previous command:

  • metrics-adapter.enabled: Must be set to true so the metrics adapter chart is installed.
  • newrelic-k8s-metrics-adapter.personalAPIKey: Must be set to your New Relic Personal API key.
  • newrelic-k8s-metrics-adapter.config.accountID: Must be set to the ID of the New Relic account from which you will fetch metrics.
  • newrelic-k8s-metrics-adapter.config.externalMetrics.external_metric_name.query: Adds a new external metric with the following information:
    • external_metric_name: The metric name that your HPA will reference (manipulate_average_requests in the previous command).
    • query: The base NRQL query for the metric.
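Once the Helm release is deployed, you can check the adapter from the command line before creating an HPA. This is a minimal sketch, assuming kubectl points at the minikube cluster and that the HPA will live in the default namespace. If the adapter is working, the second command should return a JSON ExternalMetricValueList containing the current value of manipulate_average_requests.

```shell
# Check that the metrics adapter has registered the external metrics API.
kubectl get apiservice v1beta1.external.metrics.k8s.io

# Ask the external metrics API for the metric defined in the Helm command.
# Assumes the HPA will be created in the default namespace.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/manipulate_average_requests"
```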


3. Test your NRQL query

Before you set the cluster to scale based on a metric from New Relic, you need to make sure the query is getting the right data. Test your NRQL query by selecting Query your Data. Then in the Query builder tab, copy and paste the following NRQL query:

FROM Metric SELECT average(http.server.duration) WHERE instrumentation.provider='pixie'
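You can also run the same query from a terminal through NerdGraph, the API the metrics adapter queries. This sketch assumes your New Relic user API key and account ID are exported as NEW_RELIC_API_KEY and NEW_RELIC_ACCOUNT_ID (hypothetical variable names) and that your account uses the US endpoint. The query should return a single numeric value under results.

```shell
#!/usr/bin/env bash
# Run the step 3 NRQL query through NerdGraph (US endpoint; EU accounts
# use https://api.eu.newrelic.com/graphql instead).
# NEW_RELIC_API_KEY and NEW_RELIC_ACCOUNT_ID are assumed to be exported.
NRQL="FROM Metric SELECT average(http.server.duration) WHERE instrumentation.provider='pixie'"

# The unquoted heredoc expands the variables; \" stays literal so the
# GraphQL string is properly escaped inside the JSON body.
curl -s https://api.newrelic.com/graphql \
  -H 'Content-Type: application/json' \
  -H "API-Key: ${NEW_RELIC_API_KEY}" \
  --data @- <<EOF
{"query": "{ actor { account(id: ${NEW_RELIC_ACCOUNT_ID}) { nrql(query: \"${NRQL}\") { results } } } }"}
EOF
```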

 

4. Configure your Horizontal Pod Autoscaler

Create a new file called hpa.yaml in the pixie-lab-materials/main/kube directory with the following HPA definition. Based on this definition, the HPA controller in the Kubernetes controller manager fetches the metric from the external metrics API, which the New Relic metrics adapter serves.

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
  name: manipulate-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: manipulation-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: manipulate_average_requests
        target:
          type: Value
          value: 100

  • The manifest uses autoscaling/v2, the stable HPA API; the older autoscaling/v2beta2 version was removed in Kubernetes 1.26.
  • The HPA is configured to autoscale the manipulation-service deployment.
  • The HPA keeps between 1 and 10 replicas.
  • The HPA autoscales based on the metric manipulate_average_requests. This name must match the external metric you defined in the Helm command in step 2.
  • With a target type of Value, the HPA compares the metric's current value with 100 and scales the replica count in proportion to the ratio between the two.

By default, the HPA controller evaluates the metric every 15 seconds; the metrics adapter fetches the current value from New Relic, and the HPA scales the manipulation-service deployment if necessary. You can change this interval with the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period. You can also autoscale a deployment based on multiple metrics: the HPA calculates a replica count for each metric and uses the largest one.
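As an illustration of multiple metrics, here is a hypothetical variant of hpa.yaml that combines the New Relic external metric with the built-in CPU utilization metric. It assumes that manipulation-service declares CPU requests and that metrics-server runs in the cluster, neither of which this tutorial sets up, and 70 is an illustrative value. Applying it replaces the manipulate-scaler HPA created from hpa.yaml.

```shell
# Hypothetical two-metric HPA; the HPA scales to whichever metric
# asks for more replicas.
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: manipulate-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: manipulation-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # External metric served by the New Relic metrics adapter.
    - type: External
      external:
        metric:
          name: manipulate_average_requests
        target:
          type: Value
          value: 100
    # Built-in resource metric: average CPU utilization across pods,
    # as a percentage of the pods' CPU requests.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
EOF
```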

Make sure to apply the new YAML file by running:

$ cd pixie-lab-materials/main/kube

$ kubectl apply -f hpa.yaml
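To confirm that the HPA can read the metric, you can inspect it by the name set in hpa.yaml (manipulate-scaler). If the TARGETS column typically shows <unknown>/100, the HPA cannot fetch the metric yet, and the describe output's conditions and events usually explain why.

```shell
# Show the current value the HPA sees for its metric next to the target.
kubectl get hpa manipulate-scaler

# Show metric details, conditions, and recent scaling events.
kubectl describe hpa manipulate-scaler
```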

5. Add load to trigger autoscaling

Find the URL of the deployed site by running:

$ kubectl get services

Then open the EXTERNAL-IP that you see for frontend-service in your browser.

Next, install hey, a lightweight HTTP load generator, with the following command in your terminal:

$ brew install hey

After installing hey, confirm it’s installed correctly with the which hey command in your terminal.

Now you can send GET requests to the EXTERNAL-IP of the frontend with the following command:

$ hey -n 10 -c 2 -m GET http://<EXTERNAL-IP>

This sends 10 GET requests to frontend-service using 2 concurrent workers.
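Ten requests may be too few to move an average latency metric enough to trigger scaling. For sustained load, a sketch like the following looks up the frontend's IP with kubectl and runs hey for a fixed duration. It assumes minikube tunnel is still running so the LoadBalancer reports an IP; the 5-minute duration and 50 workers are illustrative values.

```shell
# Look up the frontend's external IP instead of copying it by hand.
EXTERNAL_IP=$(kubectl get service frontend-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Send GET requests for 5 minutes from 50 concurrent workers.
# -z sets a duration; when it is given, hey ignores -n.
hey -z 5m -c 50 -m GET "http://${EXTERNAL_IP}"
```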

You can see the HPA autoscaling by running:

$ watch kubectl get hpa

As you can see, the HPA increases the number of manipulation-service replicas as the average HTTP request duration reported by New Relic rises above the target. You can adjust this configuration for your own services so that New Relic and the HPA automatically scale them as needed.

Kubernetes autoscaling best practices

The following best practices will help you optimize the effectiveness of Kubernetes autoscaling and ensure your applications perform well under varying workloads while maintaining cost efficiency.

Define clear metrics:

Choosing meaningful metrics is fundamental to autoscaling. Understand the behavior of your application and select metrics like CPU utilization, memory usage, or custom metrics that accurately reflect performance.

Set realistic resource requests and limits:

Accurate resource requests and limits are essential for autoscaling decisions. Ensure your pod specifications have realistic values to avoid over-provisioning or under-provisioning resources.
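Requests matter for autoscaling in particular because CPU and memory Utilization targets in an HPA are computed as a percentage of each pod's requests. A minimal sketch of setting them on the demo deployment; the numbers are hypothetical and should come from observed usage. Changing resources triggers a rollout of the deployment.

```shell
# Hypothetical values: set CPU/memory requests and limits on all
# containers in the demo deployment.
kubectl set resources deployment manipulation-service \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=256Mi
```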

Establish Horizontal Pod Autoscaler (HPA) min and max values:

Set appropriate minimum and maximum values for the number of replicas in HPA. This prevents autoscaling from excessively scaling up or down and helps maintain a balance between resource efficiency and application performance.
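The bounds live in the HPA spec, so you can edit hpa.yaml or patch the live object. A sketch with hypothetical values: a minimum of 2 keeps a spare replica for availability, and a maximum of 20 caps scale-out cost.

```shell
# Hypothetical bounds for the tutorial's HPA.
kubectl patch hpa manipulate-scaler --type merge \
  -p '{"spec":{"minReplicas":2,"maxReplicas":20}}'
```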

Implement cluster autoscaler:

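Enabling the Cluster Autoscaler is especially important in cloud environments. It adds nodes when pods can't be scheduled because the cluster lacks capacity and removes underused nodes, so the overall cluster size follows demand. Without it, replicas created by the HPA can stay in the Pending state once existing nodes are full.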
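Cluster Autoscaler is usually enabled through your cloud provider rather than on a local minikube cluster like the one in this tutorial. As one example, on Google Kubernetes Engine you can enable node autoscaling for a node pool with gcloud. The cluster name, node pool, zone, and node counts below are placeholders, and other providers use their own tooling.

```shell
# GKE-only example; all names and counts are placeholders.
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool=default-pool \
  --min-nodes=1 \
  --max-nodes=5 \
  --zone=us-central1-a
```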

Monitor and analyze performance:

Regular monitoring and analysis of autoscaling performance are critical. Use monitoring tools to track how your applications scale in response to varying workloads and make adjustments as needed.
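Alongside New Relic dashboards, Kubernetes itself records what the autoscaler did. A small sketch for the tutorial's default namespace; the deployment name is the one from this tutorial.

```shell
# List recent events emitted for HPAs in the current namespace,
# such as rescales or failures to fetch metrics.
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler

# Follow replica changes of the demo deployment as they happen.
kubectl get deployment manipulation-service --watch
```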

Test and validate autoscaling:

Thoroughly test autoscaling configurations in staging or testing environments. Simulate different workload scenarios to ensure that autoscaling responds appropriately and doesn't introduce issues in production.

Combine horizontal and vertical autoscaling:

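Using both the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA) provides a more comprehensive autoscaling strategy: HPA adjusts the number of replicas, while VPA adjusts the resource requests of individual pods. Avoid having both act on the same CPU or memory metric for the same workload, because their adjustments can conflict. Pairing VPA with an HPA driven by custom or external metrics, as in this tutorial, avoids that overlap.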
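A low-risk way to start is running VPA in recommendation-only mode next to the existing HPA. This sketch assumes the VPA add-on is already installed in the cluster (it is not part of core Kubernetes or this tutorial's setup); the object name manipulation-service-vpa is hypothetical.

```shell
kubectl apply -f - <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: manipulation-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: manipulation-service
  updatePolicy:
    # "Off" only computes recommendations; VPA never evicts pods.
    updateMode: "Off"
EOF

# Read the recommended CPU and memory requests once VPA has gathered data.
kubectl describe vpa manipulation-service-vpa
```

You can then copy the recommendations into the deployment's resource requests yourself, which keeps the HPA's view of the workload stable.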

Understand pod disruption budgets (PDB):

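Configuring Pod Disruption Budgets helps maintain application availability during voluntary disruptions, such as node drains when the Cluster Autoscaler removes nodes or pod evictions by VPA. A PDB limits how many pods can be evicted at once, so scaling activity doesn't take down too many replicas simultaneously. Note that PDBs don't limit HPA scale-down, which removes pods by lowering the replica count rather than through evictions.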
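A minimal PDB for the demo service might look like the following. The object name is hypothetical, and the label selector app=manipulation-service is an assumption that must match the labels on your pods.

```shell
# Keep at least one manipulation-service pod running during voluntary
# disruptions such as node drains.
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: manipulation-service-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: manipulation-service
EOF
```

To check which labels the deployment actually selects on, run kubectl get deployment manipulation-service -o jsonpath='{.spec.selector.matchLabels}' and adjust matchLabels to match.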

What's next?

After implementing autoscaling for your Kubernetes pods, the focus shifts to continuous refinement and maintenance of your system. The key to this process is monitoring cluster performance and resource utilization, which lets you fine-tune your autoscaling parameters while keeping cloud costs under control. With New Relic’s monitoring tools, you can track cluster performance and resource usage in real time, and its analytics and reporting capabilities give you a clear view of your Kubernetes environment so you can make informed decisions about building resilient, cost-effective infrastructure.