As an observability company, New Relic creates and maintains multiple language- and technology-specific agents to collect telemetry data from our customers’ environments. When these agent teams release new updates manually, they conduct numerous verifications to ensure the process doesn’t introduce any regressions caused by human error. To reduce the time required to deploy a new release, the Kubernetes agent team fully automated the software agent release process. Reusable GitHub Actions workflows keep track of vulnerable dependencies, write documentation, and sync with partners like Amazon Elastic Kubernetes Service (Amazon EKS) Anywhere. Previously, shipping updates to our Kubernetes integration was manual and took up to two weeks; now an automated process takes an hour per week.

Decreasing response time for security incidents

One of our challenges was our response to security vulnerabilities: we would react to a vulnerability only when a customer contacted Global Technical Support (GTS) with an escalation. That would lead to both customer frustration with our integrations and developer stress because we would need to stop planned work in favor of patching our software.

As part of our continuous integration (CI) pipeline, we enabled code-scanning tools to keep us informed of the latest vulnerabilities discovered in our codebase. CodeQL looks for vulnerabilities in our source code, and Trivy checks that our Docker images do not include vulnerabilities introduced by the base image and the libraries included with it.

One of our common use cases is detecting vulnerabilities in our base image (Alpine). This process alerts us to current issues that need to be fixed. By combining vulnerability scanning with automatic dependency management, we’re able to apply fixes to the codebase automatically, without the need for human intervention. Our workflows run weekly, which means customers get a patched version within a week of a fix being available.
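As a rough illustration of the scanning step, the sketch below filters a Trivy JSON report down to the findings that already have a fixed version, which are the ones an automated dependency update can resolve. It assumes the report layout produced by `trivy image --format json` (a top-level `Results` list whose entries carry a `Vulnerabilities` list). The sample report, the `CVE-0000-*` identifiers, the package names, and the function name `fixable_vulnerabilities` are all hypothetical and are not taken from our pipeline.

```python
from typing import Any


def fixable_vulnerabilities(report: dict[str, Any]) -> list[dict[str, str]]:
    """Return the findings in a Trivy JSON report that have a fixed version."""
    fixable = []
    for result in report.get("Results", []):
        # "Vulnerabilities" can be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            fixed = vuln.get("FixedVersion", "")
            if fixed:
                fixable.append({
                    "id": vuln["VulnerabilityID"],
                    "package": vuln["PkgName"],
                    "installed": vuln["InstalledVersion"],
                    "fixed": fixed,
                })
    return fixable


# Hypothetical report, trimmed to the fields used above.
SAMPLE_REPORT = {
    "Results": [
        {
            "Target": "example-image (alpine 3.18.4)",
            "Vulnerabilities": [
                {
                    "VulnerabilityID": "CVE-0000-0001",
                    "PkgName": "libexample",
                    "InstalledVersion": "1.0.0-r0",
                    "FixedVersion": "1.0.1-r0",
                    "Severity": "HIGH",
                },
                {
                    "VulnerabilityID": "CVE-0000-0002",
                    "PkgName": "libother",
                    "InstalledVersion": "2.3.0-r1",
                    "FixedVersion": "",  # no fix published yet
                    "Severity": "MEDIUM",
                },
            ],
        },
        {"Target": "clean-target", "Vulnerabilities": None},
    ]
}

if __name__ == "__main__":
    for finding in fixable_vulnerabilities(SAMPLE_REPORT):
        print(finding)
```

Findings without a fixed version stay on the dashboard until the upstream image ships a patch; only the fixable ones can be closed by a dependency bump.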

As a concrete example of our improved response time, our security dashboard alerted us to the following security vulnerabilities in alpine:3.18.4 (the image we were using in the nri-kubernetes integration at the time):

Those vulnerabilities were fixed in alpine:3.18.5, released November 30, 2023, and alpine:3.19.0, released December 7, 2023. Renovate, our universal dependency management tool, created pull requests for both releases the same day they were published, and both were included in our release on December 8, 2023, just one day after the release of alpine:3.19.0.

All three of the Alpine images mentioned above have two more vulnerabilities that were detected afterwards:

Those two pending vulnerabilities are currently flagged in our security dashboard.

As soon as a fix is released by Alpine, our customers can expect a fixed release version from our integrations within a week.

Provide cutting-edge support for the latest version of Kubernetes

Supporting new versions of Kubernetes involves updating third-party testing tools and performing extensive conformance testing to ensure a new Kubernetes version doesn’t introduce breaking changes for our integrations. One common issue that we need to validate is the use of Kubernetes APIs that are still in alpha or beta, since they can change without prior notice.

With our fully automated dependency management tooling, we have immediate access to new versions of those testing tools as soon as they are available. Also, because our conformance testing is fully automated, we can shorten validation time, which keeps us on the cutting edge of Kubernetes support.

When a new release of Kubernetes becomes available, it's crucial to update our testing workflows to incorporate the latest version. Renovate, our dependency management tool, automatically opens a pull request (PR) to update minikube to the latest version. We use minikube to quickly spin up a cluster and run end-to-end tests, and we run the full battery of tests against each Kubernetes version that we support. Once minikube is updated with the latest Kubernetes version, we enable testing for that version in our testing framework. If the tests confirm that the integration is working as expected, we can declare support for the latest Kubernetes version. Because of our automated workflows, we updated our test suite to use the latest version of Kubernetes the same day that minikube announced the release. This allowed us to test for compatibility within one day and communicate that our Kubernetes integration was compatible with the latest release seven days after minikube was released.
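The step of enabling a new Kubernetes version in the test matrix can be sketched as follows. This is a minimal illustration rather than our actual testing framework: the function names and the version numbers are hypothetical, and it assumes plain `vMAJOR.MINOR.PATCH` version strings with no pre-release suffixes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v1.29.0' or '1.29.0' into (1, 29, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def update_test_matrix(matrix: list[str], newest: str) -> list[str]:
    """Return the matrix with `newest` enabled.

    One entry is kept per Kubernetes minor version. A new minor version is
    added; a newer patch of an existing minor version replaces the old entry.
    """
    by_minor = {parse_version(v)[:2]: v for v in matrix}
    key = parse_version(newest)[:2]
    current = by_minor.get(key)
    if current is None or parse_version(newest) > parse_version(current):
        by_minor[key] = newest
    # Sort on the numeric (major, minor) key so 1.10 comes after 1.9.
    return [by_minor[k] for k in sorted(by_minor)]


if __name__ == "__main__":
    # Hypothetical versions, for illustration only.
    supported = ["v1.27.4", "v1.28.3"]
    print(update_test_matrix(supported, "v1.29.0"))
```

Comparing parsed tuples instead of raw strings matters here, because a plain string comparison would order `1.10` before `1.9`.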

Sync the latest agent release with Amazon EKS Anywhere add-ons

New Relic supports the Amazon EKS Anywhere Partner program by offering our Kubernetes agent as an out-of-the-box add-on for Amazon EKS Anywhere clusters. We developed a GitHub Actions workflow to automatically open a pull request when we cut a new release of our agent. This keeps our end users of Amazon EKS Anywhere up to date with the latest agent releases, ensuring that New Relic continues passing the latest conformance testing and remains an Amazon EKS Anywhere validated partner.
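To make the idea concrete, here is a small sketch of the request body such a workflow could send to the GitHub REST API endpoint for creating pull requests (`POST /repos/{owner}/{repo}/pulls`). The `title`, `head`, `base`, and `body` fields are part of that API; the function name, branch names, wording, and the `v0.0.0` version are hypothetical, and the sketch only builds the payload without making any network call.

```python
import json


def build_pull_request(agent_version: str, head_branch: str,
                       base_branch: str = "main") -> dict[str, str]:
    """Build the JSON body for a 'create pull request' call.

    The branch names and message wording are illustrative assumptions.
    """
    return {
        "title": f"Update New Relic Kubernetes agent to {agent_version}",
        "head": head_branch,   # branch that contains the version bump
        "base": base_branch,   # branch the change should be merged into
        "body": f"Automated update for agent release {agent_version}.",
    }


if __name__ == "__main__":
    payload = build_pull_request("v0.0.0", "bot/update-agent-v0.0.0")
    print(json.dumps(payload, indent=2))
```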

Automate the changelog, communications, and documentation

Creating and updating documentation is a big time sink. To get a weekly cadence of releases, we needed to update all communication channels for our internal stakeholders, external customers, and external partners. To automate communications to all of these stakeholders, we created reusable workflows that run every week and automatically update release notes, send Slack messages about the latest releases to internal New Relic stakeholder channels, and update developer documentation.

Our own GitHub workflow then compiles the documentation and release notes, and sends communications through all of the channels. To better communicate with internal stakeholders, we created our own K8s Agent bot linked to our GitHub Actions workflow, so our customers and partners can be automatically notified of release updates.
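The sketch below shows one way the release-notes and notification steps could fit together: render a list of changes into a release-notes entry, then wrap it in the `{"text": ...}` JSON body that Slack incoming webhooks accept. The function names, the notes layout, the sample change, and the `v0.0.0` version are hypothetical, and nothing is sent over the network.

```python
import json


def render_release_notes(version: str, changes: list[str]) -> str:
    """Render one release-notes entry as plain text (layout is an assumption)."""
    if changes:
        items = [f"- {change}" for change in changes]
    else:
        items = ["- No user-facing changes."]
    return "\n".join([f"## {version}", ""] + items)


def build_slack_message(version: str, notes: str) -> str:
    """Return the JSON body for a Slack incoming-webhook request."""
    payload = {"text": f"New release {version} is available.\n{notes}"}
    return json.dumps(payload)


if __name__ == "__main__":
    notes = render_release_notes("v0.0.0", ["Bump base image"])
    print(notes)
    print(build_slack_message("v0.0.0", notes))
```

Generating the notes once and reusing the same text for every channel keeps the changelog, the documentation, and the Slack announcement from drifting apart.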

Conclusion

A CI pipeline does more than make life easier for developers: customers get access to more secure, well-documented, and cutting-edge software.

An effective CI pipeline is about more than automating the agent release process; it involves improving testing to guard against regressions in that process, detecting security vulnerabilities quickly and addressing them promptly, and communicating with all stakeholders clearly and effectively.