The Access Group takes alert noise down by 99% to focus on innovation and shipping

Business Challenge

Scaling is a top business priority for The Access Group. With over 60,000 customers, Access offers many varied products across multiple sectors, which has resulted in a hybrid, multi-cloud environment. Its goal is to become 100% SaaS within five years and to gain an end-to-end view of its business transactions. Momentum and growth are critical to Access, with the customer at the heart of it all.

99%
reduction in alert noise
£1,000
savings in monthly AWS bill
30%
baseline for database usage, previously 60%

A hybrid environment needs to be available 24/7 

One challenge, in addition to integrating in-house services, was pinpointing uptime and reliability issues across the stack. Software engineers were up against non-stop alert noise. They could wake up to multiple messages in their inbox with alerts from different environments. To distinguish noise from critical alerts, engineers had to dig through past tickets and search code, which could take hours. This was an unwanted distraction from mission-critical tasks like scaling and shipping features. The group wanted to implement new tech to gain a centralized view across systems and capitalize on the momentum and growth it had already achieved.

Visibility across systems saves time and costs

New Relic application performance monitoring (APM) maps internal and external systems, giving software engineers a data-driven approach to work. New Relic alerts automate and streamline the incident management process, so engineers can focus on shipping new features and other mission-critical work. New Relic helps Access improve performance and processes by unifying insights and by resolving issues proactively, before incidents have the chance to impact customers.

“I used to spend every waking hour investigating noise. Now, if a New Relic alert comes in, I see the issue straight away. All the information is already there on the incident page through the server. Now I spend 10 minutes investigating and can figure out the issue faster.”

Best practice in alerts lets engineers focus on what they do best

Noise has been reduced by 99% with alerts. Teams now get an average of nine alerts per day. Configurable alerts surface the most important signals and the team can fine-tune alerts based on metrics, events, and more. The team can also mute alerts based on their workflows.
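To make the idea of fine-tuning and muting concrete, here is a minimal sketch of threshold-based alert filtering. It is not New Relic's API or Access's actual configuration: the `Signal` type, the threshold values, the service names, and the mute list are all hypothetical, chosen only to illustrate how a few rules can separate actionable alerts from noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One metric reading from a service (hypothetical shape)."""
    service: str
    metric: str
    value: float


# Hypothetical thresholds: notify only when a metric exceeds its limit.
THRESHOLDS = {"cpu_percent": 90.0, "error_rate": 0.05}

# Hypothetical mute list, e.g. services under planned maintenance.
MUTED_SERVICES = {"batch-reports"}


def should_alert(signal: Signal) -> bool:
    """Return True only for unmuted signals that cross a configured threshold."""
    if signal.service in MUTED_SERVICES:
        return False
    limit = THRESHOLDS.get(signal.metric)
    return limit is not None and signal.value > limit


def filter_alerts(signals: list[Signal]) -> list[Signal]:
    """Keep only the signals worth notifying someone about."""
    return [s for s in signals if should_alert(s)]


# Made-up sample readings for illustration only.
signals = [
    Signal("checkout", "cpu_percent", 97.0),       # above threshold: alert
    Signal("checkout", "cpu_percent", 45.0),       # normal load: noise
    Signal("batch-reports", "cpu_percent", 99.0),  # muted service: suppressed
    Signal("payments", "error_rate", 0.12),        # above threshold: alert
    Signal("payments", "queue_depth", 500.0),      # no threshold configured
]
alerts = filter_alerts(signals)
print([(a.service, a.metric) for a in alerts])
```

In this sketch, five raw readings produce two notifications. In practice the thresholds and mute rules would live in the monitoring platform rather than in application code.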

“We’re on things a lot faster with alerts. If something happens, it’s not death by notification. We’ve saved a lot of tears and a lot of meetings when the team knows exactly what to look at,” says Richard Bowen, data architect at The Access Group.

When there’s a slow process, it’s identified down to the line of code. New Relic alerts can plug into many productivity platforms, such as text messaging and Slack, and include information like CPU usage and the number of threads running. This process is automated and runs in the background, so information is readily available and easy to dig into. And because the discovery work is eliminated, engineers can go straight to the relevant piece of code and get the fix into the next sprint. With alerts and data all on one unified platform, and no need to context switch across tools, software engineers and developers can proactively detect issues for faster, simpler incident response.

Configuration with New Relic is easy: You install the agent. You get a nice visualization of how your services talk to each other. You can see the metrics from all of your service interactions—even the performance issues outside your system. Other similar cloud platform tools require more configuration and overhead.

Real-time infrastructure and database insights

Gaining visibility into distributed systems lets Access pinpoint slow transactions impacting customers and other transactions in the ecosystem, with the ability to distinguish between internal and third-party processes. For example, for several years, a 27-second delay was attributed to a third-party process. When New Relic was introduced, the team discovered that the slow process was in fact internal. The old code was rewritten in a day and added to the next sprint. The discovery was made by a software engineer who had been onboarded with New Relic just the day before.

By identifying which servers were busy, Access was also able to reduce the number of machines in its clusters and, as a result, its cloud costs. A database that previously ran at 60% usage is now at 25%-30%, saving about £1,000 per month. The daily database, which had been running at 60%-80% during peak time, is now running at 4%.
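The reasoning behind this kind of right-sizing can be sketched in a few lines. The example below is an illustration under stated assumptions, not Access's actual method: the host names, the usage figures, and the 60% target ceiling are made up, and it treats all hosts as equal in size.

```python
import math

# Hypothetical average usage per host, as a fraction of one host's capacity.
host_usage = {"db-1": 0.30, "db-2": 0.25, "db-3": 0.20, "db-4": 0.05}

# Assumed ceiling for per-host usage after consolidation.
TARGET_USAGE = 0.60

# Assumed cutoff below which a host counts as mostly idle.
IDLE_CUTOFF = 0.10


def hosts_needed(usage: dict[str, float], target: float) -> int:
    """Smallest number of equal-sized hosts that keeps average usage at or below target."""
    total_load = sum(usage.values())
    return max(1, math.ceil(total_load / target))


def idle_hosts(usage: dict[str, float], cutoff: float) -> list[str]:
    """Hosts whose usage is below the cutoff, as candidates for removal."""
    return sorted(name for name, value in usage.items() if value < cutoff)


needed = hosts_needed(host_usage, TARGET_USAGE)
candidates = idle_hosts(host_usage, IDLE_CUTOFF)
print(f"{len(host_usage)} hosts running, {needed} needed; mostly idle: {candidates}")
```

A real capacity decision would also account for peak load, failover headroom, and memory or I/O limits, which this sketch ignores.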

Engineering teams focus on mission-critical tasks

Access is getting ahead of customer issues and proactively fixing them before they reach customers or the support team. This lets teams focus on what they were brought in to do, which is to ship features and innovate rather than chase errors. Now that alerting is in place, Access engineers are digging into other metrics and KPIs to learn more about how customers are using the product, such as the number of customers using an app or process. This information is key to prioritizing the product backlog because engineers can see which work would have the most impact on customers. “OpenTelemetry is a big part of the final piece, which is business metrics and customer instrumentation,” says Richard.