Engineers need to support hundreds of applications while delivering new features and functionality, but they get tied up in identifying and troubleshooting software issues. According to our customers, some of their biggest application challenges include maintaining distributed systems and dependencies, ensuring a good user experience, and moving beyond simple software performance. Here’s how and why our customers use New Relic observability and application performance monitoring (APM) to monitor and improve their applications:

Managing distributed systems, tools, and dependencies

Prior to New Relic, sports streaming service DAZN had one tool for client-side app telemetry and one for logs, in addition to CloudWatch dashboards and hundreds of Amazon Web Services (AWS) accounts. Teams could observe their services and alert when they weren’t performing, but they worked in isolation. According to Pete Tanton, principal site reliability engineer at DAZN, a tiny alert for one team could cause a massive incident for another later on. During incidents, it was a challenge to correlate data because teams had to jump between different dashboards. From an administrative perspective, managing users and passwords wasn’t working either: DAZN had a highly trained SRE team handling joiners, movers, and leavers (JML) for three or four different tools. Pushing all of that data into New Relic allowed DAZN to see and query all CloudWatch metrics and client-side telemetry data in one dashboard. New Relic dashboards pull data from everywhere, even third-party sources, and are easily customizable.

Bookmaker William Hill also had a number of monitoring tools, including one for infrastructure and one for APM. “It was absolute chaos,” says Stephen Wild, engineering manager for observability and automation at William Hill. The tools couldn't cope with the amount of data coming in from 18,000 nodes, not to mention the extra containers running in the cloud, says Stephen. This resulted in overnight callouts for teams. Since deploying New Relic, mean time to resolve (MTTR) has improved by more than 80%. In terms of reliability, “it's 100%. There's been absolutely no downtime,” says Stephen.

Stephen Wild, engineering manager for observability and automation at William Hill, discusses how New Relic helped them improve MTTR by 80%.

When troubleshooting an issue, speed is of the essence, says Kristian Lee, head of DevOps engineering at sports technology company Sportradar. New Relic brings logging, application monitoring, and infrastructure data together in a single pane. With all of the information in one place, teams can identify a misconfiguration, excess load, or a broken component, and then solve the issue quickly, says Kristian.

Actionable insights on issues, before customers are impacted

Agritech innovator IGS has a tech stack based on microservices and multiple environments, with lots of different systems generating logs. IGS uses logs in context to see the root cause of problems across systems in one view, a capability that has been a big part of its recent success. Because IGS can quickly identify what happened and where it happened in the code, MTTR has come down, according to Owen Adams, head of platform engineering at IGS.

As IGS brings more data into its system, it’s critical to pinpoint where any issue originates. With infrastructure monitoring and APM, IGS can visualize all infrastructure data. “Now we focus on areas which actually deliver value to our customers or to the development team as opposed to needing to keep people assigned to work on the monitoring observability platform,” says Owen.

Intelligent Growth Solutions (IGS) grows nutritious and tasty food via automated growth towers. As IGS scales up and brings more data and complexity into operations, it's shifting left to predict issues before they occur by seeing how software will perform in its development and staging pipelines.

Improving workflows and user experience

It’s common for companies to use different vendors for real user monitoring (RUM), browser monitoring, and distributed tracing, often in combination with open source components. But managing multiple tools and systems takes a mental toll on teams. Engineer hours are spent trying to gain visibility across the system, in addition to dealing with administrative costs such as adding users and managing passwords. When there’s an issue, troubleshooting includes cross-referencing different tools, data sources, and measurements. 

New Relic powers much of DAZN’s alert intelligence, says Pete. Applied intelligence proactively notifies DAZN teams about potential problems by identifying anomalous behavior, correlating relevant incidents, and providing a root cause analysis. Having logs, metrics, APM, client-side telemetry, and synthetics all in one place also lowers the amount of time it takes to deploy services.

At IGS, instead of dedicating people to monitoring, teams can focus on delivering hard value for the business and its customers. Instrumentation was also easy: IGS added the New Relic APM agent into a container, which covered 90% of the estate, and the entire estate was instrumented in a day and a half.

"Trying to only wake your engineers up for stuff that your customer actually cares about seems obvious, but it can be sometimes hard to think of it from that perspective. So it's been really interesting changing the way that we think about observability," says Pete Tanton, Principal Site Reliability Engineer at DAZN.

Rightsizing for cost savings

Prior to adopting New Relic, IGS was spending between £20,000 and £24,000 a month on logging and monitoring. That figure has been cut by more than half, freeing up a lot of resources. And if a problem does occur, New Relic gives IGS the confidence to change many things at once. That allows IGS to take more risks and innovate faster with a live product, says Dave Scott, founder and CTO at IGS.