Since deploying New Relic, Zip has seen a number of benefits. One of the most significant has been faster root cause analysis: diagnosing a problem previously took the team up to half an hour, and it now takes minutes.
"We can look really quickly, see where the problem is, and jump straight on it. It’s not necessarily an engineer who has to look into the issue, or the person who wrote the code. A product manager could jump in there and start to see where the problems are. This also gives them insight into issues and what needs to be addressed," David says.
With key events like Cyber Weekend (Black Friday and Cyber Monday) driving significant traffic, Zip’s tolerance for error is exceptionally low. To prepare for these events, Zip carries out performance testing and incident simulations ahead of time. The team uses New Relic heavily for this, with distributed tracing to narrow down any bottlenecks. Cyber Weekend in 2019 resulted in over 850,000 sales transactions totalling $44m, with only one incident, which was pinpointed and resolved in minutes.
Another challenging day is Boxing Day, which was Zip’s busiest day in 2019. As a public holiday, it’s also a day when many staff members are on leave. "In those scenarios we rely heavily on New Relic’s alerting," says David.
Overall, New Relic has significantly reduced downtime and given Zip "a lot more visibility in downtime", he says. This has enabled the company to measure downtime and compare it month-on-month to identify further improvements.
Zip also uses New Relic dashboards to monitor its SLAs with customers and has found New Relic’s AI capabilities vital.
"Sometimes we don't know the traffic and we don't necessarily know what we should be alerting on. Setting it manually can lead to false positives … so the AI functionality, where it will alert based on what the system detects as an anomaly, has been really useful," David explains.