Navan has grown from a startup to an international corporate travel and expense management app, working with some of the biggest brands in tech and retail. Along the way, their engineering team accumulated a range of monitoring tools that began to challenge their scale-up ambitions. With New Relic, Navan created a unified view of their software and data, giving their engineers a common point to work from.
Moving from error-based monitoring to outcome-based monitoring
Before moving to New Relic, Navan's infrastructure and application monitoring was a mix of tools that grew in number as new logging or monitoring needs surfaced. As Navan scaled, it became increasingly challenging to dig into the root cause of a single incident, which could be anything from slow page loads to a security threat. To streamline tooling and processes, Navan turned to New Relic.
"We had a multiplicity of monitoring when I first joined Navan," explains Patrick Beckhelm, director of Observability at Navan. "It raised multiple challenges, mostly related to our lack of ability to understand quality and uptime in general. We had multiple observability tools, and it wasn't really telling us the story of what was happening with our systems at any one time. We were overlogging everything and it was difficult to discern what was actually wrong and what the causes of the issues were."
That prompted Patrick to start thinking in terms of outcome-based monitoring. Could Navan users easily book travel? Were searches performing as expected? "That helped us a great deal in understanding what needed to be improved with the system," says Patrick.
"Consolidating on New Relic was a great start for us. Once we got the platform instrumented, we were able to solve problems and move to what we consider a post-uptime world, where we know what's happening at the core and we are confident that near 100% uptime is achieved. Now we can focus as a team on business drivers. New Relic helped us with that evolution."
With New Relic in place, the Navan team adopted an observability mindset in stages: first, alerting on success metrics such as whether users could create bookings without errors; next, tuning existing infrastructure metrics; and finally, adding anomaly-based alerts.
Deploying 100x per day
Patrick calls the single pane of glass and consolidated observability tooling "engineering efficiency lubricant."
This key advantage, having New Relic as a whole-of-operations observability tool, lets teams take more ownership of their services. "We now had observability at the service level. It gave us the confidence to enable CI/CD globally. We could move faster, without breaking things."
According to Patrick, services that were previously deployed once a day are now deployed multiple times a day. "Teams don't have to go over to one tool for one thing, over here for another: it's all in one place and can be easily shared," he says.
Monitoring what is happening in real time
One major goal for Navan's observability team was to set key performance indicators (KPIs) for observability itself, chief among them the proportion of incidents identified through monitoring rather than through customer complaints. Patrick estimates that before New Relic, observability tooling identified about 20% of incidents in the customer journey; today it identifies closer to 90%.
Patrick points to one example where New Relic had a direct impact on Navan's revenue. One day, New Relic dashboards showed low conversions for users looking to book flights. Patrick's team traced the root cause to an incomplete flight list from one provider. "It's not something that would normally blare at you from alerts," says Patrick. "Being able to detect and fix this as it happened, in real time, had a great impact on the day's conversion revenue."
Chris Cholette, VP of Engineering and SRE at Navan, says this is the kind of subtle problem that New Relic is helping uncover. "Real-time data during the operational day is hugely powerful for us," says Chris. "Now, New Relic is used by our customer service team and others throughout the organization who haven't previously had visibility into these metrics. They can see the outcomes of the technical processes that they depend on. That's given them more agency with customers and in collaborating with us."