Delivery Hero scales operations with dashboards and APM
Delivery Hero, founded in Berlin in 2011, is an online food ordering platform that has grown astronomically and is now the world’s largest service of its kind outside of China. With more than 250,000 restaurant partners in more than 40 countries across Asia, Europe, the Middle East, and the Americas, Delivery Hero processes more than 2 million orders per day.
"When I joined in 2013, foodpanda, one of the many Delivery Hero brands, was doing a few hundred orders a day," says Mathias Nitzsche, VP of engineering at Delivery Hero. "Today we’re doing hundreds of thousands of orders a day, and we're still growing more than 100% a year."
With that kind of growth, the ability to scale is everything. But to do that, Nitzsche and his team needed to see clearly into what was happening across their infrastructure and applications.
"For us, speed is not as important as scale, and scale is nothing without visibility," he explains. "It is impossible to run a platform like ours without visibility."
Today, Delivery Hero’s engineering organization has more than 1,000 developers working across 15 platforms. Pandora, the internal name for their largest platform, powers foodpanda, foodora, and a few other brands.
Meaningful visibility to inform business excellence
To get valuable insight into its business, Delivery Hero has relied on New Relic for several years. Having originally adopted New Relic to monitor application availability and performance, the company has since expanded its usage of the platform to monitor business-critical key performance indicators (KPIs)—from the number of restaurants requested per area, to orders per platform and country, to payment charge-back rates per payment provider, just to name a few.
Dashboards are the most critical component, says Nitzsche, and all Delivery Hero teams rely on them daily to understand how their platforms are performing. With over 500 New Relic applications to oversee, Nitzsche says, "nobody has the time to look through an individual application in any detail," so he uses dashboards to make sense of those hundreds of applications. A few cross-application dashboards give him a high-level view of these applications and help him track business metrics like the number of orders processed globally and the number of errors found in their applications.
"Without dashboards, our monitoring would be only half as useful. Dashboards is what makes New Relic beautiful," he says.
And because New Relic Insights, the product behind these dashboards, is connected to every product in the New Relic platform, engineers can bring APM data into their dashboards for deeper analysis, segmentation, and filtering.
New Relic provides both the narrow focus and broad overviews Nitzsche’s team needs. "We have more than one hundred dashboards created by engineers, QAs, and product managers, shown on many screens all over the office," he says. These dashboards provide granular insight into a range of business indicators, displaying the information that matters most at any given time.
From monolith to microservices to DevOps
Unsurprisingly, the global Pandora IT infrastructure has undergone a tremendous transformation. To better accommodate the growth of their business, the team behind Pandora migrated their infrastructure from a monolithic platform to a microservices architecture running on Amazon Web Services (AWS).
Not only did Nitzsche’s team use New Relic to monitor its microservices migration in real time, but it continues to use New Relic to monitor other migrations from new acquisitions, or migrations from various regional platforms to its global platform. For example, the company recently migrated its Finnish and Swedish applications to the global platform so the teams serving those countries could better leverage core Delivery Hero services like search, payments, and infrastructure. During such migrations and rollouts, it uses New Relic to monitor things like speed, number of requests and errors, and database queries. "You double the traffic in some of these rollouts, and you want to see how it behaves," Nitzsche says.
That original migration to microservices introduced a high level of complexity, however. Where there was once only one repository for the Pandora platform, Nitzsche’s team now manages hundreds of repositories spread across dozens of microservices running in their Kubernetes clusters.
And these changes in the platform meant changes for the teams as well. Today, Delivery Hero is a true DevOps company, split into cross-functional teams. Rather than being structured by IT functions, they’re structured according to the services they manage. For instance, there is a payment team, a checkout team, and a search and discovery team, among others. Each team oversees its own product design, its own frontend and backend development, and its own infrastructure resourcing.
This transformation has helped Delivery Hero scale the engineering teams, and as a result their DevOps culture has flourished, says Nitzsche. Now all teams are cross-functional and focused on the direct needs of the business.
"We conduct a lot of sessions to show what other teams are doing in terms of monitoring and looking past errors," says Nitzsche. "We can then use that information to accelerate development processes and decision-making."
His priority through 2020 is to massively increase the size of the engineering team to many hundreds of engineers. It is not just a numbers game, though; he needs engineers with the right mindset. Engineers need to use New Relic to create alerts and dashboards as they code, and not simply react when something breaks. Systematically operationalising the company’s data is vital in promoting a DevOps culture.
Dashboards have helped Delivery Hero connect technical metrics to cost optimisations. "Many tools provide a lot of technical telemetry," Nitzsche says, "but New Relic can connect that with business metrics and costs."
Nitzsche says another huge benefit of New Relic is that it allows him to view the team’s infrastructure consumption and then use that data to optimise their environment. In fact, his biggest revelation came when he realised he could use New Relic to optimise the size of the Pandora Kubernetes cluster.
Before using New Relic, it had been difficult to tell which applications consumed which cluster resources. For example, in July 2018, Pandora’s biggest app used 700,000 distributed compute units; after monitoring it with New Relic, the team optimised it to use only 200,000 units, a 71% reduction in costly resources.
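The 71% figure follows directly from the two compute-unit numbers quoted above. A minimal sketch of the arithmetic, assuming "reduction" means the relative drop from the July 2018 baseline; the helper name `percent_reduction` is illustrative and not part of any New Relic tooling:

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative drop from `before` to `after`, as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100

# Compute-unit figures quoted for Pandora's biggest app (July 2018 vs. after optimisation)
before_units = 700_000
after_units = 200_000

reduction = percent_reduction(before_units, after_units)
print(f"{reduction:.1f}% reduction")  # 71.4% reduction
print(f"{after_units / before_units:.1%} of the original usage remains")  # 28.6% of the original usage remains
```

The remaining 28.6% is also well under half of the original usage.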
"Without New Relic, we wouldn’t have known where to start," Nitzsche says. "We now use less than half of the compute units we used a year ago. It was an eye-opener in terms of how much visibility we could get into our AWS consumption."