Gett is a leading ground travel platform for businesses, and its advanced technology makes business ground travel simpler, safer, and more efficient. The company’s mobility software is transforming the way businesses move their teams by combining clients’ preferred ride-hailing apps and car services on a single SaaS platform. In 2010 Gett launched one of the first on-demand B2B mobility services, and the company’s software is now used by a third of the Fortune 500, among other clients.
Rapid incident management is critical to the smooth running of Gett’s international operation. Understanding and fixing performance problems affecting drivers and riders is complicated by a dynamic microservices cloud architecture, but New Relic observability is helping a talented team more than meet the challenge and deliver a superior digital experience.
Meeting 99% SLAs
One of Gett’s major challenges is ensuring its technology is reliable and available to its drivers and riders at all times, with availability above 99%, especially when the business experiences unexpected spikes in traffic.
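To put a 99% target in concrete terms, the short calculation below converts an availability percentage into the downtime it allows. It is an illustrative sketch only: the 30-day period, the function name, and the 99.9% comparison figure are assumptions made for the example, not details of Gett’s actual SLA.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the minutes of downtime allowed in the period at the given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Compare the 99% figure with a stricter, purely illustrative 99.9% target.
for target in (99.0, 99.9):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% over 30 days allows {minutes:.1f} minutes of downtime")
```

Under these assumptions, 99% availability over 30 days leaves a budget of about 432 minutes, or roughly 7.2 hours, which is why detecting and resolving incidents quickly matters so much.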
In these scenarios, it is crucial that the research and development team—which includes tech support and incident management—works closely with customer care on how technology is being developed, deployed, and monitored, and the impact that has on drivers and riders.
Getting a comprehensive and swift understanding of how everyone is experiencing the digital service in real time is of paramount importance. But five years ago, Gett had neither a proper tech support team nor a precise incident management process with the right monitoring tools in place.
As Dani Konstantinovski, Global Tech Support Manager at Gett, recalls: ‘When there was a problem, the first we used to hear of it was from the field. Drivers used to call our customer care team who then called us. It simply wasn’t the best way to deal with putting out the fires.’
‘In my book, that’s a failure’, adds Lior Avni, Global Incident Manager, who works closely with Konstantinovski whenever there’s a critical incident that needs to be escalated.
Since then, Gett has invested significant time and resources in ensuring it delivers a superb customer experience.
‘We had many challenges before, which we dealt with over the years: organisation, mapping of services, missing alerts, things like that. One by one we took care of everything’, explains Avni. ‘So right now, the only two challenges are shortening the mean time to understand, and mean time to detect.’
With the team working 24/7, reducing the mean time to resolve an issue is critical, but the size of the production environment presents real challenges, as Konstantinovski reveals: ‘We are working with a microservices architecture with close to 200 microservices in our system. When something goes down, there’s usually a butterfly effect and chain reaction, and we need to find the source quickly to put out what we at Gett term as “fires”.’
‘The breadth, length and width of our production system keeps on growing’, adds Avni. ‘The challenge is to monitor so many services and machines and get the work done in an organised way.’
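The chain reaction Konstantinovski describes can be pictured as a dependency graph in which one failing service drags down the services that call it. The sketch below is a minimal illustration of that idea, not Gett’s or New Relic’s method: the service names, the `depends_on` map, and the `likely_root_causes` helper are all hypothetical. It flags unhealthy services that have no unhealthy direct dependency of their own as the likeliest source of the “fire”.

```python
from typing import Dict, List, Set


def likely_root_causes(depends_on: Dict[str, List[str]], unhealthy: Set[str]) -> List[str]:
    """Return unhealthy services none of whose direct dependencies are unhealthy."""
    roots = []
    for service in sorted(unhealthy):
        deps = depends_on.get(service, [])
        # If a dependency is also unhealthy, this service is probably a victim,
        # not the source of the incident.
        if not any(dep in unhealthy for dep in deps):
            roots.append(service)
    return roots


# Hypothetical service map: each service lists the services it calls.
depends_on = {
    "rider-app-api": ["ride-matching", "pricing"],
    "ride-matching": ["driver-location"],
    "pricing": [],
    "driver-location": [],
}

# Hypothetical set of services currently raising alerts.
unhealthy = {"rider-app-api", "ride-matching", "driver-location"}

print(likely_root_causes(depends_on, unhealthy))
```

In this made-up example only `driver-location` is reported, because the other two alerting services each depend on something that is itself unhealthy. The sketch looks only at direct dependencies and assumes the graph has no cycles; a real system with close to 200 microservices would need a live service map and richer health signals.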
Answering the microservices observability challenge
As a major AWS user, Gett previously had access to a range of monitoring tools. Although these were helpful, they fell short of what was needed. The need for full, real-time observability across its microservices drove Gett to choose New Relic, and then to expand its use, to improve incident management and deliver strong digital customer experiences.
‘New Relic makes our lives much, much easier. We can precisely identify the problem by jumping into New Relic to understand exactly what service is affected, what’s the reason, and what we need to do. With microservices, when one service is going down, you need to understand exactly what and where it is impacting. New Relic gives us this observability—and without it, this job would be very, very hard to do’, says Konstantinovski.