World Fuel Services reduces troubleshooting from 1 hour to 15 minutes
Watch how World Fuel Services created a centralized platform for monitoring and observability with New Relic.
reduction in time to troubleshoot
minutes to identify errors with log management
applications monitored by 50+ engineers daily
World Fuel Services is a leader in the global energy industry, delivering trusted energy solutions for aviation, marine, commercial, industrial, and land transportation customers. To support these solutions, 50 engineers run more than 100 applications for World Fuel. They must work together effectively to build and maintain a platform that runs 24/7, 365 days a year to meet the world’s fueling needs.
“Imagine you're visiting the Google website and you find that Google is down,” says Sunith Ravindran, Vice President of Applications. “Similarly, if a pilot lands at an airport and needs fuel, our systems need to be up. We need to be able to figure out when something went wrong and react quickly to fix the issue. Ideally, we can be proactive and identify problems before they become an issue.”
Challenge: Minimize downtime and pinpoint issues faster
The engineering team engaged with New Relic to reduce downtime, identify and resolve issues faster, and improve infrastructure monitoring. World Fuel also worked with New Relic to migrate to the cloud.
Time-consuming process for monitoring
The less downtime the better. That’s a primary goal of the infrastructure team. If the infrastructure goes down, the applications and their features go down too. But when Vaidehi Chaukulkar, Cloud Engineer II, started at World Fuel, the team did not have an easy way to troubleshoot.
“Whenever we had any application performance issue, we had to log into the system, look at the logs, and then try to troubleshoot what the timeframe was,” says Vaidehi.
It took at least 15 minutes to log into their systems, collect data, investigate the logs, and then determine where the logs were going. But if the problem originated months or even years earlier, the 15 minutes stretched to an hour or more. And without unified data across its applications, the team struggled to see the source, number, and timestamp of errors, which delayed resolution further.
An accurate synthetic script for applications
When customers log in to the World Fuel application, it’s important that they don’t encounter any application errors. Errors can create delays in fuel orders, which can then snowball into massive travel delays for passengers.
To find errors before they affect users, Jenish Patel, Automation Engineer, runs a synthetic script.
“The synthetic script validates that a user can log into and use the application successfully,” says Jenish. “If a user might face a problem with the portal, the synthetic monitor should be smart enough to catch that.”
However, Jenish needed the ability to detect issues at any time across three regions—a technical challenge. Prior to New Relic, Jenish was able to build a smoke test to catch issues, but he was unable to run it every hour.
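As a rough illustration of the kind of check Jenish describes, the sketch below runs one login-and-use check per region and collects the failures. It is not World Fuel's script and does not use New Relic's synthetics API; `CheckResult`, `run_checks`, `failures`, and the injected `check` callable are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CheckResult:
    region: str
    ok: bool
    detail: str


def run_checks(check: Callable[[str], None], regions: Iterable[str]) -> List[CheckResult]:
    """Run `check` once per region; a raised exception counts as a failure."""
    results = []
    for region in regions:
        try:
            # Hypothetical: `check` logs in and exercises the portal from this region.
            check(region)
            results.append(CheckResult(region, True, "ok"))
        except Exception as exc:
            # Record the failure and keep going so every region is still checked.
            results.append(CheckResult(region, False, str(exc)))
    return results


def failures(results: List[CheckResult]) -> List[CheckResult]:
    """Return only the results for regions where the check failed."""
    return [r for r in results if not r.ok]
```

A scheduler (hourly, in the scenario described here) would call `run_checks` with the real check and raise an alert whenever `failures` returns a non-empty list.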
Solution: Full observability into the health of applications
With New Relic, the engineers at World Fuel have greater visibility into the health of their infrastructure and its applications. They’re able to pinpoint issues quickly with log management and alerts, and they can troubleshoot faster with a consolidated view of their data.
Troubleshoot quickly with a single pane of glass
The World Fuel infrastructure team has reduced troubleshooting from 1 hour to 15 minutes with help from New Relic. A single view lets them see all their data, applications, and underlying hosts in one place. Logging into each system individually is a thing of the past. Teams can drill down to the exact timeframe of any given problem, whether it’s the last hour or the last year.
“We get alerted if something is going above the threshold,” says Vaidehi. “We get notified when something's going wrong before things actually break down. Because we have all the data already connected in New Relic, we don't have to go in and look for it on the server. It helps us in our troubleshooting and it's reduced our mean time to resolution.”
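The threshold alerting Vaidehi mentions can be pictured with a minimal sketch: flag a metric once it stays above a limit for several samples in a row, so a single spike does not page anyone. This is an assumption about how such a rule might look, not New Relic's alerting implementation; `first_breach` and its parameters are hypothetical.

```python
from typing import List, Optional


def first_breach(samples: List[float], threshold: float, consecutive: int = 3) -> Optional[int]:
    """Return the index at which `consecutive` samples in a row have exceeded
    `threshold`, or None if that never happens."""
    run = 0
    for i, value in enumerate(samples):
        # Extend the streak on a breach, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return i
    return None
```

Setting the threshold below the level at which a host actually fails is what makes the alert a warning rather than an outage report.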
New Relic has also helped the engineering team monitor proactively. The team, which uses an on-call rotation, collaborates better, too. The on-call engineer can quickly identify issues and reach out to the right members of the team for support, freeing up engineering time and providing peace of mind.
Fix issues for customers immediately with alerts and logs
Today, Jenish runs a synthetic script every hour across three different regions. With log management, the team gets all the information about a failure immediately. Alerts pinpoint the exact logs to look at, which helps engineering identify the error, the affected customer number, and the root cause so they can resolve it quickly.
“Previously, it took around two hours to identify errors. Now, with New Relic, we can address issues within 30-60 minutes,” says Jenish. “We can look for the specific time a failure occurred and find errors for any given customer. We can then see what error occurred and what time it happened.”
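The lookup Jenish describes, finding errors for a given customer around a specific time, amounts to a filter over structured log records. The sketch below shows that idea on plain Python dictionaries; the field names (`level`, `customer`, `timestamp`) and the function `errors_for_customer` are assumptions for illustration, not World Fuel's log schema or a New Relic query.

```python
from datetime import datetime
from typing import Dict, List


def errors_for_customer(
    records: List[Dict], customer: str, start: datetime, end: datetime
) -> List[Dict]:
    """Return error records for `customer` with start <= timestamp < end, oldest first."""
    matches = [
        r for r in records
        if r["level"] == "ERROR"
        and r["customer"] == customer
        and start <= r["timestamp"] < end
    ]
    return sorted(matches, key=lambda r: r["timestamp"])
```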
Faster resolution of the failures the synthetic script catches also means a better customer experience. Alerts have enabled the team to debug issues faster, giving them greater confidence when communicating with customers about issues and their resolution.
Results: Peace of mind in software development and troubleshooting
The engineers at World Fuel work better with New Relic in place. New Relic has streamlined their work while generating a healthier engineering team culture. The engineering culture has moved from being reactive to being proactive, with engineers more confident they can deliver reliable solutions and quality service. Teams and their data are no longer siloed; any engineer can quickly access data across functions. As a result, the engineers at World Fuel are more confident, purposeful, and in control of their own software development.
- Environment: AWS Cloud
- Team organization: Squad-type model, with teams focused around assets or products
- Favorite New Relic features: Logs, APM, dashboards, and synthetic monitors
- Lack of visibility into performance of on-prem environments
- No consolidated view of application environments with disparate tools
- Lack of unified data on their applications delays troubleshooting
- Proactive monitoring and alerting
- Business KPI monitoring
- System-wide monitoring of application and infrastructure performance