
Onefootball reduces incidents and frees up developer time

Business challenge

There’s no doubt that Onefootball and its employees are passionate about football: its headquarters boasts artificial turf, stadium-like seating for meetings, foosball tables, and goal posts. But the company is also passionate about the technology that powers the Onefootball user experience, which must be stable and reliable and must perform well at all times. If a fan has to wait too long to see a video with the best goals of the day, it’s not a good experience.

"The stability of the systems serving the user experience is core to staying competitive," says Holger Hammel, Onefootball’s vice president of engineering. "If we’re not able to provide fast, reliable service, that can be a showstopper."

A red card for scalability

Onefootball has been growing its user base rapidly, but as more fans flocked to the platform, the dramatic growth began causing scalability and stability issues, especially during major football events, when usage would spike as much as fivefold. These issues frequently affected not only fans but also Onefootball’s newsroom team, with journalists at times unable to edit articles.

"The company was so successful in gaining more users that the system behind it couldn’t cope anymore," says Hammel. "It was clear that we needed to address scalability."

Onefootball had two main issues to resolve if it wanted to keep fans happy while continuing to grow. The first issue was improving scalability, reliability, and efficiency of its workloads on Amazon Web Services (AWS). The answer was to migrate to Kubernetes to automate scaling and management of containerized applications and microservices. Moving to Kubernetes would unify Onefootball’s stack, streamline provisioning and deployments, and enable rapid scaling to support spikes in application usage.
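To make the idea of automated scaling concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler. The source does not say which autoscaling mechanism Onefootball uses, so the function name, the per-pod request-rate metric, and the replica limits below are illustrative assumptions, and the sketch ignores the autoscaler's tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Hypothetical helper mirroring the documented HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric),
    clamped to the configured minimum and maximum replica counts.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Assumed scenario: 4 pods sized for 100 requests/s each see a fivefold
# spike to 500 requests/s per pod.
print(desired_replicas(4, 500, 100))
```

Under these assumed numbers, a fivefold spike in per-pod load leads the rule to ask for five times as many pods, subject to the configured maximum.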

The second issue to tackle was visibility. "We had no application monitoring in the cloud," says Hammel. "When instabilities in backend services occurred, it took too long to find the root cause." He was also concerned that the lack of visibility was resulting in over-scaling of the AWS environment to compensate for performance issues, driving costs unnecessarily higher. According to Tiago Queiroz, software architect at Onefootball, "The only information we had were metrics in AWS that showed things like CPU usage, number of errors, and slow queries. But we couldn’t see what was going on in the application. Was it running slowly because of processing power? Was it because of the database? We didn’t have the information we needed to address performance issues."

Monitoring Kubernetes, applications, and infrastructure 

Although migrating to Kubernetes helped solve the scalability, reliability, and efficiency issues Onefootball was experiencing with its platform, it also created new complexity by adding a layer of abstraction between the applications and the underlying infrastructure. The new abstraction layer makes it more difficult to know what’s happening within the environment and inside applications. That’s why Onefootball turned to New Relic to monitor all of its applications as well as the Kubernetes environment. Onefootball uses New Relic to get deep visibility into the Kubernetes environment and connect what’s happening inside Kubernetes clusters to application performance and user experience.

"When you move to Kubernetes, the biggest challenge is understanding what’s going on at an infrastructure level," says Queiroz. "New Relic gives us that insight." Rodrigo Vieira Del Monte, DevOps engineer at Onefootball, agrees: "Today we run everything on Kubernetes, and with New Relic, we get insights about the applications and the environment in minutes," he says.

Hammel says that by delivering visibility across Onefootball’s stack, New Relic improves alignment across the various engineering teams, making it easier to share knowledge and collaborate in a DevOps culture. "Two important aspects of DevOps are end-to-end ownership and visibility," says Hammel. "This is where New Relic monitoring really helps, because as a developer, you can see what you deployed and the positive or negative impact of the deployment. That gives you the feedback you need to help you take ownership and improve software quality."

Addressing tech debt

While the migration to Kubernetes increased the efficiency of running Onefootball’s applications and enabled the company to more quickly and easily scale resources to support spikes in traffic, addressing technical debt would improve the user experience even more.

With the next major event still several months away, Hammel and his team decided to shift their focus to paying down the technical debt to further optimize reliability, performance, and scalability. "We knew that we wanted to add new product features, but first we chose to invest in making the system even more stable," he says. "That turned out to be a very, very good decision."

After conducting load tests on the applications and infrastructure and further optimizing the platform with insights from New Relic, Hammel and the team believed that they were capable of delivering the scalability and performance needed to support the dramatic increase in traffic for the global tournament. Despite their faith in the tests, they took no chances. Onefootball instrumented New Relic alerts for all of its systems. Then it created a cross-functional group to fix any potential issues at any point, with group members taking shifts during the tournament so that experienced engineers were available at all critical times.

New Relic was a star player during this time, helping Onefootball visualize user experience in new ways. "We could track hour by hour how many users we were acquiring, which got the attention of our Growth team," says Queiroz. Customized dashboards gave the engineering team continuous visibility into throughput, response time, transactions, and error rate of Onefootball’s critical applications.
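The dashboard metrics named above can be illustrated with a short sketch that derives throughput, average response time, and error rate from a window of request records. This is not New Relic's API or Onefootball's code: the `Request` record, the `summarize` function, and the sample values are hypothetical and only show how such figures are typically computed.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    is_error: bool


def summarize(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute throughput (requests per minute), mean response time (ms),
    and error rate (fraction of failed requests) for one time window."""
    total = len(requests)
    if total == 0 or window_seconds <= 0:
        return {"throughput_rpm": 0.0, "avg_response_ms": 0.0, "error_rate": 0.0}
    errors = sum(1 for r in requests if r.is_error)
    return {
        "throughput_rpm": total * 60 / window_seconds,
        "avg_response_ms": sum(r.duration_ms for r in requests) / total,
        "error_rate": errors / total,
    }


# Made-up sample: four requests observed in a 60-second window.
sample = [Request(100, False), Request(300, False),
          Request(200, True), Request(200, False)]
print(summarize(sample, 60))
```

Alert conditions like the ones set up before the tournament are usually thresholds on values of this kind, for example an error rate that stays above a chosen limit for several consecutive windows.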

80% reduction in incidents
40% of developer time freed up
K8s migration support