For New Zealanders, Lightbox is the place to go to watch the latest award-winning TV series, blockbuster movies, favourite children’s shows, and other entertainment on the device of their choice. Owned by New Zealand’s chief digital services company, Spark New Zealand, Lightbox is the leading local entertainment streaming business.
As anyone who has ever streamed a suspenseful drama or exciting action movie knows, there’s no room for poor performance anywhere in the digital customer experience. Lightbox also knows that its competitive advantage depends on delivering the kind of experience that keeps customers streaming and renewing their subscriptions.
Maintaining reliability and performance is the highest priority for Mike Robinson, chapter lead of IT applications and development for Lightbox at Spark New Zealand. ‘Our infrastructure and applications must be in top form at all times’, he says. ‘Our number one priority is making sure everything is stable and performing well, even during peak usage.’
Going behind the scenes was a no-go
A third-party company created the core platform for Lightbox, which Lightbox runs on Amazon Web Services (AWS). The platform was delivered without instrumentation, which meant that Robinson and his team had little visibility into what was happening inside the platform. ‘Often, if we made changes, the platform would crash’, says Robinson. ‘We had no ability to see where the problems were originating.’
However, instability and service interruptions weren’t the only issues the Lightbox team faced. The team also lacked the insight into application performance and resource usage in the AWS environment that it needed to scale the environment properly for peak usage. ‘If you can’t see what’s happening at a detailed level in your environment, you can’t make intelligent resource choices because you don’t know how it will affect performance’, says Robinson.
Garnering high ratings for stability
After hearing about New Relic, the team at Lightbox was eager to try it. ‘When we switched on New Relic, the insight we had into the current platform state meant we had to quickly shift all our focus to stabilising the platform over a four-week period’, says Robinson.
After completing the stabilisation, the team turned its attention to fixing additional bugs and optimising performance. ‘As everything started getting fixed, our overall stability improved significantly’, says Robinson. Lightbox quickly deployed New Relic across its entire stack, including the website, backend infrastructure, middleware, testing environment, and its various mobile and TV applications.
Finding an episode ten times faster
Beyond the stability issues it had discovered, the Lightbox team was surprised to learn that the platform was constantly operating at peak capacity. ‘The “engine” was basically redlining every night and things were catching fire, but we weren’t aware until we deployed New Relic’, says Robinson.
‘Using data from New Relic, we improved performance to the point where we can now easily handle traffic increases of at least four or five times our current peak usage’, says Robinson. ‘For example, we reduced the response time on average from about 500 milliseconds to less than 50 milliseconds. That’s 10 times faster for normal users and as much as 30 times faster for others. We also improved video load time from as long as 40 seconds to as little as 2 seconds. And we did all this in just 4 weeks’ time. With New Relic delivering code-level information, we’ve been able to rapidly improve the customer experience for all our users.’
The Lightbox team used New Relic to determine that a major bottleneck occurred because the platform performed hundreds of database queries every time a certain transaction ran, which meant queries happened thousands of times a minute. ‘We thought, “What if we use caching so the application doesn’t have to query every time someone asks for a show?” That reduced the load on the database by a factor of 1,000. New Relic makes it really easy to identify quick wins like this’, says Robinson.
Giving data a starring role in DevOps success
New Relic gives the Lightbox team confidence in its DevOps processes and resulting code that it didn’t have before. ‘New Relic helps us optimise the resources we use for end-to-end testing, identify and debug errors easily during tests, and understand performance and stability before and after changes’, says Robinson. ‘It gives us trust and confidence in automated deployments because everything has been vetted in testing using New Relic to track and understand any potential impact.’
Robinson credits New Relic with helping the team deploy faster, with greater quality. ‘The time between code going into master and going live decreased from 600 hours to less than 20 minutes’, says Robinson. ‘The lead time from when a ticket was entered to code going into production dropped from six to eight weeks to roughly one week. New Relic helped us dramatically increase productivity to reduce both cycle time and lead time while improving quality and stability.’
‘With New Relic, we moved from needing to monitor deployments, to having enough insight and trust in the platform that we could literally push the button and start working on the next feature. Visibility into the deep workings of the platform enables us to reconsider how things are architected, make decisions based on real data, and focus on what will deliver value.’