M&S shifts to digital-first retail with observability


M&S changed its traditional retail approach in 2018. Steven Gonsalvez, Principal Engineer at M&S, explains how his team pivoted to become a digital-first retailer.

Marks & Spencer is gearing up for a jump in online sales this shopping season. And online retail buying is only set to increase—NASDAQ forecasts that 95% of purchases will be online by 2040—which means we need to ensure we deliver customers the best user experience possible.

Omnichannel is the new normal

Omnichannel has been a topic for a long time: a customer might start their buying journey on mobile, move to their laptop, and then even come into the store. Yet many retailers can't offer a unified brand experience across all channels. This matters more and more to customers, who are now just as knowledgeable about a product as the brand’s salespeople: some 81% of consumers research a product before going to the store. The brand’s challenge now is to add value to that engagement.

In 2018, M&S dramatically shifted its traditional retail approach to serve the omnichannel consumer, in a drive to become digital-first. To keep up with this digital transformation, my engineering team had to introduce new tech. This included agile app development and aggressive cloud deployments, with partners like Azure, to cope with evolving customer demands and the rise in traffic. My team also turned to DevOps, AppOps, and AIOps to manage the hundreds of microservices now used by the retail platform.

The pandemic played a key role in accelerating digital avenues with a massive shift to digital consolidation. Online sales are growing above market rates—massively above market rates. And it's where we'll grow for the foreseeable future.

Evolving with customer needs

Retailers need to personalise the customer experience to remain competitive. But a great customer experience needs to be fed data to succeed. Acquiring data, making sense of it, and acting on it are all necessary if businesses want to keep improving the customer experience.

For me and my team, this means understanding and observing every aspect of the customer journey: multicloud to serverless with AWS, Kubernetes, microservices, events, logs, and the huge waves of data now being pushed across the business. These were tough asks, and they prompted my team to turn to observability because classic monitoring was unable to cope. Independent teams had no overall picture of the tech stack, and our ability to meet the demand for continuous, always-on services was at risk. With classical monitoring, you see so many events, especially during experimentation, that a lot of them turn out to be false positives, and it's hard to find out what's actually wrong. Observability is key because it lets you contextualize your anomalies.

Today, observability is paramount to the M&S platform, allowing us to evolve with our customers. Refined dashboards were the first step. Numerous data sources from logs, events, metrics, traces, and instances were all incorporated in one place. Then, to gain a real picture of the M&S customer experience, my team added context, analysis, and aggregation via New Relic.

We need to use data at the tech level of the business, and averages and percentiles are not good enough anymore. You can hit your target at the 75th percentile, but if you are still affecting 25% of customers, or 65% of your high-profile customers, or your highest-value baskets, then relying on that percentile is counterproductive. Percentiles and averages on their own can be misleading.
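To make that point concrete, here is a minimal Python sketch of how a healthy-looking percentile can hide a badly affected segment. Everything in it is an assumption for illustration: the `Request` record, the 500 ms target, the 100.0 basket-value threshold, and the sample data are made up and are not M&S figures or a New Relic API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    latency_ms: float    # hypothetical response time for one customer request
    basket_value: float  # hypothetical basket value for that customer


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def share_over_target(requests, target_ms):
    """Fraction of requests slower than the target (0.0 for an empty list)."""
    if not requests:
        return 0.0
    slow = sum(1 for r in requests if r.latency_ms > target_ms)
    return slow / len(requests)


TARGET_MS = 500            # assumed latency target
HIGH_VALUE_BASKET = 100.0  # assumed threshold for a "high-value" basket

# Made-up sample of 100 requests: 80 small baskets and 20 high-value ones.
requests = (
    [Request(200, 20.0)] * 68     # small baskets, fast
    + [Request(900, 20.0)] * 12   # small baskets, slow
    + [Request(300, 150.0)] * 7   # high-value baskets, fast
    + [Request(900, 150.0)] * 13  # high-value baskets, slow
)

high_value = [r for r in requests if r.basket_value >= HIGH_VALUE_BASKET]

p75 = percentile([r.latency_ms for r in requests], 75)
overall_slow = share_over_target(requests, TARGET_MS)
high_value_slow = share_over_target(high_value, TARGET_MS)

print(f"p75 latency: {p75} ms (target {TARGET_MS} ms)")
print(f"share of all requests over target: {overall_slow:.0%}")
print(f"share of high-value requests over target: {high_value_slow:.0%}")
```

In this sample the 75th percentile sits comfortably under the target, yet a quarter of all requests and most of the high-value ones are slow. Breaking the same metric down by a business dimension is what surfaces the problem.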

No more 2 a.m. incident rooms

At M&S, the customer is the centre of all of our decision-making. We make decisions based on customer demands in a matter of seconds, not hours or weeks. This is especially true as we approach the busiest time of the year. 

New Relic has helped M&S reduce MTTR by one-third, a massive win. This improvement demonstrates the real value of observability—minutes of downtime and poor customer experience all relate to revenue, but they also impact long-term customer retention and lifetime customer value. Reducing MTTR also drastically changed our incident management processes. 

Before New Relic, we couldn’t easily identify the team responsible for an incident, and there could be up to 80 people trying to figure out the problem and who was accountable. Now incidents can be resolved in minutes: the platform pinpoints the issue and notifies the relevant team. That means no more 2 a.m. incident rooms.
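The idea of sending an incident straight to the team that owns the failing component can be sketched in a few lines of Python. This is only an illustration of ownership-based routing in general, not how New Relic or the M&S platform implements it; the service names, team names, and the `route_alert` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str  # name of the service that raised the alert
    summary: str  # short human-readable description


# Hypothetical ownership map: which team is accountable for which service.
SERVICE_OWNERS = {
    "checkout-api": "payments-team",
    "product-search": "search-team",
    "basket-service": "basket-team",
}

# Hypothetical catch-all for services with no registered owner.
FALLBACK_TEAM = "platform-on-call"


def route_alert(alert, owners=SERVICE_OWNERS):
    """Return the single team to notify for this alert."""
    return owners.get(alert.service, FALLBACK_TEAM)


alert = Alert(service="checkout-api", summary="Error rate above threshold")
print(f"Notify {route_alert(alert)}: {alert.summary}")
```

With an explicit owner for every service, one team is paged instead of dozens of people joining a call to work out who is accountable.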
