Elsevier consolidates tools and moves to the cloud

Business Challenge

Elsevier started out by selling books—it still sells $300 million worth of books annually—but now people expect Elsevier to make all the data contained in those volumes available electronically, says Matt Reid, technology infrastructure and operations manager. "Doing so, however, requires a fair amount of transformation, and we couldn’t achieve that transformation within our existing technology footprint. We were sitting in a data center without the agility we needed. We didn’t have the people we needed. And our support functions were outsourced to third parties."

In 2013, when Elsevier CIO Dan Olley charged Reid’s group with removing those constraints and changing the way the company delivered technology, Reid and team quickly looked to the cloud for the agility, flexibility, and cost savings it could provide.

Consolidating tools

Four years later, Elsevier had migrated much of its environment, including 12,000 servers and more than 400 products, to Amazon Web Services (AWS). But along with the benefits of the cloud came a new set of challenges, chief among them getting a single, unified view of performance across the new, dynamic environment.

"In 2016 and 2017, we were about halfway through our cloud migration, and we'd given people a high degree of autonomy through this journey," says Reid. "What this actually meant was that we’d employed lots and lots of clever people who had preferences for lots and lots of different tools." This diverse toolset created problems of its own: the many monitoring products were configured differently, so they produced inconsistent results. "In particular, we had a huge amount of white noise coming out of our infrastructure monitoring, where every tool under the sun was sending alerts left, right, and center," says Reid.

It was time to retool and consolidate, and that was when Reid and team began considering New Relic.

Gaining a united view and deep insights

"We had three key goals," says Reid. "Understand our costs, understand the performance and reliability of our products, and move to a DevOps model of development. New Relic’s story was compelling because they showed us not only how we could align their product with our technology, but also how we could use New Relic monitoring to facilitate a DevOps approach and gain much-needed insight into the reliability and availability of our products."

It didn’t take long for the Elsevier team to decide that the New Relic platform should be embedded in the organization’s operating model going forward. The first New Relic product the company deployed was infrastructure monitoring. "A lot of people asked us why we didn’t deploy New Relic APM first," says Reid. "The answer is that we knew New Relic Infrastructure would give us the most immediate value because it filled the most immediate need. After deploying Infrastructure, we were able to reduce the white noise and regain operational efficiency by creating standard configurations, employing standard instrumentation, and doing standard reporting across our teams."

By deploying Infrastructure across its environment, Elsevier gained deep insight into its cost footprint and utilization. "One of the issues we'd suffered from historically was that we didn't understand the utilization of our nonproduction environments versus our production ones," says Reid. "That made it really hard to tell whether releases to our nonproduction environments were having an impact on our production environments. By deploying Infrastructure at a base level across the board, we could visualize and understand the impact of any changes before a release made its way to production. What’s more, Infrastructure provided the insight we needed to remove six or seven contracts."

A DevOps future

While deploying Infrastructure, the Elsevier team was also busy rebuilding its existing end-user monitoring in synthetics. It also deployed APM across the Elsevier environment. "As a result of New Relic, our developers can now see how their applications are performing from an infrastructure perspective, from an end-user perspective, and from within the application itself," says Reid. "A developer and a SysOps engineer can actually sit next to each other and diagnose and triage an issue. This simply wasn’t possible in the past."

It is just this sort of change that’s enabling Elsevier to meet its goal of moving to a DevOps model of continuous development and delivery. Says Reid, "We’re changing the way we work to become more agile as an organization, and New Relic has been a real catalyst in that effort. Thanks to New Relic, we now have something called a Dev Squad for each of our products, and included in those teams are DevOps engineers focused on both the development and the operational pieces."

Understanding the digital customer experience

In addition to providing application and infrastructure monitoring, the New Relic platform is also helping Elsevier get a much better read on how end users are experiencing its digital products. Take Reaxys, the company’s chemistry search application, which is used by researchers, students, and pharmaceutical companies around the world to answer chemistry-related queries. When shifting the application to the AWS cloud, the company needed to refresh the user interface and rework the front end into a single-page application.

Jonathan Snow, manager of software engineering for the Reaxys product, joined Elsevier just two weeks before the new cloud-native application was scheduled to go live and was shocked to learn that nobody knew how it would perform under load, and that the team had no holistic view of how things worked. Piggybacking on Elsevier’s existing agreement with New Relic, and drawing on his own experience with the platform, he quickly deployed browser monitoring and APM.

With the New Relic platform in place, Snow and team immediately got the holistic view they were looking for, but they wanted to drill down even deeper to understand the slow page loads that were hurting customer loyalty, increasing churn, and dragging down the Net Promoter Score (NPS).

When it emerged that customers in China were experiencing extremely long load times for the application, Snow and team put synthetics to work. "We could see in New Relic that global load times were sitting at 12.5 seconds, which was already horrible, but that in China load times were as long as 35 seconds," says Snow. "While we managed to reduce load times globally, China was still at over 8 seconds. It wasn’t until we deployed synthetics that we were finally able to confirm that it was a network problem, i.e., that they were experiencing packet loss and DNS poisoning. Now the page load times for Chinese customers are around 1.2 seconds."

"What I love about synthetics is that it gives us a controlled, repeatable view of how our application is performing from all over the world—and it does so within New Relic’s single ecosystem of tools," Snow says.

For Reid, the biggest benefit has come from the whole-system view of performance and democratization of data it has provided. "In the past, I would get a 3 a.m. call about a problem, and the development engineer would tell me the application was performing perfectly, the network engineer would tell me the network was fine, and the infrastructure engineer would tell me that utilization was fine. But things were not fine, and the real challenge stemmed from the fact that they were looking at three different control planes. The step change with New Relic is that I can now bring a group of engineers together at 3 a.m. They’re all visualizing the utilization, performance, and reliability of the product, services, and network within the same environment. This is what has allowed us to deliver our transformation and start moving to that DevOps model."

It’s also what’s allowed the company to dramatically decrease issue resolution time; improve application, infrastructure, and website performance; and reduce costs. Explains Snow, "Today, our development teams can see the impact of their changes immediately across both production and nonproduction environments—which means we’re no longer spending hours and hours going from component to component trying to identify the root cause of issues."

And when teams are no longer spending all of their time investigating and resolving problems, developers can focus on developing, product teams can focus on products, and so on. As a result, performance goes up and costs go down. Best of all, customers are happier. "After using synthetics and dashboards to drill into our data, we went from having customers in China threatening to cancel or reduce their contracts to increasing their contracts and also signing renewals," says Snow. "We've taken it to the point where we’re now actually growing our customer base. This is a remarkable transformation, largely due to insights derived from New Relic."

As Elsevier continues its modernization journey, Reid expects the benefits of New Relic to grow steadily. Already, the organization has saved several hundred thousand dollars simply by reducing the number of monitoring tools and eliminating the white noise that prevented swift issue identification and resolution in the past. The future looks even brighter. "The exciting part for me," says Reid, "is that we’re starting to link our business goals and our technology goals, and we have a clear vision of how we can enhance our digital customer experience, enrich our products and services, and provide feedback between those two teams, thanks to New Relic."