In 2020, New Relic began migrating its telemetry data platform to AWS to accommodate explosive growth. At AWS re:Invent, Andrew Hartnett, Senior Director of Software Engineering at New Relic, shared how New Relic used a cell-based architecture to pave the way for long-term scalability and geographic expansion. To see the full presentation, Unlocking scalability with cells: New Relic’s journey to AWS, visit AWS re:Invent’s on-demand content. You’ll learn more about the New Relic database (NRDB), how New Relic uses Kafka, and how New Relic successfully made the transition to AWS.

As Andrew discusses in the talk, “Our explosive growth led us to some serious scalability barriers. We found ourselves…with a monolithic single huge cluster in production running our entire business, over 100 engineering teams, 800 services, 1400 JVMs in the NRDB, all of this while ingesting 20 gigs per second. In 2018 and 2019, we had issues, some major outages."

New Relic uses Kafka to feed data through a pipeline of services that aggregate, normalize, decorate, and transform it. A record can pass from a Kafka topic to a service three, four, or five times before reaching its final destination.
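
To make the pipeline shape concrete, here’s a minimal Kafka Streams sketch of one such hop, written against the standard Kafka Streams API. The topic names and the normalize/decorate steps are illustrative assumptions, not New Relic’s actual topology.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PipelineHop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "normalize-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // One hop: consume telemetry from an upstream topic, transform it,
        // and write it to a downstream topic for the next service to pick up.
        KStream<String, String> raw = builder.stream("telemetry.raw");
        raw.mapValues(PipelineHop::normalize)   // e.g., unify units and field names
           .mapValues(PipelineHop::decorate)    // e.g., attach account metadata
           .to("telemetry.normalized");         // the next hop reads from here

        new KafkaStreams(builder.build(), props).start();
    }

    private static String normalize(String record) { return record.trim(); }
    private static String decorate(String record)  { return record + "|account=demo"; }
}
```

Each service in the chain repeats this pattern, which is why a single record can cross several topics before landing in NRDB.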

As Andrew states, “We needed to scale up quickly. We had massive growth, coupled with increased demand from customers wanting their data, which led to those scalability barriers... Our Kafka was already one of the largest, if not the largest, Kafka clusters in the world, and we were at the physical limitations of that Kafka cluster, so things were getting pretty hairy.”

To learn more about the transition, check out the presentation. Read on for the key lessons New Relic learned during the move to AWS.

Observability is key

“It's really important to instrument every part of your system, including the system you're migrating from, the actual migration, and then the end system as well," Andrew advised.

"All of our systems are fully instrumented. Observability hugely accelerated our ability to make large scale changes and migrations. We knew exactly where and when bottlenecks appeared, and rather than looking at low-level host metrics, our teams had visibility into the topology of our systems and could look closely at anomalies."

"To manage and mitigate the risk of such large system change events, applications, owners, and architects need to have the ability to see their system in high cardinality and high dimensionality.”

This chart depicts the data volume flowing into NRDB broken down by environment during the transition to AWS.

Don't plan for the happy path

“Things can change. Customers can do things to you. Even the best laid plans will have surprises and discoveries. Plan for these surprises and discoveries along the way," Andrew said.

"One key was to start small and then iterate. Stick your toe into the water first. Try with your own data. Because we are actually one of our own largest customers, sending our own data first made sense.”

Continuously build new cells

“We have a goal of having an average cell life of 90 days," Andrew shared. "By doing that, you get the ability to make your changes with new cells instead of having to do it with cells that are dealing with customer load. AWS provides all the APIs and managed services we need to continuously build and decomm cells. So we do it. As a result, our platform reliability has improved significantly, and we've reclaimed engineering capacity that was previously spent on infrastructure.”
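
Treating cells as short-lived infrastructure is straightforward to express against AWS APIs. The sketch below uses CloudFormation via the AWS SDK for Java v2 as one plausible mechanism; the stack naming scheme and versioned template are illustrative assumptions, not New Relic’s actual tooling.

```java
import software.amazon.awssdk.services.cloudformation.CloudFormationClient;
import software.amazon.awssdk.services.cloudformation.model.CreateStackRequest;
import software.amazon.awssdk.services.cloudformation.model.DeleteStackRequest;

public class CellLifecycle {
    private final CloudFormationClient cfn = CloudFormationClient.create();

    // Stand up a fresh cell from a versioned template: changes ship in new
    // cells, so cells serving customer load are never modified in place.
    public void createCell(String cellId, String templateUrl) {
        cfn.createStack(CreateStackRequest.builder()
                .stackName("cell-" + cellId)
                .templateURL(templateUrl)
                .build());
    }

    // Decommission a cell once its traffic has drained to a newer one; with
    // an average cell life of 90 days, this runs continuously, not as an event.
    public void decommissionCell(String cellId) {
        cfn.deleteStack(DeleteStackRequest.builder()
                .stackName("cell-" + cellId)
                .build());
    }
}
```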

Make sure you communicate

“Over-communicating becomes very important. Don't keep your customers in the dark," Andrew recommended.

"It's important to let them know the reasons why we're doing all of this. We may experience some bumps, but this is going to get better.”