Launch day. Years of effort have gone into this moment: You've obsessed over every detail of your product and worked tirelessly to prepare your technology stack. As you get ready to launch, the last thing you should have to worry about is your monitoring platform.

But all too frequently, launches are white-knuckle moments, and complicated monitoring setups add to the stress. What if something goes wrong? Will you be able to find it in a sea of specialized tools, each holding only a piece of the puzzle? Are you recording the right information in the right format so that you can answer the questions that will come up? Did you build your data pipeline so that it can scale as its workload increases?

There is another way. We believe that monitoring should reduce anxiety instead of adding to it. All your data should be in one place and connected. Tools should help you answer questions quickly, even the ones you never thought you'd need to ask. And scaling up should be our problem, not yours.

This blog introduces you to the telemetry data platform behind New Relic—its design principles, how it achieves millisecond performance, and how it can support you on your biggest days.

Three design principles

As a pioneer of SaaS APM, we have over a decade of experience supporting our customers’ biggest deadlines, sporting events, and product launches year after year. That experience is manifest in everything we've built, and at the heart of it all is our telemetry data platform, the world's most powerful platform to analyze your operational data. The New Relic platform serves the needs of more than 180,000 accounts around the globe by ingesting over a billion telemetry data points every minute. Its unique power comes from adhering to three design principles:

  1. Observability requires a unified telemetry platform
  2. Real-time investigation requires both speed and flexibility
  3. Dynamic demand requires unlimited scalability

Observability requires a unified telemetry platform

By combining your metrics, events, logs, and traces in a unified platform, New Relic gives you a complete view of your technology stack, enabling you to identify, understand, and resolve the issues that impact your business. No more combing through multiple systems to hunt for needles in different haystacks, while minutes, or even hours, are wasted getting your systems back online. With the New Relic Query Language (NRQL), you get a single interface to explore all of your data.

Real-time investigation requires both speed and flexibility

Most databases require you to choose between speed and flexibility: You can get answers lightning fast, as long as you chose the right schema and indexes in advance. Or you can ask any question you want, as long as you are willing to wait for the answer. In today’s complex world of distributed systems, microservices, and ephemeral infrastructure, it’s impossible to predict every question you will need to ask of your data. When trouble strikes, you may need to answer questions that you had never thought about asking before, and you need those answers fast. That’s why we built New Relic from the ground up as a schema-less database that answers ad hoc queries quickly without requiring indexing in advance. You don’t have to choose between speed and flexibility; you get both.
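To make "schema-less and ad hoc" concrete, here is a minimal Python sketch. It is an illustration only, not New Relic's implementation: the event attributes and the `facet_count` helper are hypothetical names chosen for the example. Each event carries whatever attributes it happens to have, and a question is answered by scanning the events rather than by consulting a schema or index defined in advance.

```python
from collections import Counter

# Hypothetical events: no fixed schema, so each record has its own attributes.
events = [
    {"eventType": "Transaction", "appName": "checkout", "duration": 120, "region": "us-east"},
    {"eventType": "Transaction", "appName": "checkout", "duration": 340, "region": "eu-west",
     "errorClass": "Timeout"},
    {"eventType": "PageView", "browser": "Firefox", "duration": 900},
    {"eventType": "Transaction", "appName": "search", "duration": 80, "region": "us-east"},
]


def facet_count(events, where, facet):
    """Count matching events grouped by any attribute, using a plain scan.

    `where` is an arbitrary predicate and `facet` is any attribute name;
    neither has to be declared or indexed before the question is asked.
    """
    counts = Counter()
    for event in events:
        # Events that lack the facet attribute are simply skipped.
        if where(event) and facet in event:
            counts[event[facet]] += 1
    return dict(counts)


# A question planned for ahead of time: transactions per region.
by_region = facet_count(events, lambda e: e.get("eventType") == "Transaction", "region")

# A question nobody planned for: which event types are slower than 100 ms?
slow_by_type = facet_count(events, lambda e: e.get("duration", 0) > 100, "eventType")

print(by_region)
print(slow_by_type)
```

The trade-off is visible even at this size: every question costs a full scan, which is why the speed of the scan itself becomes the thing to optimize.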

How New Relic achieves millisecond performance

Answering unindexed queries requires processing huge amounts of data, so we’ve optimized New Relic for speed and parallelization. Every second, our platform serves thousands of queries for our customers, who need answers stored in multiple terabytes of data. Moving all that data around to search through it doesn't make sense, so instead, we take the query to the data. Every New Relic query starts at a query router that locates the relevant data in the cluster and sends the query to the hundreds, or even thousands, of workers that hold it. To balance memory and IO needs in our multi-tenant cluster, very large queries are broken up into smaller pieces. Those pieces are sent to other routers, which deliver their partial queries to the workers holding the data. For recently executed queries, an in-memory cache provides the fastest results; for queries asked less often, the workers scan the data from disk.

Once the workers have read their files, the process runs in reverse. First, the results from each file are merged on the worker. Then, each worker’s result is merged back through the routers recursively until the original router has the complete result, which it returns to the user.
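The fan-out and merge described above follow a general scatter-gather pattern, which can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not New Relic's code: `Partial`, `scan_file`, `worker_query`, `route`, and the `fanout` parameter are hypothetical names, workers are modeled as in-memory lists of files, threads stand in for network calls, and the cache is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Partial:
    """A mergeable partial aggregate: enough state to compute count and average."""
    count: int = 0
    total: float = 0.0

    def merge(self, other):
        return Partial(self.count + other.count, self.total + other.total)


def scan_file(events, predicate, field):
    """Scan one file (a list of event dicts) and aggregate the matching events."""
    partial = Partial()
    for event in events:
        if predicate(event) and field in event:
            partial = partial.merge(Partial(1, float(event[field])))
    return partial


def worker_query(files, predicate, field):
    """A worker merges the per-file results for the files it holds."""
    result = Partial()
    for events in files:
        result = result.merge(scan_file(events, predicate, field))
    return result


def route(workers, predicate, field, fanout=2):
    """A router queries a small group of workers directly, or splits a large
    group across sub-routers and merges their results recursively.
    Assumes fanout >= 2 so that every split makes progress."""
    if len(workers) <= fanout:
        with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
            partials = list(pool.map(lambda w: worker_query(w, predicate, field), workers))
    else:
        chunk = (len(workers) + fanout - 1) // fanout
        groups = [workers[i:i + chunk] for i in range(0, len(workers), chunk)]
        partials = [route(group, predicate, field, fanout) for group in groups]
    result = Partial()
    for partial in partials:
        result = result.merge(partial)
    return result


# Three hypothetical workers, each holding one or two files of events.
workers = [
    [[{"app": "checkout", "duration": 200}, {"app": "search", "duration": 100}],
     [{"app": "checkout", "duration": 400}]],
    [[{"app": "checkout", "duration": 600}]],
    [[{"app": "search", "duration": 300}], [{"app": "checkout"}]],
]

result = route(workers, lambda e: e.get("app") == "checkout", "duration")
print(result.count, result.total / result.count)
```

The important design choice is that workers return small mergeable aggregates (here a count and a sum) rather than raw events, so the volume of data moving back through the routers stays small no matter how much was scanned.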

Dynamic demand requires unlimited scalability

We designed New Relic to scale without limits to support the unpredictable demand of our customers around the globe. As our customer base has grown over the past decade, from retail to entertainment, apparel to healthcare, and gaming to e-commerce, we've scaled our telemetry data platform along with it, and that scale minimizes the overall impact of local spikes in demand. New Relic ingests over a billion data points per minute, so when any customer experiences increased demand, it handles the incremental hundreds of millions of data points with ease.

How New Relic benefits you

With a lightning-fast median query response of 60 milliseconds and the ability to analyze over 50 billion events in a single query, the New Relic observability platform enables you to find the needles within your largest haystacks. And because of its multi-tenant architecture, our smallest customers benefit from the same massive computing resources as our largest users. Additionally, our platform delivers the following capabilities:  

  • Single query interface: Use NRQL to search all your telemetry data
  • Intelligence: Correlate insights across all your data sources
  • Performance: Query tens of billions of data points with results in milliseconds
  • Elasticity: Scale your business and trust your data retention will scale too
  • Predictable costs: Only pay for what you need