This post was originally published on July 31, 2018.
With more than 43 million vehicles viewed per month, Dealer.com's site performance and availability drive business for thousands of automobile dealerships across the country. If the site is unavailable or returns an error, and shoppers can’t quickly and seamlessly get a quote or do their preliminary research on a new car or truck, the sale might never happen.
In complex software environments like that at Dealer.com, each request typically makes its way through dozens of discrete services. A single problematic service along the path can affect the overall response time for that request, change a good customer experience into a bad one, and potentially cause customers to look elsewhere.
Software teams working in environments where many services handle each request need to deeply understand the performance of every service, both upstream and downstream, so they can more effectively resolve performance issues, measure overall system health, and prioritize high-value areas for improvement.
New Relic distributed tracing overview
New Relic Distributed Tracing is designed to give software teams working in modern environments an easy way to capture, visualize, and analyze traces through complex architectures, including architectures that use both monoliths and microservices.
“We’ve found New Relic’s distributed tracing to be super easy to integrate with. We simply updated our agent, and all of a sudden we had distributed tracing. It was a great experience.”—Andrew Potter, senior developer at Dealer.com, a Cox Automotive brand
Every customer with Full-Stack Observability or the older New Relic APM Pro subscription gets distributed tracing at no extra charge; you just have to update your agents and enable distributed tracing in your configurations. New Relic supports C, Go, Java, .NET, Node.js, PHP, Python, and Ruby, as well as New Relic Browser, Mobile, AWS Lambda, and Zipkin format traces through the Trace API, and assembles trace data collected across polyglot environments into detailed scatter chart and waterfall visualizations.
Understanding modern software complexity
To understand why distributed tracing is so important, it’s helpful to look at how software environments are changing. Modern software technologies such as cloud platforms, containerization, and container orchestration are helping forward-thinking software organizations more quickly build, scale, and operate business-critical applications.
Traditional software environments typically included just a few large services. When issues cropped up, these relatively simple, monolithic architectures made it easier to identify which service was at fault so developer teams could dig through transactions inside that service to find critical bottlenecks or errors.
Today’s applications are often composed of hundreds or thousands of separate services built on ephemeral infrastructures. Some of these services are large monoliths built with legacy technologies, while others are clusters of smaller dynamic microservices. Despite the many advantages of these distributed architectures, the exploding number of components and their diversity in language, operating environment, and ownership create a huge new burden for teams trying to manage them. Teams can’t effectively work toward resolving issues in a complex system until they understand the full call graph for requests and how the performance characteristics of dependent services affect their own services. They need a complete view of the entire system.
For DevOps teams, understanding how a downstream service “a few hops away” can create a critical bottleneck for their service is essential for fast problem resolution. Just as important, it also gives teams insight into how to optimize their code. If DevOps teams can’t determine when, why, and how an issue happens, small defects may linger in production until a perfect storm of events aligns and the system breaks all at once. Distributed tracing provides engineers with a detailed view of individual requests so they can pinpoint which parts of the larger system are problematic.
Distributed tracing: Creating a steel thread
As organizations evolve to a more distributed architecture, they soon discover the need for distributed tracing. As New Relic’s Erika Arnold explained in a blog post (The Difference Between Tracing, Tracing, and Tracing), we can describe distributed tracing as a way to instrument, propagate context, record, and visualize requests through complex, distributed systems.
Let’s look at how New Relic has implemented a solution that handles all four components of distributed tracing: instrumenting, propagating context, recording, and visualizing.
New Relic makes setting up distributed tracing easy by auto-instrumenting application code using language agents that work with hundreds of different libraries and frameworks across multiple languages. New Relic instruments each service involved in the request, whether it’s a monolith or a microservice, creates timings for operations within the service, and sends each measured operation as a “span” to New Relic’s platform.
New Relic automatically adds important troubleshooting information to each span. For example, when New Relic instrumentation creates a span representing a database query operation, it includes the database connection information and SQL query as attributes in the span. Customers using New Relic’s existing agent API to add custom attributes to transactions will see all their information in the trace as well, without changing anything.
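To make the idea of a “span” concrete, here is a minimal sketch of a timed operation that carries attributes. It is an illustration only: the `Span` class, the `timed_span` helper, the `recorded_spans` list, and the attribute name `db.statement` are all hypothetical and do not reflect the New Relic agent's internals or its data schema.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation inside a service (hypothetical shape)."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    duration_ms: float = 0.0
    attributes: Dict[str, str] = field(default_factory=dict)


# Stand-in for the data an agent would send to a tracing backend.
recorded_spans: List[Span] = []


@contextmanager
def timed_span(name, trace_id, parent_id=None, attributes=None):
    """Time the enclosed block and record it as a span when it finishes."""
    span = Span(name=name, trace_id=trace_id, parent_id=parent_id,
                attributes=dict(attributes or {}))
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.duration_ms = (time.perf_counter() - start) * 1000
        recorded_spans.append(span)


# One request: a root span with a nested database span that carries
# the SQL text as an attribute.
trace_id = uuid.uuid4().hex
with timed_span("GET /inventory", trace_id) as root:
    with timed_span("db.query", trace_id, parent_id=root.span_id,
                    attributes={"db.statement": "SELECT * FROM vehicles WHERE id = ?"}):
        time.sleep(0.01)  # simulate the query
```

Because the inner block finishes first, the database span is recorded before the root span, and its `parent_id` points back at the root. That parent link is what lets a backend reassemble the individual spans into a trace.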
New Relic’s distributed tracing solution automatically instruments your services to create a unique Trace ID for each incoming request. It propagates the Trace ID and other necessary correlation information as the “trace context” across the entire call. For example, when one service makes a call to another service, New Relic adds the trace context to the HTTP Request header for the next service to use. New Relic’s auto-instrumentation is designed to eliminate the hard work of managing and propagating context, but if you’re using a transport that requires manual instrumentation, the New Relic agent provides an API that allows you to inject and extract the trace context. New Relic agents use the W3C Trace Context format for propagating the trace context, making them interoperable with any other tracing agent or tool that also supports the W3C Trace Context standard.
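For readers curious what the propagated context looks like on the wire, the sketch below builds and parses a W3C Trace Context `traceparent` header, which has the form `version-traceid-parentid-flags` in lowercase hex. This is not the New Relic agent API: the `inject` and `extract` functions are hypothetical helpers, the headers are a plain dict with a lowercase key, and only version `00` of the format is handled (the companion `tracestate` header is ignored).

```python
import re
import secrets

# Version 00 layout: 2 hex chars, 32 hex chars, 16 hex chars, 2 hex chars.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def inject(headers, trace_id, span_id, sampled=True):
    """Add a traceparent header describing the calling span."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Return the incoming trace context, or None if absent or invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", "").strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Service A starts a trace and calls service B.
outgoing = inject({}, secrets.token_hex(16), secrets.token_hex(8))

# Service B reads the context and continues the same trace.
incoming = extract(outgoing)
```

The receiving service keeps the same `trace_id`, uses `parent_id` as the parent of its own first span, and writes a new `traceparent` with its own span ID on any calls it makes in turn.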
New Relic agents send trace data to our Software-as-a-Service (SaaS) platform, where we ingest and store the data in the Telemetry Data Platform, the world’s most powerful telemetry database. Perhaps the least glamorous part of the system, this is where the hardest work gets done. New Relic already ingests and stores massive amounts of metrics, events, and other telemetry in a scalable platform so our customers can focus on building their business, not managing their monitoring platform. Because New Relic stores trace data for you in the Telemetry Data Platform, you can query trace data directly and create custom dashboards.
Finally, it all comes together in the New Relic user experience through trace visualizations designed to help you quickly understand why a specific request is slow, where an error originated, and where you can optimize your code to improve your customer’s experience. We do this by providing an advanced trace filtering capability and trace visualizations that bring together distributed tracing and New Relic APM.
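As a rough illustration of what a waterfall view does with recorded spans, the sketch below rebuilds the call tree from parent IDs and prints it as indented text. It is a toy, not how the New Relic UI is implemented: the span dictionaries, their field names, and the service names are made up for the example.

```python
from collections import defaultdict


def render_waterfall(spans):
    """Return indented lines showing each span under its parent."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def visit(span, depth):
        lines.append(f"{'  ' * depth}{span['name']} ({span['duration_ms']} ms)")
        # Show child operations in the order they started.
        for child in sorted(children[span["id"]], key=lambda s: s["start_ms"]):
            visit(child, depth + 1)

    # Spans with no parent are the entry points of the trace.
    for root in sorted(children[None], key=lambda s: s["start_ms"]):
        visit(root, 0)
    return lines


# Hypothetical spans from one trace, in arbitrary arrival order.
spans = [
    {"id": "d", "parent_id": "a", "name": "pricing-service: quote",
     "start_ms": 105, "duration_ms": 10},
    {"id": "b", "parent_id": "a", "name": "inventory-service: lookup",
     "start_ms": 10, "duration_ms": 90},
    {"id": "a", "parent_id": None, "name": "web: GET /vehicles",
     "start_ms": 0, "duration_ms": 120},
    {"id": "c", "parent_id": "b", "name": "db: SELECT vehicles",
     "start_ms": 20, "duration_ms": 70},
]

for line in render_waterfall(spans):
    print(line)
```

Even in this simplified form, the nesting makes it easy to see that most of the request's 120 ms is spent in the inventory lookup, and that most of that time is the database query two hops down.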
Getting started with New Relic’s distributed tracing
2. Deploy the latest APM agent to each service involved in the call path you’re interested in, and enable distributed tracing in the agent config.
2a. Deploy the latest Browser agent.
2b. Deploy the latest Mobile agent.
3. The “Distributed tracing” menu in New Relic APM or the Distributed Tracing launcher in New Relic One takes you to the main “trace listing” view, where you can quickly identify slow traces and traces with errors. You can adjust the “time picker” to change the window of traces you want to view, and use advanced filtering to find traces by a combination of attributes.
4. Dive into distributed traces to see how long each span takes. You can click into each span to see historical performance charts and associated attributes that layer in the context you need to understand and troubleshoot issues. You can also jump right into the correlated APM overview page for a specific service involved in the trace. Here you can see deeper transaction information and stack traces to resolve issues in that service.
5. With the New Relic One chart builder, you can query your span data to create custom charts and other visualizations that are important to your team. You can also build custom dashboards containing multiple charts.
Here at New Relic, we understand that as modern software organizations evolve their environments, those environments become more complex and more difficult to understand and troubleshoot. New Relic’s depth of automatic instrumentation makes using distributed tracing easy, so you can quickly understand why a specific request is slow, where an error originated, and where you can optimize your code to improve your customer’s experience.
You can find out much more about distributed tracing in the New Relic documentation:
- Introduction to distributed tracing
- Transition guide
- Enable distributed tracing
- Using data
This post contains “forward-looking” statements, as that term is defined under the federal securities laws, including but not limited to future roadmap for distributed tracing as well as the benefits of such features. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause New Relic’s actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect New Relic’s financial and other results and the forward-looking statements in this press release / post is included in the filings New Relic makes with the SEC from time to time, including in New Relic’s most recent Form 10-K, particularly under the captions “Risk Factors” and “Management’s Discussion and Analysis of Financial Condition and Results of Operations.” Copies of these documents may be obtained by visiting New Relic’s Investor Relations website at http://ir.newrelic.com or the SEC's website at www.sec.gov. New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law.