Over the last two decades, a shift in technology has brought distributed systems out of specialized fields such as telecommunications and into day-to-day operations for many enterprises. With that shift came a need to understand and observe distributed systems at scale. In a distributed architecture with many dependencies, it can be difficult to pinpoint where an error or latency increase that affected your users originated. Let’s explore how you can use New Relic to understand difficult-to-track problems in a distributed world.
Indicators
Generally, a DevOps engineer cares about a few performance metrics, such as:
- Size of payloads (size)
- Time of a particular request (duration)
- Whether the request was successful or not (error)
Moreover, it’s common to enhance the above-mentioned metrics with additional attributes for improved debuggability:
- Upstream application: For tracking which of your upstreams generated this request.
- Trace ID: For tracking an individual request through many systems.
If you’re running a multi-tenant application, you may also want to include tenant-related attributes, such as “userId” and “tenantId”, so that you can debug individual requests for a particular customer.
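As a rough illustration of how these indicators and attributes fit together, the sketch below times a block of work and returns a single record holding the size, duration, and error indicators alongside whatever debugging attributes are passed in. The helper name `measure_request` and the sample attribute values are hypothetical, and the sketch uses only the Ruby standard library rather than any New Relic API.

```ruby
# Hypothetical helper (not a New Relic API): runs the given block and
# returns one record with the size, duration, and error indicators,
# merged with the debugging attributes supplied by the caller.
def measure_request(attributes = {})
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  error = false
  body = nil
  begin
    body = yield
  rescue StandardError
    # This sketch only records the failure; real code would likely re-raise.
    error = true
  end
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  attributes.merge(
    size: body.to_s.bytesize, # payload size in bytes (0 if the block failed)
    duration: duration,       # elapsed time in seconds
    error: error              # whether the block raised
  )
end

record = measure_request(upstreamApplication: "web-dashboard", tenantId: "tenant-42") do
  '{"status":"ok"}'
end
puts record.inspect
```

Keeping the indicators and the attributes in one record is what makes it possible to later filter a slow or failing request by tenant or by upstream caller.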
A common way to pass this information between distributed applications is to include it in headers when making requests. This lets downstream applications identify their clients and understand the context of each call. The pattern works well for both synchronous and event-driven architectures.
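A minimal sketch of this header pattern follows, assuming the same `x-application-context` and `x-trace-id` header names used in the examples in this post. The helper `outbound_headers` is hypothetical: it reuses the trace ID from the incoming request when there is one and generates a fresh ID only at the edge of the system.

```ruby
require "securerandom"

# Hypothetical helper: builds the headers a service would attach to an
# outbound request so that downstream services know who called them.
def outbound_headers(app_name:, incoming_headers: {})
  # Reuse the caller's trace ID so the whole request chain shares one ID;
  # generate a new 32-character hex ID only when none was received.
  trace_id = incoming_headers["x-trace-id"] || SecureRandom.hex(16)
  {
    "x-application-context" => app_name,
    "x-trace-id" => trace_id
  }
end

# The first service in the chain has no incoming headers.
edge = outbound_headers(app_name: "web-dashboard")
# A downstream service forwards the same trace ID under its own name.
hop = outbound_headers(app_name: "api-gateway", incoming_headers: edge)
puts hop.inspect
```

For event-driven systems, the same two values could travel as message metadata instead of HTTP headers.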
Instrumentation
There are many methods that New Relic offers to instrument your applications with relevant data. While a power user may utilize custom events, simpler use cases may be able to instrument “transaction” events with custom attributes. Let’s take a look at a few simple examples.
Ruby on Rails
In a Ruby on Rails application, you can add custom attributes to your transactions with the New Relic agent, as in the following example:
class MyController < ApplicationController
  def doAction
    # Attach the upstream caller and trace ID to the current transaction
    NewRelic::Agent.add_custom_attributes({
      upstreamApplication: request.headers['x-application-context'],
      traceId: request.headers['x-trace-id']
    })
    ...
  end
end
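When a caller omits one of these headers, the example above would pass a nil value for that attribute. One possible refinement, sketched here with a hypothetical helper named `custom_attributes_from`, is to build the attribute hash separately and skip anything that is missing or empty. The sketch is plain Ruby and works with any object that supports `[]` lookups by header name.

```ruby
# Hypothetical helper: maps incoming header names to attribute names and
# skips headers that the caller did not send.
def custom_attributes_from(headers)
  mapping = {
    "x-application-context" => :upstreamApplication,
    "x-trace-id" => :traceId
  }
  mapping.each_with_object({}) do |(header, attribute), attributes|
    value = headers[header]
    # Only record attributes that actually carry a value.
    attributes[attribute] = value unless value.nil? || value.empty?
  end
end

puts custom_attributes_from("x-trace-id" => "abc123").inspect
```

In a controller, the result could then be handed to `NewRelic::Agent.add_custom_attributes` in place of the inline hash.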
Java Spring
For Java Spring developers, the process is just as straightforward. Below is an example of how to achieve similar instrumentation in a Java Spring application:
@RequestMapping(
    path = "/api/doAction",
    method = RequestMethod.POST)
public void doAction(
    @RequestHeader("x-application-context") String applicationContext,
    @RequestHeader("x-trace-id") String traceId
) {
    // Attach the upstream caller and trace ID to the current transaction
    NewRelic.addCustomParameter("upstreamApplication", applicationContext);
    NewRelic.addCustomParameter("traceId", traceId);
    ...
}
Using the New Relic SDK, you’re able to quickly instrument information about upstream applications, which can be useful in debugging complex issues.
Distributed tracing
If you’ve enabled distributed tracing, these attributes and headers are automatically added by New Relic and propagated between your applications. You can simply utilize the built-in functionality of New Relic agents without needing to worry about instrumenting both your client and server with matching headers.
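To give a sense of what such a propagated header can look like, the sketch below parses a `traceparent` value in the W3C Trace Context format (version, trace ID, parent span ID, and flags, separated by hyphens). It assumes your agents are configured to emit W3C Trace Context headers; the agents handle this parsing themselves, so the hypothetical `parse_traceparent` helper is only useful for inspecting traffic while debugging.

```ruby
# Hypothetical helper: splits a W3C Trace Context `traceparent` value
# into its four fields. Returns nil when the value is malformed.
def parse_traceparent(value)
  pattern = /\A([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/
  match = pattern.match(value.to_s)
  return nil unless match

  version, trace_id, parent_id, flags = match.captures
  # All-zero trace or parent IDs are invalid per the specification.
  return nil if trace_id.match?(/\A0+\z/) || parent_id.match?(/\A0+\z/)

  {
    version: version,
    trace_id: trace_id,
    parent_id: parent_id,
    sampled: (flags.to_i(16) & 1) == 1 # lowest flag bit marks a sampled trace
  }
end

parsed = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
puts parsed.inspect
```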
Observe
In New Relic, you can then issue a query such as the following:
FROM Transaction SELECT traceId, appName, upstreamApplication
The results list the different transactions that occurred and where they originated.
To track a particular erroring request across multiple services, you can issue a query that filters on a traceId. The result is a table of the request path from the client to the server.
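A query of that kind could be assembled as in the sketch below. The helper `trace_query` is hypothetical, the one-hour time window is an arbitrary choice, and the selected columns assume the custom attributes added in the earlier examples plus the standard `timestamp`, `appName`, and `duration` attributes on Transaction events. The character check is a deliberately strict guard so that an unexpected value cannot alter the query.

```ruby
# Hypothetical helper: builds an NRQL query that lists every transaction
# recorded for one trace ID.
def trace_query(trace_id)
  # Accept only letters, digits, and hyphens to keep the quoted value safe.
  unless trace_id.match?(/\A[0-9A-Za-z-]+\z/)
    raise ArgumentError, "unexpected characters in trace id: #{trace_id.inspect}"
  end

  "FROM Transaction SELECT timestamp, appName, upstreamApplication, duration " \
    "WHERE traceId = '#{trace_id}' SINCE 1 hour ago"
end

query = trace_query("4bf92f3577b34da6a3ce929d0e0e4736")
puts query
```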
You can then add other attributes already present on transaction events to the query and identify that an error reported by an upstream application actually came from one of its dependencies. In this example, the error originated in the “api-gateway” service, which sits between “web-dashboard” and “auth-service”.
With this information, you can investigate the erroring application in APM and see that the error occurred during a token refresh in the api-gateway service.
Using our pre-built Distributed Tracing UI, you can also see a map of application dependencies, as well as individual transactions and their dependencies, alongside key performance information such as each span’s duration.
Here, you can see how New Relic utilizes distributed tracing to track sign-up flows which end up calling multiple APIs on our user service, for example.
Conclusion
When your infrastructure is composed of many microservices, debugging even a simple error can be challenging. Adding debugging information such as “traceId” and “upstreamApplication” lets you quickly track down the source of errors in a complicated mesh of system calls. New Relic distributed tracing makes that even easier.
Next steps
Want to improve your visibility into distributed systems? Sign up for New Relic today and start tracking down complex issues with ease.
For more detailed guidance on using New Relic with distributed systems, check out the following resources:
Alternatively, you can take our self-paced course on distributed tracing.
The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub ( discus.newrelic.com ) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.