When working with distributed logging for the first time, a developer’s first instinct might be to send application logs directly from the application to a logging backend. A direct connection is appealing: after all, there are fewer moving parts to manage. Such communications typically happen over transactional REST APIs, giving developers a false sense of security that all their logs will get through.
Unfortunately, there are three points of fragility with this model:
- Backpressure on HTTP requests can disrupt the normal functions of instrumented code, particularly if logs reach an unexpected size or rate.
- Network latency between the application and the logging backend can delay the delivery of log data.
- Network connectivity issues can lead to lost logs, which is especially troubling when agent logs from an APM-instrumented application correlate with an outage.
A central principle of site reliability engineering is the constant observation of system telemetry to make systems less fragile. Decoupling your log forwarder from your application is one way to do exactly that: you can keep complex processing computations in a separate process, and you can update the forwarder’s code or configuration frequently without worrying about affecting the underlying application.
Additionally, log forwarders have built-in memory- or file-based buffering that provides critical flexibility for all manner of latencies and interruptions between the application data center and the logging backend.
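For example, here’s a minimal sketch of a Fluentd output that buffers chunks to disk and retries failed deliveries; the buffer path, flush interval, and retry values are illustrative assumptions you’d tune for your own environment:

<match **>
  @type newrelic
  license_key <NEW_RELIC_LICENSE_KEY>
  <buffer>
    @type file                      # buffer chunks on disk, not in memory
    path /var/log/td-agent/buffer   # illustrative buffer directory
    flush_interval 10s              # how often to attempt delivery
    retry_wait 30s                  # back off between retry attempts
    retry_max_times 10              # give up after this many attempts
  </buffer>
</match>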
Log forwarders also can be:
- Extended to support a wider variety of network protocols and output formats than underlying application code
- Separated at the infrastructure level (i.e., you can run them on different hosts or in different containers)
- Load balanced
- Used in complex pipelines to achieve upstream aggregation and durable buffering
In this post, I’ll share five enterprise-ready patterns for using log forwarders to deliver logs to a logging backend, such as New Relic log management. These patterns will give you a general understanding of the practical choices you can make to reduce fragility in your overall log pipeline by reducing latency, errors, and saturation. My goal is to demystify the process of distributed logging and provide practical patterns you can use today.
Choosing a log forwarder
Selecting a log forwarder is a key component of this process because you'll want to make sure it meets the requirements of your system and is easy to manage.
Logstash (part of the ELK stack), Rsyslog, and Fluentd are common and relatively easy to use log forwarders. Newer forwarders like Wayfair’s Tremor and Timber.io’s Vector were built for high-performance use cases, but it’s well beyond the scope of this document to compare and contrast all of them.
The New Relic Infrastructure agent supports log forwarding by means of a Fluent Bit extension. The configuration is generally compatible with Fluent Bit syntax. It’s convenient to use that built-in forwarder to send out logs if you deploy the Infrastructure agent. In this post, my example configurations will use Fluentd and Fluent Bit as standalone forwarders. You can use these in scenarios where you can’t install the Infrastructure agent or you want a centralized forwarding layer to handle multiple distributed sources.
Note: In addition to Fluentd and Fluent Bit, Logstash, and Vector also provide plugins for New Relic log management. And other forwarders not listed here may have flexible configuration options for enabling them to send logs to any backend, including New Relic.
Consider these general characteristics to determine whether you need Fluentd or Fluent Bit:

| | Pros | Cons |
|---|---|---|
| Fluentd | Large ecosystem of community plugins; highly flexible routing, filtering, and buffering options | Written largely in Ruby, so it has a bigger memory and CPU footprint and requires a Ruby runtime |
| Fluent Bit | Written in C with a very small memory footprint; well suited to containers and resource-constrained hosts | Smaller set of built-in plugins and less flexible processing than Fluentd |
Install Fluentd or Fluent Bit
New Relic log management offers a fast, scalable log management platform that allows you to connect your log data with the rest of your telemetry data. Pre-built plugins for Fluentd and Fluent Bit (and others) make it simple to send your data from anywhere to New Relic.
See the New Relic documentation for installation and configuration instructions for each forwarder.
The following examples assume you want to forward logs to New Relic, in which case you’ll need a New Relic license key. If you don’t yet have a license key but want to test basic forwarding functionality, you can download a forwarder (see the links below) and configure it to write to an output file for testing purposes.
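For instance, here’s a minimal Fluent Bit test configuration that uses the built-in dummy input to generate records and the file output to write them locally instead of sending them to a backend; the output path is illustrative:

[INPUT]
    Name  dummy
    Tag   test

[OUTPUT]
    Name  file
    Match *
    Path  /tmp/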
NOTE: The forks of these tools that are built for package managers follow the naming convention of td-agent (Fluentd) and td-agent-bit (Fluent Bit).
Fluent Bit packages are available from the Fluent Bit project site; Fluentd packages are available from fluentd.org.
Five enterprise patterns for forwarding logs
Now let’s look at five patterns for forwarding logs to New Relic (or whatever backend you want to use). For each pattern, we’ll look at the pros and cons, and I’ve also included some example configurations.
Pattern one: Co-located forwarder with a file tailer
In this pattern, you’d use a file tailer to watch one or more log files and send new lines to your logging backend as they’re written. The forwarder runs alongside the applications on the application host. Most forwarders have rich configuration options to determine exactly how the tailing works and what kind of buffering will be used when posting to the logging backend. For additional scalability, a co-located forwarder can also forward on to an off-host forwarder layer (similar to pattern three discussed below).
Pros of pattern one
- This pattern scales automatically with your application infrastructure, as there is one forwarder per unit of infrastructure.
- Since this pattern uses log files, you can forward logs from legacy applications that you may not be able to rebuild with modern logging tools.
Cons of pattern one
- Each forwarder can consume considerable compute resources, and this can add to the systematic over-deployment of your application infrastructure.
- Configurations for the co-located forwarder become a dependency for each application’s configuration, which can add to the complexity of application configuration and deployment.
- At points of peak load, log files can grow so large that the file tailer (as well as log-rotation utilities) may not be able to keep up, causing log lag and possible storage issues.
Example configuration for pattern one
Note: The examples here (and in the rest of this post) refer to a Python APM agent logger configuration. For an example of how to fully configure logs in context with the Python agent, see the documentation. In addition, see the official Python documentation on setting up logging handlers to ensure you set up your output correctly.
import logging
# ...
# Instantiate a new log handler
handler = logging.FileHandler('/var/log/app-a.log')
# ...
Fluentd
<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/app-a.log
  tag app-a
</source>

<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/app-b.log
  tag app-b
</source>

<match **>
  @type newrelic
  license_key <NEW_RELIC_LICENSE_KEY>
  base_uri https://log-api.newrelic.com/log/v1
</match>
Fluent Bit
[INPUT]
    Name tail
    Path /var/log/app-a.log

[INPUT]
    Name tail
    Path /var/log/app-b.log

[OUTPUT]
    Name newrelic
    Match *
    licenseKey <NEW_RELIC_LICENSE_KEY>
Pattern two: Co-located forwarder using sockets
In this pattern, your application code will send logs directly to your forwarder over a UDP or TCP port (no files are stored), and the forwarder will in turn forward them asynchronously to New Relic (using the New Relic Log API). The forwarder sits between your app and the New Relic backend and provides a minimal buffer and processing layer.
Note: It’s beyond the scope of this post to provide guidance on whether you should use UDP or TCP. In general, UDP has the least impact on your application, but the protocol offers weaker delivery guarantees. Most high-volume log environments will eventually gravitate toward UDP for sending logs to a forwarder for this reason. Certain security applications, such as SIEM software, will still tend toward TCP to ensure completeness.
Pros of pattern two
- This pattern scales automatically with your application infrastructure, as there is one forwarder per unit of infrastructure.
- This pattern uses socket protocols for log input, so there is no need to store or rotate files. And since the forwarder is co-located, the network overhead is low.
- One forwarder can receive logs from any application that can access it over the network interface.
Cons of pattern two
- Each forwarder can consume considerable compute resources, and this can add to systematic over-deployment of your application infrastructure.
- Configurations for the co-located forwarder become a dependency for each application’s configuration, which can add to the complexity of application configuration and deployment.
- You’ll have no physical log file that you can explore to troubleshoot applications on the host (unless you write that as a different configuration).
- The TCP protocol can still cause backpressure into the application.
- You’ll need to tune TCP and UDP kernel parameters for this use case (a sketch of example settings follows this list).
- You’ll need to monitor system telemetry, such as:
- UDP Buffers (for UDP)
- UDP Buffer Receive Errors
- TCP Errors
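As a starting point, here’s a minimal, illustrative set of Linux kernel settings for a host receiving high-volume UDP logs; the values are assumptions to tune against your own traffic, not recommendations:

# /etc/sysctl.d/99-log-forwarder.conf (illustrative values)
net.core.rmem_max = 16777216         # allow larger socket receive buffers
net.core.rmem_default = 2097152      # default socket receive buffer size
net.core.netdev_max_backlog = 5000   # packets queued when the NIC outpaces the kernel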
Example configuration for pattern two
UDP
import logging
# UDP example
# ...
# Instantiate a new log handler
handler = logging.DatagramHandler('localhost', 5160)
# ...
TCP
import logging
# TCP example
# ...
# Instantiate a new log handler
handler = logging.SocketHandler('localhost', 5170)
# ...
Fluentd
<source>
  @type udp
  <parse>
    @type none
  </parse>
  tag udp_5160
  port 5160
  bind 0.0.0.0
</source>

<source>
  @type tcp
  tag tcp_5170
  <parse>
    @type none
  </parse>
  port 5170
  bind 0.0.0.0
</source>

<match **>
  @type newrelic
  license_key <NEW_RELIC_LICENSE_KEY>
  base_uri https://log-api.newrelic.com/log/v1
</match>
Fluent Bit
Note: UDP is not supported as a built-in plugin for Fluent Bit.
[INPUT]
    Name tcp
    Listen 0.0.0.0
    Port 5170

[OUTPUT]
    Name newrelic
    Match *
    licenseKey <NEW_RELIC_LICENSE_KEY>
Pattern three: Separately located forwarder using sockets
In this pattern, your log forwarder is located outside the application host. Your application code will send your logs to your forwarder over a UDP or TCP port, and the forwarder will in turn use the Log API (encapsulated in its New Relic output plugin) to forward them to New Relic.
Moving your forwarder into separate infrastructure gives you an economy of scale regarding compute resource utilization, and centralizes configuration and maintenance to the log forwarder infrastructure. With this pattern, different application families can send log data into the same infrastructure pool. You could even use different ports or protocols, as well as pattern matching, to do custom handling of logs coming from different sources if needed.
Pros of pattern three
- In this pattern, you have specific infrastructure provisioned specifically to handle logs; you don’t need to provision the forwarder as part of your application infrastructure.
- You can send logs from different application pools into the same log forwarding pool.
- You’ll largely eliminate backpressure, since you can scale your forwarder independently of the application.
- You can use a number of powerful methods for achieving a durable buffer; for example, you could use Apache Kafka to store logs before shipping them to New Relic or your logging backend.
Cons of pattern three
- You’ll have no physical log file that you can explore to troubleshoot applications on the host (unless you write that as a different configuration).
- You’ll have to maintain a new class of infrastructure with independent configuration.
- You can still run into the issue of a “hot sender” application that can overwhelm one forwarder in the forwarder pool. (In the next pattern, I’ll show how to eliminate this concern using a load balancing layer in front of your forwarder pool.)
Example configuration for pattern three
Your Python agent logging configuration will be nearly identical to pattern two, but it will be necessary to use the public IP or DNS name of the forwarder host:
handler = logging.DatagramHandler('forwarder1.host.mycompany.com', 5160)
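If you want the durable buffer mentioned in the pros above, one option is to have the forwarder pool write to Apache Kafka and ship to the backend from a separate consumer tier. Here’s a minimal sketch of the Kafka-producing side using the fluent-plugin-kafka output; the broker addresses, topic name, and buffer path are hypothetical:

<match **>
  @type kafka2                           # requires fluent-plugin-kafka
  brokers kafka1.mycompany.com:9092,kafka2.mycompany.com:9092
  default_topic app-logs                 # hypothetical topic
  <format>
    @type json
  </format>
  <buffer topic>
    @type file
    path /var/log/td-agent/kafka-buffer  # local spill if Kafka is unreachable
  </buffer>
</match>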
Pattern four: Separately located forwarder with load balancing
As in the previous pattern, the forwarder layer is located outside the application. However, in this case, the forwarder layer is installed behind a load balancer. Your application code will send logs, usually over a UDP port, to a load balancer in front of the forwarder layer, and the load balancer will send the data to an appropriate instance of the forwarder based on common load balancing rules (for example, round robin). Each forwarder is configured identically to the forwarder in the previous pattern.
Fundamentally, there is nothing different about the configuration of this forwarder pool except the load balancer layer. In this case, an Nginx load balancer ensures that no single application process can overwhelm a particular forwarder. Your applications will send logs using a round-robin DNS resource associated with the Nginx UDP load balancer.
Pros of pattern four
- This pattern allows for massive scalability.
- This is well suited for a multi-tenant log forwarding infrastructure.
- You’ll get high availability of your log forwarding infrastructure.
- Logging “spikes” from “hot senders” will be better distributed so one application that may be sending an excessively high volume of logs can’t clog the forwarding endpoint.
Cons of pattern four
- You’ll need to maintain a new class of infrastructure with independent configuration.
Example configuration for pattern four
Your Python logging configuration will be nearly identical to patterns two and three, but it will be necessary to use the DNS name associated with the Nginx load balancer.
handler = logging.DatagramHandler('loglb.mycompany.com', 5160)
Configure your load balancer as follows:
# Load balance UDP log traffic across three forwarders
stream {
    upstream log_upstreams {
        server <IP of forwarder 1 of 3>:5160;
        server <IP of forwarder 2 of 3>:5160;
        server <IP of forwarder 3 of 3>:5160;
    }

    server {
        listen 5160 udp;
        proxy_pass log_upstreams;
        proxy_timeout 1s;
        proxy_responses 1;
        error_log logs/log-lb.log;
    }
}
Pattern five: Log interpretation and routing
This set of patterns can be mixed in with any of the other data transfer patterns we’ve covered. For most enterprise use cases, you’ll need to enrich, filter, and appropriately route your logs. Implementations of these patterns tend to be forwarder dependent, but most of the common forwarders will support these patterns. (For more background on how log events are processed, I’d recommend Life of a Fluentd event.)
Pros
- This set of patterns allows for massive scalability.
- These patterns mitigate anomalous spikes in rate and log size without crashing your forwarders.
Cons
- These patterns add a layer of complexity, as dropping and routing log data can obfuscate the logstream.
- As you enable various types of filters, forwarders can consume a lot of your CPU and RAM resources. Be sure to use filters in a sensible way. If you’re using more than one set of filters, always run the filters that exclude the most data first, so you don’t unnecessarily process data that’s going to be dropped anyway (see the sketch after this list).
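For instance, here’s a minimal Fluentd sketch of that ordering: a cheap grep exclusion runs before a more expensive transformation, so dropped records never pay the transformation cost. The pattern and field names are illustrative:

# Cheap exclusion first: drop noisy debug lines immediately
<filter **>
  @type grep
  <exclude>
    key message
    pattern /DEBUG/
  </exclude>
</filter>

# Costlier enrichment runs only on records that survived the grep
<filter **>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>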
Let’s take a look at some examples of filtering, enriching, and routing.
Filtering
Select which logs to include in a log stream.
Fluentd supports the match directive for each output plugin. The match directive looks for events with matching tags and processes them.
- Allow all
<match **>
  # ... New Relic account A ...
</match>
- Allow only records tagged as being from certain applications
<match app.customer_info>
  # ...
</match>
One drawback to note: if there’s any intermediate processing in the forwarder, each record will incur that processing overhead even if you ultimately discard the record. To avoid this, you can also filter within the Fluentd event processing pipeline, which allows you to discard unnecessary records as soon as possible.
<source>
  # ... input configs ...
  tag app-a
</source>

<source>
  # ... input configs ...
  tag app-b
</source>

<filter app-a>
  # ... expensive operation only for app-a ...
</filter>

<match **>
  # ... output configs ...
</match>
Another extremely useful pattern is to drop or exclude unwanted content. You could use an <exclude> filter to drop certain logs from your stream, such as logs containing personally identifiable information (PII).

<filter **>
  @type grep
  <exclude>
    key message
    pattern /USERNAME/
  </exclude>
</filter>
Enriching
Add or alter content to an existing log in a stream.
Enrichment generally entails adding or updating an element of the record being processed. Fluentd provides a number of operators to do this, for example record_transformer.

# Add host_param to each record
<filter app.customer_info>
  @type record_transformer
  <record>
    host_param "#{Socket.gethostname}"
  </record>
</filter>
These elementary examples don’t do justice to the full power of tag management supported by Fluentd. See the rewrite_tag_filter documentation for some great examples of how to use the rewrite_tag_filter plugin to inject a tag into a record, which allows for great flexibility in the downstream pipeline.
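As a taste, here’s a minimal, hypothetical sketch that re-tags any record whose message contains ERROR, so a downstream match block can route those records separately:

<match app.**>
  @type rewrite_tag_filter   # requires fluent-plugin-rewrite-tag-filter
  <rule>
    key message
    pattern /ERROR/
    tag errors.${tag}        # for example, app.checkout becomes errors.app.checkout
  </rule>
</match>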
Routing
Send different types of logs to different backends (or to a separate New Relic account, for example).
Fluentd supports a number of powerful routing techniques that allow you to send different events to completely different backends. Two practical examples of routing, both sketched below, are:
- Send logs from one application to a specific New Relic account but send logs from all other applications to a different New Relic account.
- Send logs to two different outputs: a) New Relic and b) a cloud storage bucket for long term archiving.
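Here’s a minimal Fluentd sketch of the first idea; Fluentd matches tags against directives in order, so the more specific match must come first. The tags and license key placeholders are hypothetical:

# Route logs from app-a to New Relic account A; everything else to account B
<match app-a.**>
  @type newrelic
  license_key <ACCOUNT_A_LICENSE_KEY>
</match>

<match **>
  @type newrelic
  license_key <ACCOUNT_B_LICENSE_KEY>
</match>

And, in a separate configuration, a sketch of the second idea using the built-in copy output to send every record to two outputs at once; the S3 details are hypothetical and require the fluent-plugin-s3 output plugin:

<match **>
  @type copy
  <store>
    @type newrelic
    license_key <NEW_RELIC_LICENSE_KEY>
  </store>
  <store>
    @type s3                   # requires fluent-plugin-s3
    s3_bucket my-log-archive   # hypothetical bucket name
    s3_region us-east-1
    path logs/
  </store>
</match>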
Final tip: Don’t overwhelm your system
In the modern context of SRE—where full stack observability is critical—it makes sense to devote considerable thought to how your upstream logging implementation will scale and how it can be resilient to various latencies and interruptions that may occur between data centers over complex wide area networks.
Under certain anomalous conditions, even well-behaved applications can suddenly start to emit logs of unexpected size (multi-MB stack traces or object dumps) or at previously unforeseen rates (millions per minute). The patterns I’ve shown provide a substantial part of the toolkit needed to ensure those anomalies don’t overwhelm any single part of the system and cause unacceptable disruption in observability.
Next steps
If you're looking for more great New Relic logs content from our experts, don't miss How to set up logs in context for a Java app running in Kubernetes.
And be sure to sign up and start testing out New Relic log management today!