Harkamal Singh, a Manager of Programmability in the Customer Solutions group, contributed to this post.
As the number of telemetry data sources continues to grow, software teams are finding gaps in their monitoring strategies. To close these gaps, more and more teams have turned to open source monitoring tools for collecting metrics, traces, and other telemetry data.
However, many open source telemetry tools require teams to operate and manage multiple complex layers—one for traces, one for metrics, one for logs, one for visualizations, and a database to store it all. These tools are also limited in terms of high availability, scale, and long-term storage.
As part of our continued effort to give you a single platform to ingest, analyze, visualize, and alert on any telemetry source, we’ve released many APIs and SDKs for ingesting metrics into New Relic from virtually any service you’ve built. And now, using an output plugin built on the Go Telemetry SDK, you can add Telegraf to your expanding toolkit.
In this post, we’ll show you how to ingest data with Telegraf and send it to New Relic as custom metrics via the New Relic output plugin for Telegraf.
About Telegraf
Telegraf is a server-based agent that collects metrics from inputs—applications, databases, message queues, and more—and writes them into outputs, like New Relic’s Metric API. Telegraf’s plugin-driven architecture and lightweight footprint—it requires no external dependencies like npm, pip, or gem—make it a popular tool for collecting metrics from a variety of input sources; in fact, Telegraf has more than 200 integrations.
Since Telegraf is a compiled Go executable, and all plugins are compiled directly into the build, you don’t need to install any of the integration plugins—they are built in. Instead, you need only ensure the Telegraf version that you use has the plugins you need.
Here’s a quick look at some of the plugin types available from Telegraf:
Input source plugins
Most of Telegraf’s input plugins can be grouped as follows:
- Metrics from common open source data infrastructure
- Metrics from common DevOps tools and frameworks
- Metrics from common monitoring systems
- Low-level OS system telemetry: iptables, Linux sysctl FS, Netstat, and many more
Beyond providing instrumentation for such commercial and open source packages, Telegraf shines in its ability to ingest metrics from generic data sources, such as files and sockets. (We’ll see the benefits of this in our example configuration below.)
Telegraf supports the following generic sources:
- File: Consumes the contents of an entire file
- Tail: Tails a file
- Socket Listener: Receives input from a socket over TCP or UDP
- HTTP Listener: Receives POSTs over HTTP
- HTTP Poller: Periodically gets data from a configured endpoint
- Exec: Executes a process and gets metrics from stdout
- Execd: Executes a daemonized process
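To make the Tail-style inputs concrete, here’s a minimal Python sketch (our own illustration, not Telegraf code; the class name is ours) of a tail-style reader that skips a file’s existing content and returns only lines appended after it attaches—the same semantics as `from_beginning = false`:

```python
import os

class Follower:
    """Minimal sketch of a tail-style reader: skip existing content,
    return only complete lines appended after we attach."""

    def __init__(self, path):
        self._file = open(path)
        self._file.seek(0, os.SEEK_END)  # start at end of file, skipping old lines

    def read_new(self):
        """Return any complete lines appended since the last call."""
        return [line.rstrip("\n")
                for line in self._file.readlines()
                if line.endswith("\n")]
```

A production tailer also has to handle file truncation and rotation, which is why Telegraf’s plugin offers both `inotify` and `poll` watch methods.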
Input source formats
Using these generic input sources, you can configure generic data formats to receive and send data. When configuring Telegraf, consider the variety of input and output formats it supports:
- JSON: Parses a JSON object or an array of objects into metric fields.
- CSV: Creates metrics from a document containing comma-separated values.
- Graphite: Translates Graphite dot buckets directly into Telegraf measurement names, with a single value field and no tags.
- CollectD: Parses the collectd binary network protocol.
- Logfmt: Parses data in logfmt format.
- Dropwizard: Parses the JSON representation of a single Dropwizard metric registry.
- Grok: Parses line-delimited data using a language similar to regex.
- Nagios: Parses the output of Nagios plugins.
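As a concrete illustration of the Graphite case, here’s a rough Python sketch (our own code, not Telegraf’s parser) of how a plaintext Graphite line maps onto a measurement name, a single value field, and a timestamp, with no tags:

```python
def parse_graphite_line(line):
    """Parse a plaintext Graphite line, "<dot.bucket> <value> <unix_timestamp>".
    The dot bucket becomes the measurement name, the value becomes a
    single field, and no tags are attached."""
    bucket, value, timestamp = line.split()
    return {
        "measurement": bucket,
        "fields": {"value": float(value)},
        "tags": {},
        "timestamp": int(timestamp),
    }
```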
Processor and Aggregator plugins
In addition to input and output plugins, Telegraf also supports processor and aggregator plugins, which provide powerful intermediate enrichment, transmutation, and aggregation to data coming through your system.
Our example won’t use these plugin types, but we highly encourage you to check out the Telegraf documentation to learn how to use these types in your workflows.
You can envision the whole pipeline as follows: inputs feed data through optional processors and aggregators before it reaches one or more outputs.
Sending metrics from Telegraf to New Relic
Now, let’s walk through an example where we ingest log data from a message queue and send it to New Relic as custom metrics.
If you want to follow along, be sure to have Telegraf version 1.15.0 or later installed. You’ll also need a New Relic Insert API key for sending data to the Metric API.
Our example will use the following components:
- Source: We’ll use the Tail input plugin, for tailing log lines from an input text file
- Format: We’ll use JSON and take advantage of some of the features that allow us to refine the pipeline’s data.
- Output: We’ll use the new New Relic output plugin.
Note: In this example, we’re using a legacy message queue system that doesn’t have a built-in input plugin for Telegraf; if we were using ActiveMQ or RabbitMQ, we could use those input plugins directly.
Here’s what our pipeline looks like:
Again, we’re bypassing processors and aggregators, but they’re available if we want to use them.
Step 1: Define a metric
Before we go any further, it’ll help to understand a bit about InfluxDB's line protocol. An input plugin’s core function is to ingest one of a variety of formats and emit a logical representation of a line protocol. A line protocol consists of a measurement name (think of it as a namespace), a set of one or more fields (usually numeric metric values that can be thought of as gauges, timers, or counters), and a set of one or more tags. Tags are dimensional metadata that allow you to facet, group, and otherwise aggregate metrics. Within a given measurement, the number of series is determined by the unique combinations of tag values in the set, giving you a sense of your metrics’ cardinality.
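To make the line-protocol shape tangible, here’s a hand-written Python sketch (our own helper, not an official client library; it skips the escaping rules for spaces and commas) that renders one metric in the canonical `measurement,tags fields timestamp` form:

```python
def to_line_protocol(measurement, tags, fields, timestamp):
    """Render one metric as an InfluxDB line-protocol string:
    measurement,tag1=v1,... field1=v1,... timestamp
    (Simplified: real line protocol escapes spaces and commas.)"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp}"
```

For example, one PUT operation from the message queue we’ll use below could render as `messageQueue.operation,queueTopic=MY.QUEUE.1 putMessage=3605 1588100043050`.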
So if our example message queue outputs log files with the following JSON format…
```json
{"timestamp": 1588100043050, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01E47020", "clientIpAddress": "36.1.90.2", "clientName": "JMS Producer", "queueManager": "QM1", "queueTopic": "MY.QUEUE.1", "putMessage": 3605}
{"timestamp": 1588100044203, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01F0F77A", "clientIpAddress": "36.1.90.2", "clientName": "JMS Consumer", "queueManager": "QM1", "queueTopic": "MY.QUEUE.1", "getMessage": 2000}
{"timestamp": 1588100060150, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01E47020", "clientIpAddress": "36.1.90.5", "clientName": "Fin Producer", "queueManager": "QM1", "queueTopic": "MY.QUEUE.2", "putMessage": 400}
{"timestamp": 1588100065285, "metricName": "messageQueue.operation", "product": "Legacy MQ", "productVersion": "8.0.1.11", "connectionId": "64688F5E01F0F77A", "clientIpAddress": "36.1.90.7", "clientName": "Fin Consumer", "queueManager": "QM1", "queueTopic": "MY.QUEUE.2", "getMessage": 400}
```
... we would parse this metric in the following way:
- Measurement Name: derived from the JSON field metricName
- Tags: derived from the following JSON fields: product, productVersion, connectionId, clientIpAddress, clientName, queueManager, and queueTopic
Fields will be derived from any other JSON field (except for timestamp, which will not be included in the field set). In our example file, we have two different fields that may occur: getMessage and putMessage.
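Putting Step 1 together, here’s a Python sketch (our own mapping code, mirroring the rules above rather than Telegraf’s actual JSON parser) that splits one of the sample records into measurement name, tags, and fields:

```python
import json

# The keys we will promote to tags, matching the list in Step 1.
TAG_KEYS = {"product", "productVersion", "connectionId", "clientIpAddress",
            "clientName", "queueManager", "queueTopic"}

def split_record(line):
    """Apply the Step 1 mapping: metricName -> measurement name,
    the listed keys -> tags, everything else except timestamp -> fields."""
    record = json.loads(line)
    measurement = record.pop("metricName")
    timestamp = record.pop("timestamp")
    tags = {k: v for k, v in record.items() if k in TAG_KEYS}
    fields = {k: v for k, v in record.items() if k not in TAG_KEYS}
    return measurement, tags, fields, timestamp
```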
Step 2: Configure the Telegraf agent and plugin
Now, we’ll configure the Telegraf agent and the input plugin.
Configuring the agent
Before configuring inputs and output plugins, we need to set some basic parameters related to how Telegraf fetches, batches, and flushes data. It’s beyond this post’s scope to provide full optimization details, but note that all input plugins are subject to these collection parameters.
(See the Telegraf documentation for a full overview of configuration instructions and options.)
```toml
# Configuration for telegraf agent
[agent]
  interval = "1s"               # Collect input every 1 second
  metric_batch_size = 1000      # Send 1000 metrics at a time
  metric_buffer_limit = 10000   # Don't let the internal buffer grow past 10000
  flush_interval = "10s"        # Flush every 10s or when we have at least 1000 metrics in the buffer
```
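These parameters interact like a bounded buffer. This simplified Python sketch (illustrative only, not Telegraf internals) models how metric_buffer_limit caps what is held between flushes and how metric_batch_size shapes each send:

```python
from collections import deque

class MetricBuffer:
    """Simplified model of an output buffer: hold at most buffer_limit
    metrics (dropping the oldest on overflow) and hand them to the
    output in batches of at most batch_size."""

    def __init__(self, batch_size=1000, buffer_limit=10000):
        self.batch_size = batch_size
        self.queue = deque(maxlen=buffer_limit)  # oldest entries drop on overflow

    def add(self, metric):
        self.queue.append(metric)

    def flush(self):
        """Drain the buffer into batches no larger than batch_size."""
        batches = []
        while self.queue:
            batch = [self.queue.popleft()
                     for _ in range(min(self.batch_size, len(self.queue)))]
            batches.append(batch)
        return batches
```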
Configuring the input plugin
Here’s what our input plugin configuration looks like:
```toml
# Config for legacy MQ metrics
[[inputs.tail]]
  # The file we want to tail
  files = ["/var/log/legacy-mq-metrics.log"]
  # Don't reach back to the beginning (it may be a ton of data)
  from_beginning = false
  # This plugin automatically adds this tag; we don't want to emit it.
  tagexclude = ["path"]
  # Method used to watch for file changes: "inotify" or "poll"
  watch_method = "poll"
  data_format = "json"
  # This will be our measurement name
  json_name_key = "metricName"
  # Use the timestamp field for our metric timestamp; if omitted, Telegraf will insert one automatically.
  json_time_key = "timestamp"
  json_time_format = "unix_ms"
  tag_keys = ["product", "productVersion", "connectionId", "clientIpAddress", "clientName", "queueManager", "queueTopic"]
```
Step 3: Test the input plugin
To make sure we won’t be sending “junk” to our metrics backend (i.e., New Relic), we can test our input plugin by configuring a file output. This simple configuration will allow us to see if our metrics are handled properly based on our input configuration.
```toml
# Send telegraf metrics to a file for debugging
[[outputs.file]]
  ## Files to write to; "stdout" is a specially handled file.
  files = ["/var/log/metrics.out"]
  use_batch_format = false
  data_format = "json"
```
After restarting Telegraf, we get the desired output—a successful test.
```json
{"fields":{"putMessage":3605},"name":"messageQueue.operation","tags":{"clientIpAddress":"36.1.90.2","clientName":"JMS Producer","connectionId":"64688F5E01E47020","product":"Legacy MQ","productVersion":"8.0.1.11","queueManager":"QM1","queueTopic":"MY.QUEUE.1"},"timestamp":1588448293}
```
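If you want to check the debug file programmatically rather than by eye, a small Python sketch like this (our own helper, with expectations specific to our example configuration) can parse each JSON line from the file output and confirm the mapping we configured:

```python
import json

def check_metric_line(line, expected_tag_keys):
    """Parse one JSON line from the file output and verify it carries a
    measurement name, numeric fields, and exactly the tags we configured."""
    metric = json.loads(line)
    assert metric["name"], "measurement name missing"
    assert all(isinstance(v, (int, float)) for v in metric["fields"].values()), \
        "non-numeric field value"
    assert set(metric["tags"]) == set(expected_tag_keys), "unexpected tag set"
    return metric
```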
Step 4: Configure the New Relic output plugin
Now we’re ready to configure our New Relic output plugin to send our Telegraf metrics to New Relic. We can have multiple output configurations, so we’ll leave the file output config for now.
Note: If you’re following the example, don’t forget to add your Insert API key where indicated.
```toml
[[outputs.newrelic]]
  insights_key = "[INSERT API KEY]"
  # We don't need to send this as a field; the plugin will send a proper timestamp via the Metric API.
  fielddrop = ["timestamp"]
```
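Under the hood, the plugin translates each Telegraf metric into the JSON shape New Relic’s Metric API expects. As a rough sketch (a hand-built payload for illustration; the real plugin goes through the Go Telemetry SDK and adds batching and compression), a single gauge could look like this:

```python
def to_metric_api_payload(name, value, timestamp_ms, attributes):
    """Build a single-gauge payload in the Metric API's JSON shape:
    a list with one batch, each batch holding a list of metrics."""
    return [{
        "metrics": [{
            "name": name,
            "type": "gauge",
            "value": value,
            "timestamp": timestamp_ms,  # epoch milliseconds
            "attributes": attributes,   # dimensional tags
        }]
    }]

payload = to_metric_api_payload(
    "messageQueue.operation.putMessage", 3605, 1588100043050,
    {"queueTopic": "MY.QUEUE.1", "queueManager": "QM1"})
```

The measurement name and field name combine into the full metric name, which is why the NRQL queries below reference `messageQueue.operation.getMessage`.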
Exploring our Telegraf metric data in New Relic
Finally, we’ll navigate to our account in New Relic where we can use chart builder to visualize our new metric data. Here we get three views of metric data from our message queue:
Max getMessage Operation Last 30 Minutes
Based on the data sent from our log file to New Relic, we can chart how many messages are in our queue. Specifically, this shows us the maximum number of messages for a GET operation for any topic in our queue for the last 30 minutes.
NRQL query
```sql
SELECT max(messageQueue.operation.getMessage) FROM Metric WHERE metricName = 'messageQueue.operation.getMessage' SINCE 30 minutes AGO
```
Max getMessage Operation Last 30 Minutes Faceted by Topic
As in the previous chart, this shows us the maximum number of messages for a GET operation for any topic in our queue for the last 30 minutes, but we’ve used facet filtering to filter the results by the queueTopic attribute.
NRQL query
```sql
SELECT max(messageQueue.operation.getMessage) FROM Metric WHERE metricName = 'messageQueue.operation.getMessage' SINCE 30 minutes AGO TIMESERIES FACET queueTopic
```
Max getMessage Operation Last 12 Hours Faceted by Topic (shown as table)
This chart shows us the maximum number of messages for a GET operation for any topic in our queue for the last 12 hours. Here we facet this data by the queueTopic attribute and a time grouping of one hour.
NRQL query
```sql
SELECT max(messageQueue.operation.getMessage) FROM Metric WHERE metricName = 'messageQueue.operation.getMessage' SINCE 12 hours AGO FACET queueTopic, hourOf(timestamp)
```
Bringing open source telemetry into New Relic
As you begin to revise and automate workflows to get telemetry out of all the services and components in your architecture, it’s important to realize that telemetry data formats are heterogeneous, sometimes quirky, and may require a highly refined and configurable toolkit. By partnering with open source tools like Telegraf, we aim to give you the confidence that New Relic can ingest all the telemetry your systems create into our telemetry database. From there, you can curate and view it within the context of your other data assets.
Alongside the New Relic output plugin for Telegraf, we’ve recently provided some open source integrations built on top of our Metric and Trace APIs, including integrations for Prometheus, OpenCensus, OpenTelemetry, Micrometer, DropWizard, and Istio. In addition, we’ve added New Relic Flex to the New Relic infrastructure agent, which allows you to build “codeless” integrations on top of New Relic Infrastructure to collect metric data from a wide variety of services.
Sign up for a free New Relic account, and get started with Telegraf and New Relic today.
The views expressed in this blog are those of the author and do not necessarily reflect the official views of New Relic K.K. This blog may also contain links to external sites; New Relic makes no guarantees regarding the content of those linked sites.