
Prometheus is one of the big powerhouses in the realm of open source monitoring solutions. Over the years, the platform has become synonymous with efficient, scalable, and flexible monitoring practices, emerging as a go-to solution for organizations seeking insights into their systems. But what is Prometheus? And how does it work? Let's get into it.

What is Prometheus?

Prometheus is an open source monitoring solution written in Go that collects metrics data and stores that data in a time series database. It was originally built by SoundCloud in 2012 and became part of the Cloud Native Computing Foundation (CNCF) in 2016. It uses PromQL, a powerful query language for querying your time series data.

How does Prometheus work?

Prometheus scrapes (pulls) metrics data from HTTP endpoints and stores that data in a time series database that uses a multidimensional data model. Read How to monitor with Prometheus to learn more about the pull model (and how it's different from the push model), the multidimensional model, and how data is collected.

Or, here’s a simplified three-step breakdown of how Prometheus works:

  1. Data collection and retrieval: Prometheus follows a pull-based model, where it periodically scrapes data from targets (applications, services, or infrastructure components) that are instrumented with its client libraries or fronted by exporters. The targets expose their metrics on HTTP endpoints (by default /metrics), allowing Prometheus to gather relevant information.
  2. Data storage: The collected metrics are stored in a time series database, providing a historical record of system performance over time.
  3. Service discovery: Prometheus uses service discovery mechanisms (for example, for Kubernetes, Consul, or cloud providers) so that new instances are automatically detected and monitored without manual intervention.
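The collection and storage steps can be sketched in Python. This is a minimal illustration under stated assumptions, not how Prometheus is implemented: it handles only a simplified subset of the text exposition format (no timestamps, escaped quotes, or metadata), and `parse_exposition` and `TinyTSDB` are hypothetical names. It also shows the multidimensional model, because each series is keyed by its metric name plus its full label set, including the `job` and `instance` labels Prometheus attaches to scraped targets.

```python
import re
import time
from collections import defaultdict

# Simplified exposition-format sample line, e.g.:
#   http_requests_total{method="get",code="200"} 1027
# Sample timestamps and escaped quotes are NOT handled in this sketch.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Yield ((metric_name, sorted_label_pairs), value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = tuple(sorted(LABEL_RE.findall(raw_labels or "")))
        yield (name, labels), float(value)


class TinyTSDB:
    """Hypothetical in-memory stand-in for a TSDB: one (ts, value) list per series."""

    def __init__(self):
        self.series = defaultdict(list)

    def ingest(self, text, job, instance, ts=None):
        ts = time.time() if ts is None else ts
        for (name, labels), value in parse_exposition(text):
            # Attach target labels; here they simply overwrite scraped labels
            # with the same name (real Prometheus renames such conflicts).
            full = tuple(sorted(dict(labels, job=job, instance=instance).items()))
            self.series[(name, full)].append((ts, value))
```

In a real scrape, the text would come from an HTTP GET against the target's metrics endpoint on every scrape interval.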

What is PromQL?

PromQL, which is short for Prometheus Query Language, is a functional query language that lets you select and aggregate time series data. It's a flexible, powerful language that allows you to slice and dice your data however you need.

PromQL use cases

You can use instant vectors to query the latest value of each series at a single point in time, and range vectors to query values over a time range. You can query a basic metric such as http_requests_total and then filter it further with label matchers, which select key-value pairs by exact value or by regular expression match.
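To make selectors concrete, here is a hedged Python sketch of how label matchers, instant vectors, and range vectors behave. It is an approximation, not the Prometheus engine: the function names are hypothetical, samples are assumed to be sorted by time, Python's `re` stands in for Go's RE2 syntax, and window boundaries are simplified. The 300-second lookback mirrors Prometheus's default five-minute lookback for instant queries.

```python
import re


def matches(labels, matchers):
    """True if a label dict satisfies every (label, op, value) matcher.
    Supports =, !=, =~ and !~. PromQL anchors regexes at both ends,
    so re.fullmatch is used here."""
    for label, op, value in matchers:
        actual = labels.get(label, "")  # a missing label compares like ""
        if op == "=" and actual != value:
            return False
        if op == "!=" and actual == value:
            return False
        if op == "=~" and re.fullmatch(value, actual) is None:
            return False
        if op == "!~" and re.fullmatch(value, actual) is not None:
            return False
    return True


def select(series, name, matchers):
    """Yield (labels, samples) for series with this metric name and matching labels."""
    for (metric, labels), samples in series.items():
        if metric == name and matches(dict(labels), matchers):
            yield labels, samples


def instant_vector(series, name, matchers, at, lookback=300.0):
    """Latest value in (at - lookback, at] per matching series, like metric{...}."""
    result = {}
    for labels, samples in select(series, name, matchers):
        recent = [v for t, v in samples if at - lookback < t <= at]
        if recent:
            result[labels] = recent[-1]
    return result


def range_vector(series, name, matchers, at, window):
    """All samples in (at - window, at] per matching series, like metric{...}[window]."""
    result = {}
    for labels, samples in select(series, name, matchers):
        in_range = [(t, v) for t, v in samples if at - window < t <= at]
        if in_range:
            result[labels] = in_range
    return result
```

Calling `instant_vector(series, "http_requests_total", [("job", "=", "api"), ("code", "=~", "5..")], at=now)` corresponds roughly to the PromQL query `http_requests_total{job="api", code=~"5.."}`, and `range_vector(..., window=300)` to `http_requests_total{job="api"}[5m]`.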

You can also query your data directly in New Relic with PromQL-style queries.

Prometheus features

Prometheus offers a range of features to make team processes a whole lot easier. These unique features are designed to provide comprehensive visibility into the health and performance of your systems.

These are the key features of Prometheus:

  • Multi-dimensional data: Prometheus uses a multi-dimensional data model in which each time series is identified by a metric name and a set of key-value pairs called labels (for example, job and instance). This lets you organize and query metrics along any of those dimensions, enabling detailed analysis.
  • PromQL: As discussed, Prometheus uses a powerful query language to slice and dice collected time series data, allowing flexibility in monitoring diverse systems and services.
  • Alerting rules: Prometheus allows you to define alerting rules based on specified conditions. If the system detects that a predefined condition is met, it will trigger an alert, letting teams know about potential issues before they impact users.
  • Data visualization through integrations: Prometheus is often used with other tools, like Grafana (an open-source analytics and monitoring platform). Grafana allows users to create visually appealing dashboards and reports based on the data collected by Prometheus. Additionally, Prometheus integrates seamlessly with Kubernetes, making it a preferred choice for monitoring containerized environments.
  • Scalability and federation: Prometheus can be deployed in a federated setup, in which one Prometheus server scrapes selected time series from other Prometheus servers. This allows Prometheus to scale, making it suitable for large and distributed architectures.
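The alerting-rule behavior described above can be sketched as a small state machine. Prometheus alerts move between inactive, pending, and firing states, and a rule's `for` duration controls how long the condition must hold before the alert fires. This Python sketch is a simplification: `AlertRule` is a hypothetical class, and a plain callable on one value stands in for the rule's PromQL expression.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlertRule:
    """Simplified alerting rule: `condition` stands in for the PromQL
    expression and `for_seconds` for the rule's `for` duration."""
    name: str
    condition: Callable[[float], bool]
    for_seconds: float
    active_since: Optional[float] = None

    def evaluate(self, value: float, now: float) -> str:
        """Return the alert state after one evaluation: inactive, pending, or firing."""
        if not self.condition(value):
            self.active_since = None  # condition cleared, so reset the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now  # condition just became true
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"
```

The pending state is what keeps a brief spike from paging anyone: the alert only fires once the condition has held for the whole `for` duration.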

These features, among others, collectively make Prometheus a popular choice for monitoring modern, cloud-native architectures, providing users with the tools needed to gain deep insights into their systems' behavior and performance.

 

When to use Prometheus?

Prometheus is a highly reliable open source tool that can be used to monitor any part of your application, including microservices. Because it is vendor-neutral and has a rich open source community of developers and contributors, you can use it to monitor almost your entire application, including the frontend and backend, servers and hardware, and even infrastructure like a service mesh.

Many open source tools, such as Istio (service mesh) and CoreDNS (the default DNS server for Kubernetes), expose native Prometheus endpoints. To monitor systems that don't expose Prometheus metrics over HTTP, such as hardware, you can use exporters, which translate their metrics into a format Prometheus can scrape.
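An exporter is, at its core, a small HTTP server that reads values from a system and renders them in the Prometheus text exposition format. The Python sketch below uses only the standard library; `read_device_temperature` is a hypothetical stand-in for a sensor or device with no HTTP endpoint, and port 9101 is an arbitrary choice. Production exporters normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_device_temperature():
    """Hypothetical stand-in for reading a device that has no HTTP endpoint."""
    return 42.5


def render_metrics(readings):
    """Render {name: (help, type, value)} in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, metric_type, value) in sorted(readings.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics({
            "device_temperature_celsius": (
                "Current device temperature.", "gauge", read_device_temperature()),
        }).encode("utf-8")
        self.send_response(200)
        # Content type used by the Prometheus text format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9101):
    """Block forever, answering scrapes on http://<host>:<port>/metrics."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the exporter; Prometheus then scrapes it like any other target.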

As an open-source tool, Prometheus has other major advantages—it’s free, its code is available on GitHub, and its toolkit is readily customizable.

Prometheus includes Alertmanager, which deduplicates alerts and groups related alerts into a single notification. That means less alert fatigue, and you won't be flooded with constant alerts during an outage.
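Grouping and deduplication can be illustrated with a short Python sketch. It borrows the idea of Alertmanager's `group_by` setting (alerts sharing the listed label values go into one group, and each group becomes one notification) and treats alerts with identical label sets as duplicates. `group_alerts` is a hypothetical function; real Alertmanager also handles routing, timing, silences, and inhibition.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alert label dicts by their `group_by` label values, dropping
    exact duplicates. Each resulting group would become one notification."""
    groups = defaultdict(list)
    seen = set()
    for labels in alerts:
        fingerprint = tuple(sorted(labels.items()))  # identical label set = same alert
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

With `group_by=["alertname", "cluster"]`, a hundred `InstanceDown` alerts from one cluster collapse into a single notification instead of a hundred separate ones.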

It's also highly reliable because each Prometheus server is standalone and doesn't depend on network storage or other remote services, which means it can continue functioning even when part of your system is down. This is especially important because you need your monitoring system to remain functional during outages.

 

When not to use Prometheus?

A good engineer knows that it’s not just about using good tools—it’s also about using the right tool for the job. Prometheus is very good at what it does, but it’s not intended to be an all-in-one platform for all of your observability needs. 

Here are some examples where you’ll benefit from using another tool. Note that even when another tool is a better fit for a use case, you can still use Prometheus alongside it because it's often the right tool for monitoring a service.

Long-term data storage:

Prometheus isn't intended for durable long-term storage. You can use an observability platform or a remote storage system for long-term storage. For instance, New Relic provides extended storage for up to 13 months for dimensional metrics.

When you need 100% accuracy:

Prometheus prioritizes reliability over accuracy. According to the CAP theorem, a distributed system can guarantee only two of three properties: consistency (accuracy), availability (reliability), and partition tolerance (continuing to operate when network failures split the system). Since distributed systems always need partition tolerance, there is a tradeoff between reliability and accuracy. While the tradeoff is fairly small, when you need 100% accuracy (such as with a billing system), you'll need to use another system.

Automatic setup for your environment:

An observability platform can automatically detect services and instrument them so you get observability in minutes. With Prometheus, you need to configure each service yourself. That includes adding configurations to scrape specific HTTP endpoints and setting up exporters for services that don't expose HTTP endpoints. For large distributed systems, that's a lot of work.
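One common way to reduce that manual work is to generate target lists for Prometheus's file-based service discovery (`file_sd_configs`), which reads JSON files containing objects with `targets` and `labels` keys and picks up changes to them. The Python sketch below assumes a hypothetical inventory mapping service names to `host:port` strings; the `service` label is an illustrative choice, not a Prometheus convention.

```python
import json


def build_file_sd(services):
    """Build a file_sd target list from {service_name: ["host:port", ...]}.
    `services` is a hypothetical inventory (for example, exported from a CMDB)."""
    return [
        {"targets": sorted(targets), "labels": {"service": name}}
        for name, targets in sorted(services.items())
    ]


def write_file_sd(path, services):
    """Write the target list as JSON for a file_sd_configs entry to watch."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_file_sd(services), f, indent=2)
```

Regenerating this file whenever the inventory changes keeps scrape targets current without editing the Prometheus configuration itself, though someone still has to build and maintain the inventory.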

New Relic provides both long-term storage and helps you automatically set up your environment, so you can combine New Relic with Prometheus to benefit from both tools.

What is Prometheus monitoring?

Prometheus monitoring means using Prometheus, an open source monitoring and alerting toolkit, to monitor highly dynamic service-oriented architectures and hardware.

What can you monitor with Prometheus?

We have a whole article on monitoring with Prometheus, but if you just want a refresher on what Prometheus can monitor, the short answer is: almost everything. 

Frontend:

Use Prometheus to monitor application metrics like throughput (TPS) and response times. In particular, the Prometheus Blackbox exporter lets you probe endpoints over HTTP and other protocols to perform uptime checks and monitor website status.
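An uptime check of this kind boils down to requesting a URL, timing it, and reporting success or failure as metrics. The Python sketch below reuses the metric names the Blackbox exporter reports for HTTP probes (`probe_success`, `probe_duration_seconds`, `probe_http_status_code`) but is otherwise a hypothetical, much simpler probe. It assumes, like the Blackbox exporter's default, that any 2xx status counts as success, and it accepts an `opener` parameter so the network call can be swapped out.

```python
import time
import urllib.error
import urllib.request


def probe_http(url, timeout=5.0, opener=urllib.request.urlopen):
    """Blackbox-style HTTP probe; returns a dict of probe metrics."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        status = err.code
    except OSError:  # DNS failure, refused connection, timeout, ...
        status = 0
    return {
        "probe_success": 1 if 200 <= status < 300 else 0,
        "probe_duration_seconds": time.monotonic() - start,
        "probe_http_status_code": status,
    }
```

Running `probe_http("https://example.com")` on a schedule and alerting when `probe_success` is 0 gives a basic website uptime check.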

Backend:

Monitor databases, APIs, and HTTP request status on your endpoints using Prometheus. For instance, you can use the toolkit to track REST API metrics like request latency, log events, and error rate per API. You can also use Prometheus to monitor JVM applications via the JMX Exporter.
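Request latency is usually tracked with a Prometheus histogram: cumulative buckets labeled by their upper bound `le`, plus `_sum` and `_count` series. The Python sketch below models those semantics with the standard library only; `LatencyHistogram` is a hypothetical class and the bucket bounds are an illustrative choice.

```python
import bisect


class LatencyHistogram:
    """Minimal model of a Prometheus histogram with cumulative `le` buckets."""

    def __init__(self, bounds=(0.05, 0.1, 0.25, 0.5, 1.0)):
        self.bounds = sorted(bounds)
        self.counts = [0] * len(self.bounds)  # per-bucket (non-cumulative) counts
        self.total = 0.0                      # exported as <name>_sum
        self.count = 0                        # exported as <name>_count

    def observe(self, seconds):
        # First bound >= seconds, matching the "less than or equal" meaning of le.
        i = bisect.bisect_left(self.bounds, seconds)
        if i < len(self.counts):
            self.counts[i] += 1  # values above every bound land only in +Inf
        self.total += seconds
        self.count += 1

    def exposition(self, name):
        """Render buckets cumulatively, as Prometheus expects."""
        lines, running = [], 0
        for bound, c in zip(self.bounds, self.counts):
            running += c
            lines.append(f'{name}_bucket{{le="{bound}"}} {running}')
        lines.append(f'{name}_bucket{{le="+Inf"}} {self.count}')
        lines.append(f"{name}_sum {self.total}")
        lines.append(f"{name}_count {self.count}")
        return "\n".join(lines)
```

Once scraped, such buckets can be turned into a latency percentile with a query like `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`.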

Servers:

Use the Prometheus Blackbox exporter to track server KPIs like average response time. In addition, you can monitor your operating system to understand your server's CPU utilization or hard disk usage. You can also use the Apache exporter for Prometheus to monitor Apache web servers.

Hardware:

With the Prometheus Node exporter, you can monitor hardware and kernel metrics on Linux and other Unix-like systems. Some of the hardware metrics you can track include CPU usage, disk utilization, network bandwidth, and memory.
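To show where memory metrics like these come from on Linux, here is a hedged Python sketch that converts `/proc/meminfo`-style text into metric names resembling the Node exporter's (for example, `node_memory_MemTotal_bytes`). It is an approximation of that naming, not the exporter's code: `meminfo_to_metrics` is a hypothetical function, values reported in kB are converted to bytes and get a `_bytes` suffix, and parenthesized keys such as `Active(anon)` become `Active_anon`.

```python
import re


def meminfo_to_metrics(text):
    """Convert /proc/meminfo-style text into node_exporter-style metric names."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        key = parts[0].rstrip(":")
        key = re.sub(r"\((.*)\)", r"_\1", key)  # Active(anon) -> Active_anon
        value = float(parts[1])
        if len(parts) == 3 and parts[2] == "kB":
            metrics[f"node_memory_{key}_bytes"] = value * 1024  # kB -> bytes
        else:
            metrics[f"node_memory_{key}"] = value  # unitless counts, e.g. HugePages_Total
    return metrics
```

On a Linux host, `meminfo_to_metrics(open("/proc/meminfo").read())` produces values that could then be served in the exposition format for Prometheus to scrape.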

Infrastructure:

Prometheus can monitor your infrastructure and applications at multiple levels, including hosts, the application itself, and any containers. For instance, you can monitor the performance of MySQL and identify any issues with the database using the Prometheus MySQL exporter. Another example is the ability to track the throughput and response times of the Kafka load generator, Cassandra client, and Kafka consumer.