Prometheus is one of the most widely used open source monitoring solutions. Over the years, the platform has become synonymous with efficient, scalable, and flexible monitoring practices, emerging as a go-to solution for organizations seeking insights into their systems. But what is Prometheus? And how does it work? Let’s get into it.
What is Prometheus?
Prometheus is an open source monitoring solution written in Go that collects metrics data and stores that data in a time series database. It was originally built by SoundCloud in 2012 and became part of the Cloud Native Computing Foundation (CNCF) in 2016. It uses PromQL, a powerful query language for querying your time series data.
How does Prometheus work?
Prometheus scrapes metrics data from HTTP endpoints and then stores that data in a time series database that uses a multidimensional model. Read How to monitor with Prometheus to learn more about the pull model (and how it’s different from the push model), the multidimensional model, and how data is collected.
Here’s a simplified three-step breakdown of how Prometheus works:
- Data collection and retrieval: Prometheus follows a pull-based model, where it periodically scrapes data from the targets (applications, services, or infrastructure components) that are instrumented with its client libraries. The targets expose their metrics at HTTP endpoints, allowing Prometheus to gather relevant information.
- Data storage: The collected metrics are stored in a time-series database, providing a historical record of system performance over time.
- Service discovery: Prometheus utilizes discovery mechanisms to ensure that new instances are automatically detected and monitored without manual intervention.
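The collection and storage steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Prometheus is implemented: `http_fetch`, `parse_samples`, `scrape`, and `fake_fetch` are hypothetical names, the URL is a placeholder, and the parser only handles simple `name{labels} value` lines (no timestamps or escaping rules).

```python
import time
import urllib.request
from collections import defaultdict


def http_fetch(url):
    """Pull the raw metrics text from a target's HTTP endpoint."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_samples(text):
    """Parse simple 'name{labels} value' lines, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def scrape(url, storage, now=None, fetch=http_fetch):
    """Pull metrics from one target and append a timestamped sample per series."""
    timestamp = time.time() if now is None else now
    for series, value in parse_samples(fetch(url)).items():
        storage[series].append((timestamp, value))


def fake_fetch(url):
    """Stand-in for a live target so the sketch runs without a network."""
    return (
        "# TYPE http_requests_total counter\n"
        'http_requests_total{method="get"} 7\n'
    )


# Two scrapes, 15 seconds apart, build up a small time series per metric.
storage = defaultdict(list)
scrape("http://localhost:8080/metrics", storage, now=100.0, fetch=fake_fetch)
scrape("http://localhost:8080/metrics", storage, now=115.0, fetch=fake_fetch)
print(dict(storage))
```

The point of the sketch is the direction of the data flow: the monitoring side initiates each request on its own schedule, and every scrape adds one timestamped sample to each series it finds.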
What is PromQL?
PromQL, which is short for Prometheus Query Language, is a functional query language that lets you select and aggregate time series data. It’s a flexible, powerful language that allows you to slice and dice your data however you need.
PromQL use cases
You can use instant vectors to query data from a single point in time and range vectors to query data over a time range. You can query a basic metric such as http_requests_total and then filter the results further with label matchers, which compare key-value pairs using exact or regular expression matches.
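To make the difference between instant and range vectors concrete, here is a small Python model of both selectors over in-memory data. It is only an illustration: the `SERIES` data and the function names are made up, and real PromQL adds details this sketch ignores, such as staleness handling and exact window-boundary rules. The comments show the PromQL expression each call roughly corresponds to.

```python
import re

# Toy data: each series has a metric name, labels, and (timestamp, value) samples.
SERIES = [
    {"name": "http_requests_total", "labels": {"job": "api-server", "method": "get"},
     "samples": [(0, 10.0), (60, 25.0), (120, 40.0)]},
    {"name": "http_requests_total", "labels": {"job": "api-gateway", "method": "post"},
     "samples": [(0, 1.0), (60, 2.0), (120, 4.0)]},
    {"name": "http_requests_total", "labels": {"job": "batch", "method": "get"},
     "samples": [(0, 5.0), (60, 5.0), (120, 6.0)]},
]


def matches(series, name, equal=None, regex=None):
    """Check the metric name, exact label matches (=), and regex matches (=~)."""
    if series["name"] != name:
        return False
    for label, wanted in (equal or {}).items():
        if series["labels"].get(label, "") != wanted:
            return False
    for label, pattern in (regex or {}).items():
        # PromQL regex matchers are fully anchored, hence fullmatch.
        if re.fullmatch(pattern, series["labels"].get(label, "")) is None:
            return False
    return True


def instant_vector(name, at, equal=None, regex=None):
    """One value per matching series: the latest sample at or before `at`."""
    result = []
    for series in SERIES:
        if matches(series, name, equal, regex):
            past = [(t, v) for t, v in series["samples"] if t <= at]
            if past:
                result.append((series["labels"], past[-1][1]))
    return result


def range_vector(name, at, window, equal=None, regex=None):
    """A list of samples per matching series, from the window ending at `at`."""
    result = []
    for series in SERIES:
        if matches(series, name, equal, regex):
            inside = [(t, v) for t, v in series["samples"] if at - window < t <= at]
            if inside:
                result.append((series["labels"], inside))
    return result


# Roughly: http_requests_total{job=~"api.*"}
print(instant_vector("http_requests_total", at=120, regex={"job": "api.*"}))
# Roughly: http_requests_total{method="get"}[90s]
print(range_vector("http_requests_total", at=120, window=90, equal={"method": "get"}))
```

An instant vector yields a single value per series, which suits "what is it right now?" questions. A range vector yields several samples per series, which is what functions that compute change over time need as input.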
You can also query your data directly in New Relic with PromQL-style queries.
Prometheus features
Prometheus offers a range of features to make team processes a whole lot easier. These unique features are designed to provide comprehensive visibility into the health and performance of your systems.
These are the key features of Prometheus:
- Multi-dimensional data: Prometheus uses a multi-dimensional data model to represent time-series data. Each time series is identified by a metric name and a set of key-value labels, such as job and instance, which allows flexibility in organizing and querying metrics and enables detailed analysis.
- PromQL: As discussed, Prometheus uses a powerful query language to slice and dice collected time series data, allowing flexibility in monitoring diverse systems and services.
- Alerting rules: Prometheus allows you to define alerting rules based on specified conditions. If the system detects that a predefined condition is met, it will trigger an alert, letting teams know about potential issues before they impact users.
- Data visualization through integrations: Prometheus is often used with other tools, like Grafana (an open-source analytics and monitoring platform). Grafana allows users to create visually appealing dashboards and reports based on the data collected by Prometheus. Additionally, Prometheus integrates seamlessly with Kubernetes, making it a preferred choice for monitoring containerized environments.
- Scalability and federation: Prometheus can be deployed in a federated setup, in which one Prometheus server scrapes selected time series from other Prometheus servers. This allows for scalability, making the platform suitable for large and distributed architectures.
These features, among others, collectively make Prometheus a popular choice for monitoring modern, cloud-native architectures, providing users with the tools needed to gain deep insights into their systems' behavior and performance.
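The alerting rules feature can be illustrated with a small sketch. It assumes a rule with a threshold condition and a hold duration, similar in spirit to the optional `for` clause of a Prometheus alerting rule, where an alert stays pending until the condition has held long enough and only then fires. The function name, parameters, and sample data are hypothetical.

```python
def evaluate_rule(samples, threshold, hold_for):
    """Return the alert state after each (timestamp, value) evaluation.

    The alert moves from inactive to pending when the condition becomes true,
    and to firing once the condition has held for `hold_for` seconds.
    """
    states = []
    active_since = None  # when the condition first became true in the current run
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held = timestamp - active_since
            states.append("firing" if held >= hold_for else "pending")
        else:
            active_since = None  # condition cleared, so the timer resets
            states.append("inactive")
    return states


# Hypothetical error-rate samples, evaluated once a minute.
error_rate = [(0, 0.01), (60, 0.08), (120, 0.09), (180, 0.02),
              (240, 0.07), (300, 0.07), (360, 0.09)]
print(evaluate_rule(error_rate, threshold=0.05, hold_for=120))
```

Requiring the condition to hold for a while is a common way to avoid alerting on brief spikes. In this data, the first breach clears before two minutes pass, so only the second, sustained breach reaches the firing state.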
When to use Prometheus?
Prometheus is a highly reliable open source tool that can be used to monitor any part of your application, including microservices. Because it is vendor-neutral and has a rich open-source community of developers and contributors, you can use it to monitor almost your entire application, including the frontend and backend, servers and hardware, and even infrastructure like a service mesh.
Many open-source tools, such as Istio (service mesh) and CoreDNS (default DNS for Kubernetes) have native Prometheus endpoints. To monitor services that don’t use HTTP endpoints, such as hardware, you can use exporters.
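Conceptually, an exporter reads values from a system that has no metrics endpoint of its own and republishes them in the text format that Prometheus scrapes. The sketch below shows only the translation step. The metric name, sensor labels, and readings are invented for illustration, and a real exporter would also serve this text over HTTP and handle more of the format than this does.

```python
def escape_label_value(value):
    """Escape backslashes, double quotes, and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, readings):
    """Render readings as text-format lines for one gauge metric.

    `readings` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in readings:
        pairs = ",".join(
            f'{key}="{escape_label_value(str(val))}"'
            for key, val in sorted(labels.items())
        )
        series = f"{name}{{{pairs}}}" if pairs else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical readings from a source with no HTTP endpoint, such as a sensor.
readings = [({"sensor": "cpu"}, 61.5), ({"sensor": "disk"}, 38.0)]
print(render_gauge("hardware_temperature_celsius",
                   "Temperature reported by each sensor.", readings))
```

Keeping the translation separate from the transport makes the idea easy to see: once the readings are in this text form, they look like any other scrape target to Prometheus.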
As an open-source tool, Prometheus has other major advantages—it’s free, its code is available on GitHub, and its toolkit is readily customizable.
The Prometheus toolkit includes Alertmanager, which groups and deduplicates alerts before sending out categories of alerts as a single notification. That means less alert fatigue, and you won’t be flooded with constant alerts during an outage.
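Grouping and deduplication can be sketched as follows. This is a simplified model rather than Alertmanager's actual logic: alerts are plain dictionaries of labels, identical label sets count as duplicates, and the `group_by` parameter is loosely modeled on the Alertmanager setting of the same name. The alert names and labels are made up.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by their full label set, then group by selected labels."""
    unique = {}
    for alert in alerts:
        identity = tuple(sorted(alert.items()))  # identical label sets are one alert
        unique.setdefault(identity, alert)

    groups = defaultdict(list)
    for alert in unique.values():
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def notifications(alerts, group_by):
    """Build one summary message per group instead of one per alert."""
    messages = []
    for key, members in group_alerts(alerts, group_by).items():
        names = ", ".join(f"{label}={value}" for label, value in zip(group_by, key))
        messages.append(f"[{names}] {len(members)} alert(s) firing")
    return messages


# Four incoming alerts: one exact duplicate, three on the same cluster.
incoming = [
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "b"},
    {"alertname": "DiskFull", "cluster": "us", "instance": "c"},
]
for message in notifications(incoming, group_by=["alertname", "cluster"]):
    print(message)
```

Here four incoming alerts collapse into two notifications, which is the effect that keeps an outage from turning into a flood of pages.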
It's also highly reliable because each Prometheus server is standalone and doesn't depend on network storage or other remote services, which means that a server can continue functioning even when part of your system is down. This is especially important because you need your monitoring system to remain functional during outages.
When not to use Prometheus?
A good engineer knows that it’s not just about using good tools—it’s also about using the right tool for the job. Prometheus is very good at what it does, but it’s not intended to be an all-in-one platform for all of your observability needs.
Here are some examples where you’ll benefit from using another tool. Note that even when another tool is a better fit for a use case, you can still use Prometheus alongside it because it's often the right tool for monitoring a service.
Long-term data storage:
Prometheus isn’t intended for durable long-term storage. You can use an observability platform or another storage source for long-term storage. For instance, New Relic provides extended storage for up to 13 months for dimensional metrics.
When you need 100% accuracy:
Prometheus prioritizes reliability over accuracy. According to the CAP theorem, a distributed system can guarantee only two of three properties: consistency (accuracy), availability (reliability), and partition tolerance (continuing to operate when network failures cut servers off from each other). Since distributed systems always need partition tolerance, there is a tradeoff between reliability and accuracy. While the tradeoff is fairly small, when you need 100% accuracy (such as with a billing system), you’ll need to use another system.
Automatic setup for your environment:
An observability platform can automatically detect services and instrument them so you get observability in minutes. With Prometheus, you need to configure monitoring for each kind of service yourself. That includes adding configurations to scrape specific HTTP endpoints and setting up exporters for services that don’t use HTTP endpoints. For large distributed systems, that’s a lot of work.
New Relic both provides long-term storage and helps you automatically set up your environment, so you can combine New Relic with Prometheus to benefit from both tools.
What is Prometheus monitoring?
Prometheus monitoring means using Prometheus, an open-source monitoring and alerting toolkit, to monitor highly dynamic service-oriented architectures and hardware.
What can you monitor with Prometheus?
We have a whole article on monitoring with Prometheus, but if you just want a refresher on what Prometheus can monitor, the short answer is: almost everything.
Frontend:
Use Prometheus to monitor application metrics like throughput (TPS) and response times. In particular, the Prometheus blackbox exporter enables you to perform an uptime check and monitor website status.
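The idea behind an uptime check can be sketched in Python: probe a URL from the outside, then report whether it succeeded and how long it took. This is not the blackbox exporter itself, which is a separate program with its own configuration. The function names are hypothetical, and the output borrows the `probe_success` and `probe_duration_seconds` metric names that the blackbox exporter uses.

```python
import time
import urllib.error
import urllib.request


def http_status(url):
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def probe(url, get_status=http_status, clock=time.monotonic):
    """Probe a URL and report success and duration as metric lines."""
    started = clock()
    try:
        success = 200 <= get_status(url) < 300
    except (urllib.error.URLError, OSError):
        # Connection failures and HTTP error responses both count as down.
        success = False
    duration = clock() - started
    return (
        f"probe_success {1 if success else 0}\n"
        f"probe_duration_seconds {duration:.3f}\n"
    )


# Usage with a stand-in status function, so the sketch runs without a network.
print(probe("https://example.com", get_status=lambda url: 200))
```

Because the check runs from outside the application, it reports what a visitor would experience, which complements the metrics an application exposes about itself.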
Backend:
Monitor databases, APIs, and HTTP request status on your endpoints using Prometheus. For instance, you can use the toolkit to track REST API metrics like request latency, log events, and error rate per API. You can also use Prometheus to monitor JVM applications via the JMX Exporter.
Servers:
Use the Prometheus Blackbox Exporter to track server KPIs like average response time. In addition, you can monitor your operating system to understand your server's CPU utilization or hard disk usage. You can also use the Apache exporter for Prometheus to monitor an Apache web server.
Hardware:
With the Prometheus Node Exporter, you can monitor hardware and kernel metrics on Linux and other Unix-like systems. Some of the hardware metrics you can track include CPU usage, disk utilization, network bandwidth, and memory.
Infrastructure:
Prometheus can monitor your infrastructure and applications at multiple levels, including hosts, the application itself, and any containers. For instance, you can monitor the performance of MySQL and identify any issues with the database using the Prometheus MySQL exporter. Another example is the ability to track the throughput and response times of the Kafka load generator, Cassandra client, and Kafka consumer.
Next steps: Monitor Prometheus metrics with New Relic
You now have a better understanding of how to use Prometheus. How will you put it to work?
You can integrate New Relic directly with Prometheus through either the Remote Write or OpenMetrics integrations. The Remote Write option works with a well-established Prometheus infrastructure and gives you quick access to your metrics, whereas the OpenMetrics integration is more useful for visibility across multiple container platform configurations. If you don’t have a New Relic account yet, sign up for the free tier to get started, and then read more about the process of sending Prometheus data to New Relic, querying your data, and setting up alerts.
Take a deeper dive into New Relic’s Prometheus documentation.
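The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.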