Prometheus is a powerful, open-source tool for real-time monitoring of applications, microservices, and networks, including service meshes and proxies. Monitoring with Prometheus is especially useful for Kubernetes clusters. Originally developed at SoundCloud in 2012, it’s now part of the Cloud Native Computing Foundation.

Prometheus has a dimensional data model, which is ideal for recording time series data; in other words, your metrics. It includes PromQL, a powerful query language that helps you drill down into those metrics. Monitoring with Prometheus is also extremely reliable because its servers are standalone and continue to operate even when other parts of your system are down. In this article, we’ll cover the basics of working with Prometheus.

Key takeaways:

  • Prometheus offers real-time monitoring for applications, microservices, and networks, with a dimensional data model and powerful query language.
  • A Prometheus server collects data via a pull model, supporting targets, jobs, exporters, and native endpoints.
  • Prometheus monitoring is ideal for container orchestration, microservices architectures, cloud-native applications, and infrastructure monitoring.
  • Prometheus integrates seamlessly with Grafana for visualization and observability platforms like New Relic for comprehensive monitoring.

Prometheus features

Some key features of Prometheus include:

  • Time-series data collection: Prometheus is designed to collect and store time-series data, making it easier to monitor changes and trends over time.
  • Multi-dimensional data model: The data collected by Prometheus is organized into a multi-dimensional model, allowing for efficient querying and filtering of specific metrics.
  • Flexible query language: Prometheus uses PromQL (Prometheus Query Language) which allows users to write complex queries to extract metrics data.
  • Alerting system: Prometheus can trigger alerts based on predefined rules and thresholds, with notifications routed through its Alertmanager component, helping you identify and resolve issues in real time.
  • High scalability: Prometheus is highly scalable and can handle many servers and applications without compromising performance.
  • Support for multiple exporters: Prometheus supports various exporters, allowing you to monitor different applications and systems (e.g., databases, web servers) using a single tool.
  • Easy integration with Grafana: Prometheus can be easily integrated with Grafana, an open-source analytics and visualization platform, providing powerful insights into your monitoring data.
  • Active community support: Prometheus has a large community of active users who continually contribute to the development of the tool and provide support through forums and online resources.

How does Prometheus collect data?

A Prometheus server collects metrics data from its targets using a pull model over HTTP. But what exactly is a pull model, and how is it different from a push model?

Push versus pull model

On a basic level, it really is as simple as it sounds. A Prometheus server pulls telemetry data from somewhere else. The server uses jobs to scrape HTTP endpoints. The data from those endpoints is then pulled into the Prometheus database.
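
To make the pull model concrete, here’s a minimal sketch of the application side using the official `prometheus_client` library for Python: the app exposes a `/metrics` endpoint, and a Prometheus server configured to scrape it pulls the data on its own schedule. The port and metric name are illustrative choices, not part of any standard setup.

```python
# A minimal sketch of the pull model from the application side, using the
# official prometheus_client library for Python. The port (8000) and the
# metric name are arbitrary choices for illustration.
import random
import time

from prometheus_client import Counter, start_http_server

# This counter will appear on the /metrics endpoint that Prometheus scrapes.
REQUESTS = Counter("demo_requests_total", "Total simulated requests handled")

if __name__ == "__main__":
    start_http_server(8000)  # serves http://localhost:8000/metrics
    while True:
        REQUESTS.inc()                      # simulate work the app is doing
        time.sleep(random.uniform(0.1, 1.0))
```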

A push model collects data and then pushes it to a database. New Relic and other observability platforms use the push model. That means using instrumentation (the process of installing agents in your systems) to collect telemetry data. The agent, which is embedded in your software, collects the data and pushes it to another server.

Both push and pull methods work well for monitoring and observability, and OpenTelemetry (OTel), which is open-source and vendor-neutral, supports both methods.

Targets and jobs

A Prometheus server pulls its data by scraping HTTP endpoints. Each endpoint exposes the current values of its metrics, and the Prometheus server scrapes it at a regular interval, building up a near-real-time picture of the data. These endpoints are also known as targets or instances, and a collection of instances that serve the same purpose is known as a job. Jobs are commonly used for scalability and reliability.
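
If you want to see which targets and jobs a running server currently knows about, you can ask the server itself. The sketch below calls the Prometheus HTTP API’s /api/v1/targets endpoint and prints each instance along with the job it belongs to; it assumes a server listening at localhost:9090 and uses the third-party requests library.

```python
# A small sketch that lists active scrape targets and their jobs by calling
# the Prometheus HTTP API (assumes a server running at localhost:9090).
import requests

resp = requests.get("http://localhost:9090/api/v1/targets")
resp.raise_for_status()

for target in resp.json()["data"]["activeTargets"]:
    labels = target["labels"]
    print(f'job={labels.get("job")} instance={labels.get("instance")} '
          f'health={target["health"]}')
```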

Exporters and native Prometheus endpoints

It may seem like a major limitation that Prometheus can only pull data from HTTP endpoints. However, sometimes you need to monitor data from things that don't have HTTP endpoints, such as proxies, hardware, and databases. In those cases, you generally need to use exporters. Exporters act as an intermediary between services and your Prometheus servers, collecting data from the services and making it available to Prometheus for scraping. Because Prometheus has a robust open-source community, there are a lot of third-party exporters that help you expose data to a Prometheus server.
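
As a rough illustration of what an exporter does, here’s a minimal custom exporter written with the prometheus_client Python library. The metric name, the port, and the read_queue_depth() helper are stand-ins for whatever system you actually need to bridge; for common systems, an existing third-party exporter is usually the better choice.

```python
# A minimal sketch of a custom exporter built with prometheus_client.
# It pretends to read a value from a system that has no HTTP endpoint of its
# own and exposes that value for Prometheus to scrape. The metric name, port,
# and the read_queue_depth() helper are illustrative assumptions.
import time

from prometheus_client import start_http_server
from prometheus_client.core import GaugeMetricFamily, REGISTRY


def read_queue_depth() -> float:
    """Stand-in for querying the real system (a database, device, etc.)."""
    return 42.0


class QueueCollector:
    def collect(self):
        # Called on every scrape, so the value is read fresh each time.
        yield GaugeMetricFamily(
            "demo_queue_depth",
            "Current depth of the backing queue",
            value=read_queue_depth(),
        )


if __name__ == "__main__":
    REGISTRY.register(QueueCollector())
    start_http_server(9105)  # exporters conventionally get their own port
    while True:
        time.sleep(5)
```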

Many open-source projects also have native Prometheus endpoints. When you use projects with native Prometheus endpoints, you don’t need an exporter. These open-source projects make it easier to integrate Prometheus without any additional work from engineers. Examples include:

  • ArgoCD, which helps manage GitOps workflows
  • CoreDNS, which is used to manage DNS for Kubernetes clusters
  • Istio, an open-source service mesh

Time series database and multidimensional model

The Prometheus server pulls data into a time series database. Each sample includes a value and a timestamp, and each series is identified by a metric name plus optional labels stored as key-value pairs. Time series data is especially useful for tracking how data changes over time.

While other database systems such as Cassandra, PostgreSQL, and MongoDB support time series data, Prometheus is specifically designed for this kind of data. That makes Prometheus ideal for real-time monitoring.

Prometheus also uses a multidimensional model. That means you can query your data in many different ways, such as by job, endpoint, or time period. The ability to slice and analyze your data is very important for getting full observability into the systems you’re monitoring. For instance, narrowing down a query to a specific endpoint might correspond to a microservice or piece of hardware. On the other hand, querying data over a time period provides metrics for dashboards, including average response times and a high-level view of how a system is performing. Ultimately, a multidimensional model gives you tremendous flexibility in how you query and analyze your telemetry data.
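
As a hedged example of that kind of slicing, the sketch below sends a PromQL query to the Prometheus HTTP API and prints the per-job request rate over the last five minutes. The metric name http_requests_total and the job label are assumptions about what your targets expose, and the server address is the default local one.

```python
# A sketch of slicing metrics by dimension with PromQL, sent through the
# Prometheus HTTP API (assumes a server at localhost:9090 and a metric
# called http_requests_total with a "job" label -- both are illustrative).
import requests

# Per-job request rate over the last five minutes.
query = 'sum by (job) (rate(http_requests_total[5m]))'

resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": query},
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    job = series["metric"].get("job", "<none>")
    timestamp, value = series["value"]
    print(f"{job}: {value} req/s")
```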

Metrics for monitoring with Prometheus 

The four main metrics for monitoring with Prometheus are:

  1. Counter: This metric is used to count the number of times a particular event has occurred. For example, it can be used to measure the number of HTTP requests made to a server.
  2. Gauge: A gauge metric represents a numeric value that can increase or decrease over time. It is often used to measure system resources such as CPU usage, memory usage, or network traffic.
  3. Histogram: This metric is used to track the distribution of values over time. It divides data into buckets and counts the number of data points that fall into each bucket.
  4. Summary: Similar to histograms, summaries also track the distribution of observed values, but they calculate configurable quantiles (such as the 95th percentile) on the client side.

These metrics are essential for monitoring systems and applications, providing valuable insights into their performance and helping identify issues or bottlenecks that need attention. Using Prometheus, these metrics can be collected, stored, and visualized in real time, allowing for efficient troubleshooting and proactive maintenance. With the client libraries, all four types can be defined in a few lines of code, as sketched below.
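
Here’s a minimal sketch of what defining the four metric types looks like with the official prometheus_client library for Python; the metric names, labels, and bucket boundaries are illustrative only.

```python
# A minimal sketch of the four Prometheus metric types using the
# prometheus_client library for Python; names and values are illustrative.
from prometheus_client import Counter, Gauge, Histogram, Summary

# Counter: only ever goes up (e.g., total HTTP requests handled).
REQUESTS = Counter("app_requests_total", "Total HTTP requests",
                   ["method", "path"])

# Gauge: can go up or down (e.g., items currently in a queue).
QUEUE_SIZE = Gauge("app_queue_size", "Items currently queued")

# Histogram: counts observations into configurable buckets.
LATENCY = Histogram("app_request_latency_seconds", "Request latency",
                    buckets=(0.1, 0.25, 0.5, 1.0, 2.5))

# Summary: tracks the count and sum of observations
# (the Python client leaves quantile calculation to other tooling).
PAYLOAD = Summary("app_payload_bytes", "Size of request payloads")

REQUESTS.labels(method="GET", path="/health").inc()
QUEUE_SIZE.set(7)
LATENCY.observe(0.42)
PAYLOAD.observe(2048)
```

Note that the Python client’s Summary only tracks counts and sums; if you need quantiles, histograms queried with PromQL are the usual route.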

Ideal use cases for monitoring with Prometheus

Monitoring with Prometheus is well suited to a wide range of metric-based monitoring scenarios. It excels where real-time visibility into system performance is critical, making it an excellent choice for:

Container orchestration 

Prometheus is highly popular for monitoring containerized environments, such as Kubernetes and Docker, providing insights into container resource utilization, orchestration, and scaling. This makes it a valuable tool for DevOps teams looking to optimize their containerized applications and infrastructure.

Microservices architectures 

In microservices-based applications, Prometheus can track the performance of individual services and help troubleshoot issues within the distributed system. The ability to extract data from various endpoints and combine metrics makes it a solid choice for monitoring intricate microservices architectures.

Cloud-native applications 

Prometheus is a natural fit for cloud-native applications, enabling teams to monitor dynamic, auto-scaling workloads in cloud environments. To ensure sustained performance, consider integrating it with Kubernetes and other orchestration tools. This allows for clear visibility into containerized applications operating on cloud platforms.

Infrastructure monitoring 

Prometheus is an ideal choice for monitoring server and network infrastructure, providing insights into resource consumption, network latency, and system health. With its flexible data model, Prometheus can collect metrics from a variety of sources and provide real-time monitoring for critical infrastructure components.

Custom application metrics 

Developers can instrument their applications with Prometheus client libraries to collect custom metrics, making it a versatile tool for tracking specific application-level KPIs. This allows for a more comprehensive view of the entire system, from infrastructure to application performance.

DevOps and Site Reliability Engineering (SRE) practices 

Prometheus aligns well with the principles of DevOps and SRE, offering real-time insights to ensure service reliability and meet SLOs and SLAs. Its query language, PromQL, enables users to perform complex queries and visualizations, making it a valuable tool for troubleshooting and root cause analysis.

Open-source ecosystem 

Prometheus boasts a vibrant open-source ecosystem with exporters and integrations for a wide range of technologies, making it suitable for diverse tech stacks. This allows users to easily monitor and gather metrics from various components of their system, regardless of the technology used.

Prometheus is an excellent choice when you need granular control over metric collection and a real-time view of your system's performance. Its flexibility and scalability make it a valuable tool for organizations of all sizes, especially those with dynamic and containerized infrastructures.

Using Prometheus to monitor Kubernetes

Prometheus integrates seamlessly with Kubernetes, making it a preferred choice for monitoring containerized environments. It discovers targets through the Kubernetes API and scrapes metrics from cluster components such as the kubelet, the API server, and kube-state-metrics, and users can build custom dashboards to visualize those metrics. This enables DevOps teams to gain insight into the health and performance of their Kubernetes clusters, including pod- and node-level metrics.

Prometheus also has built-in alerting capabilities that can be configured to send notifications when certain thresholds are exceeded. This helps teams proactively identify and resolve issues before they impact end users.

There are several advantages to installing Prometheus to monitor Kubernetes:

  • Native integration: Prometheus was designed for dynamic, service-oriented environments, making it a natural fit for Kubernetes. This means easy setup and configuration, as well as optimized performance.
  • Versatile metric collection: Prometheus is designed to pull metrics from various sources, including the Kubernetes API, making it capable of capturing granular data that other monitoring tools may miss.
  • High scalability: Prometheus is highly scalable and can handle large amounts of data without compromising on performance. This makes it perfect for monitoring large Kubernetes clusters.
  • Real-time monitoring: Prometheus provides real-time monitoring capabilities that allow you to view the current state of your Kubernetes infrastructure at any given time. This enables you to quickly detect and troubleshoot any problems that may arise.
  • Alerting system: Prometheus comes with a built-in alerting system that allows you to set up alerts based on predefined thresholds or custom rules. This helps you proactively monitor your cluster's health and take necessary actions before any major issues occur.

Best practices for monitoring with Prometheus

Want to get the most out of your monitoring with Prometheus? Integrate some of these best practices.

  1. Use labels: Labels allow you to add custom key-value pairs to your metrics, making it easier to filter and aggregate data (see the sketch after this list). This is especially useful for monitoring multiple instances of the same service or application.
  2. Set alerting rules: Prometheus allows you to set alerting rules based on your metrics. This can help you proactively identify and resolve any issues before they become critical.
  3. Use service discovery: Prometheus supports various service discovery mechanisms, such as DNS, Kubernetes, Consul, etc. This makes it easier to monitor dynamic environments where services may come and go.
  4. Monitor key metrics: It's important to only monitor relevant metrics that provide meaningful insights into the health of your system. Too many irrelevant metrics can lead to noise and make it difficult to identify real issues.
  6. Configure retention policies: Prometheus keeps data in its time-series database only for a limited retention period (15 days by default), after which old data is deleted. Make sure to configure an appropriate retention policy based on your monitoring needs.
  6. Consider using a dashboard: While Prometheus provides a basic web UI for querying and visualizing data, it may not be sufficient for complex monitoring needs. Consider using a dashboard tool like Grafana for more advanced visualization capabilities.
  7. Use best practices for instrumentation: Proper instrumentation is key when using Prometheus for monitoring. Follow best practices such as including metric labels, using consistent naming conventions, and avoiding cardinality explosion.
  8. Regularly check configuration and alerts: As your system evolves, make sure to regularly review and update your configuration and alerting rules accordingly.
  9. Monitor Prometheus itself: Since Prometheus is responsible for collecting and storing metrics, it's important to monitor its own health as well. Keep an eye on its resource usage and make sure it's properly configured for optimal performance.
  10. Ignore transient errors: Not all errors are critical or require immediate attention; some may be transient and resolve themselves. Make sure to configure your alerts to ignore these types of errors to avoid unnecessary notifications.
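
To show what best practices 1 and 7 can look like in code, here’s a small sketch using the prometheus_client Python library: a consistently named counter with a small, bounded set of label values. The metric name and labels are illustrative assumptions, not a required convention.

```python
# A sketch of labeling best practices: consistent naming and low-cardinality
# label values (status class instead of full URLs or user IDs).
from prometheus_client import Counter

HTTP_REQUESTS = Counter(
    "myapp_http_requests_total",          # consistent <app>_<what>_total naming
    "HTTP requests handled by the service",
    ["method", "status_class"],           # few, bounded label values
)

def record_request(method: str, status_code: int) -> None:
    # Collapse the status code to its class (2xx, 4xx, ...) rather than
    # labeling with raw codes, paths, or user IDs, which would explode the
    # number of time series Prometheus has to store.
    HTTP_REQUESTS.labels(method=method,
                         status_class=f"{status_code // 100}xx").inc()

record_request("GET", 200)
record_request("POST", 503)
```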

Pros and cons of monitoring with Prometheus

Monitoring with Prometheus offers a range of advantages and some challenges.

Advantages of Prometheus monitoring

On the positive side, Prometheus is renowned for its robust metric-based monitoring capabilities and its ability to scale with the growing needs of your infrastructure. It provides real-time insights into system performance, enabling proactive issue resolution and efficient resource utilization. The integration of Grafana complements Prometheus, offering customizable and intuitive dashboards for visualizing data. 

Disadvantages of Prometheus monitoring

It's important to recognize that Prometheus primarily focuses on metrics, and achieving comprehensive observability may require additional tools for logging and tracing. Moreover, implementing and maintaining Prometheus can be resource-intensive, demanding expertise and effort from your IT team. Overall, monitoring with Prometheus is a powerful solution, but its effectiveness hinges on thoughtful integration and ongoing maintenance.

Prometheus vs. Grafana monitoring: Which is best?

Comparing Prometheus and Grafana for monitoring purposes is not a matter of determining which is best, but rather understanding their complementary roles in a monitoring stack. 

As you (hopefully) have come to see, Prometheus excels in data collection, storage, and alerting based on time-series metrics. It's ideal for tracking system performance and resource utilization, offering powerful querying capabilities through PromQL. 

On the other hand, Grafana is a visualization and dashboarding tool designed to display data from various sources, including Prometheus. Grafana's strength lies in its ability to create interactive, user-friendly dashboards that allow users, including IT team leaders and CTOs, to gain actionable insights from Prometheus data. 

So, it's not a question of one being better than the other; they are typically used together. Prometheus collects the data, and Grafana helps visualize it, providing a comprehensive monitoring solution. The real value comes from integrating both to create a robust observability stack that empowers your organization to efficiently manage systems, reduce costs, enhance the customer experience, and meet SLOs and SLAs effectively.

How to visualize and query your Prometheus metrics

Prometheus metrics provide valuable insight that can help you monitor your services. Prometheus comes with a flexible query language known as PromQL (short for Prometheus Query Language) that you can use to query your metrics. But where do you query your data, and how do you visualize it? There are four different options:

  1. Use the built-in expression browser. Prometheus provides an expression browser that you can use to see your data. It’s easy to use but very basic.
  2. Use an analytics and visualization platform. The most popular option is Grafana, which is open-source and provides powerful dashboard and visualization features.
  3. Use an observability platform. You can use an observability platform like New Relic, which provides dashboards and visualizations like Grafana does. An observability platform also provides data storage and the ability to instrument services you aren't monitoring with Prometheus, which platforms focused purely on analytics and visualization don't offer.
  4. Combine multiple platforms. You can also combine the benefits of both kinds of platforms.

Expression browser

Once you’ve set up at least a basic Prometheus server, you can use Prometheus’s built-in expression browser at localhost:9090/graph to view your metrics. If you haven’t set up any external targets yet, you can even have Prometheus scrape itself (generally at localhost:9090). While that’s not extremely useful in itself, it’s a good way to ensure that Prometheus is up and running.

While you can make PromQL queries directly to a Prometheus server, that’s not the most effective way to see your metrics in action. Prometheus doesn’t offer features like distributed tracing, real user monitoring, and application performance monitoring, and its dashboards aren’t very fancy. You’ll likely want to query and visualize your data somewhere else.

Grafana

As we already mentioned, Grafana is open-source analytics and visualization software that you can use to query your Prometheus data. Grafana’s only job is to query and visualize, and it does both very well.

If you want to build dashboards with your Prometheus data in Grafana, you just need to add Prometheus as a data source and then create a Prometheus graph.

The best of both worlds: An observability platform

While Prometheus and Grafana form a potent duo for metric-based monitoring, there are compelling reasons why someone might consider integrating an observability platform (like New Relic) into their monitoring strategy. The platform provides a more comprehensive solution by offering not just metrics but also logs, traces, and application performance monitoring. 

This holistic approach enables deeper insights into system behavior and the user experience. It simplifies root cause analysis by correlating metrics, logs, and traces, making it easier to pinpoint and resolve issues swiftly. 

Additionally, New Relic's AI-driven capabilities can proactively identify anomalies and performance bottlenecks, helping IT team leaders and CTOs ensure optimal system reliability, reduce operational costs, enhance the customer experience, and meet SLOs and SLAs effectively.

By combining Prometheus and Grafana with New Relic, organizations can achieve a more robust and proactive observability stack that covers all aspects of modern application monitoring.

An observability platform like New Relic gives you additional options for scaling and storing your data. New Relic provides up to 13 months of storage for your data and is compatible with PromQL.

To store Prometheus data in New Relic, you can set up the remote write integration or use the Prometheus OpenMetrics integration for Kubernetes or Docker. The steps for setting up the remote write integration are included in Harnessing the power of open source metrics in New Relic. You can also read the documentation to learn more about both integration options.