What is infrastructure monitoring? Best practices and use cases

Maintaining the performance, availability, and health of IT infrastructure is essential. That’s where infrastructure monitoring comes in. At its core, it provides real-time insight into your entire stack so you can keep performance on track and catch potential issues before they escalate. This article covers what infrastructure monitoring is, why it matters, how it works, and how it affects modern businesses, from cloud services to on-premises servers. Let’s get started.

What is infrastructure monitoring?

Infrastructure monitoring is software that helps you monitor, quickly pinpoint, and fix issues across your entire infrastructure, including cloud-based services, on-premises hosts, orchestrated containers, and virtual machines. You can use it to observe complex and hybrid systems such as data centers and cloud-based services like Amazon Web Services (AWS) and Microsoft Azure, and to get a high-level view of your system’s CPU, RAM, storage, and network traffic. With these insights, engineers can identify and troubleshoot performance problems within servers, containers, Kubernetes clusters, databases, on-host services, and more, whether on-prem or in the cloud. More specifically, infrastructure monitoring delivers in-depth performance metrics, trend values, and predictive insights that help businesses fine-tune their resources, improve uptime, and keep services running smoothly.

What is application infrastructure?

Application infrastructure is all of the assets that allow your systems and technology to function, including networks, hardware devices, and servers, whether they are based in the cloud or on-premises. Even if you’re using cloud solutions, that infrastructure is still based on a physical server somewhere. Application infrastructure is like a building’s foundation—you can’t see it, but it’s supporting the entirety of the building.

Ultimately, you can think of application infrastructure as consisting of three layers:

Hardware: The hardware includes all of the physical components that host your infrastructure. It includes the physical servers and the processors, network devices, and other physical devices that your system uses. This layer is ultimately built on microchips, including logic chips (CPUs) and memory chips (RAM). There are other types of chips, too, including neural processing units (NPUs), which are designed for machine learning applications.
Operating system (OS): The operating system provides an interface that connects the other two layers of application infrastructure: the hardware and the application itself. The operating system executes applications while managing hardware resources such as CPUs and RAM. This layer also includes virtual machines, which have their own operating systems.
Application: This is the application itself, which could be a custom application you’ve developed or an application that uses a content management system like WordPress. The application layer also includes containers, which are used to run many applications.

If you’re using on-premises servers, you need to think about all of these layers, including making sure your hardware is functioning properly. With cloud-based infrastructure, you no longer have to worry about hardware in the same way, because your cloud provider maintains the infrastructure that hosts your software and applications. However, you do still need to think about provisioning resources—CPU, memory, storage, and networking. If your application is underprovisioned, it won’t function properly, and if it’s overprovisioned, then you’ll be wasting money on capacity you don’t need.
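As a rough illustration of the provisioning trade-off, the sketch below labels a resource as underprovisioned or overprovisioned based on its average utilization. The function name and the 20% and 80% cut-offs are assumptions chosen for this example, not recommended values.

```python
def provisioning_status(avg_utilization_pct: float,
                        low_pct: float = 20.0,
                        high_pct: float = 80.0) -> str:
    """Classify a resource by its average utilization (0-100).

    The default thresholds are illustrative assumptions only.
    """
    if not 0.0 <= avg_utilization_pct <= 100.0:
        raise ValueError("utilization must be between 0 and 100")
    if avg_utilization_pct >= high_pct:
        return "underprovisioned"  # running hot: too little capacity
    if avg_utilization_pct <= low_pct:
        return "overprovisioned"   # mostly idle: paying for unused capacity
    return "right-sized"


if __name__ == "__main__":
    # Hypothetical hosts and utilization figures.
    for name, pct in [("web-1", 91.0), ("batch-2", 7.5), ("db-1", 55.0)]:
        print(name, provisioning_status(pct))
```

In practice you would choose thresholds per resource type and look at utilization over a representative period rather than a single average.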

The next image shows a dashboard in New Relic Explorer with a high-level view of containers, services, hosts, and more.

How does infrastructure monitoring work?

Like other types of monitoring, infrastructure monitoring usually involves instrumenting a host by installing an agent. In the case of a monitoring solution like New Relic, you can begin the process of instrumentation with a simple guided installation. The agent automatically detects the application and log sources running in your environment and then recommends which ones you should instrument.

Once your hosts are fully instrumented, the agent will collect system data and send it to your infrastructure monitoring solution. In some cases, the agent will forward data and logs, particularly in the case of integrations.
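To make the collect-and-send idea concrete, here is a minimal sketch of what an agent does conceptually. It is not how the New Relic agent is implemented: the function names and payload fields are hypothetical, and it only uses Python's standard library to take one snapshot and serialize it.

```python
import json
import os
import shutil
import socket
import time


def collect_sample(path: str = "/") -> dict:
    """Gather a small snapshot of host data using only the standard library."""
    disk = shutil.disk_usage(path)
    sample = {
        "timestamp": time.time(),
        "hostname": socket.gethostname(),
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
    }
    # Load average is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            sample["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass  # the platform could not report a load average
    return sample


def to_payload(sample: dict) -> bytes:
    """Serialize a sample as JSON, as a hypothetical backend might accept it."""
    return json.dumps(sample).encode("utf-8")


if __name__ == "__main__":
    # A real agent would loop on an interval and send each payload over
    # the network; this sketch prints one payload to avoid that dependency.
    print(to_payload(collect_sample()).decode("utf-8"))
```

A production agent also handles batching, retries, and authentication, which are left out here.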

The following chart shows how a New Relic on-host integration receives data from a service like Redis or Apache.

Like other types of application monitoring, infrastructure monitoring involves data from MELT: metrics, events, logs, and traces.

Logs, which record discrete actions that occur in an application, are the building blocks of metrics, events, and traces. Each log entry is typically a single line of text. For instance, an NGINX server logs every request it handles. Events can consist of many lines of log data. Along with traces, which connect events together, events provide more context on what is happening in your infrastructure.
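The sketch below shows how a single log line can be turned into structured data. It assumes the line follows NGINX's predefined "combined" access log format; the sample line and the function name are made up for illustration, and a custom log_format would need a different pattern.

```python
import re

# Pattern for NGINX's predefined "combined" access log format.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_access_line(line: str):
    """Return the log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    return record


if __name__ == "__main__":
    sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
              '"GET /index.html HTTP/1.1" 200 612 "-" "curl/8.4.0"')
    print(parse_access_line(sample))
```

Once lines are parsed into fields like this, they can be counted, filtered, and aggregated into metrics.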

Finally, metrics are aggregated data, giving you a high-level view of what’s happening in your application. An example is the average latency of a service over the last seven days. Metrics paint a bigger picture for you and are especially helpful for visualizing the overall health and performance of your infrastructure. It's also worth understanding how infrastructure disruption affects the business, since organizations increasingly rely on technology to drive innovation.
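As a small illustration of aggregation, this sketch averages latency per service from a list of individual measurements. The record fields ("service", "latency_ms") and the sample values are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean


def average_latency_ms(records):
    """Aggregate individual measurements into one average per service."""
    by_service = defaultdict(list)
    for record in records:
        by_service[record["service"]].append(record["latency_ms"])
    return {service: mean(values) for service, values in by_service.items()}


if __name__ == "__main__":
    # Hypothetical measurements; real data would cover a time window
    # such as the last seven days.
    records = [
        {"service": "checkout", "latency_ms": 100.0},
        {"service": "checkout", "latency_ms": 300.0},
        {"service": "search", "latency_ms": 50.0},
    ]
    print(average_latency_ms(records))
```

An average hides outliers, so monitoring tools often report percentiles alongside it.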

Why is infrastructure monitoring important?

Regardless of whether your applications use cloud-based or on-premises hosts (or both), infrastructure provides the foundation for your systems. Just as a train can only operate on tracks that are well-maintained, your system needs performant, reliable servers to ensure that services are delivered to your users. When infrastructure goes down, your application's performance suffers and you might even have outages. Because the stakes are so high, maintaining infrastructure can be both challenging and stressful. Even if your servers have nearly 100% uptime, the outages that do occur can be severe. Outages and downtime impact your authority and your users’ trust. At best, your users can’t access your services during an outage, and at worst, your users get frustrated and don’t return.

While you can monitor things like a system’s CPU and RAM on an operating system command line, you need a more comprehensive solution for monitoring application infrastructure, especially as your applications get larger and more complex. That’s where infrastructure monitoring tools come in. An infrastructure monitoring tool like New Relic allows you to visualize your entire system’s infrastructure from one place, including metrics, events, logs, and traces (MELT).

Infrastructure monitoring is just one part of a complete observability practice. Observability is about proactively collecting, visualizing, and alerting on data across all of your systems, including your infrastructure. Ideally, the platform you use should also monitor other aspects of your application, including application performance. That way, you can pinpoint and fix errors that arise in your infrastructure and elsewhere in your applications.

Benefits of infrastructure monitoring

Infrastructure monitoring is a critical component of IT management, helping ensure that the hardware and software resources supporting an organization's IT environment function well. The benefits of implementing a robust infrastructure monitoring system are wide-ranging, spanning operational efficiency, cost management, and strategic planning. Here are some of the key benefits:

Improved performance and reliability

By continuously monitoring the health and performance of servers, networks, and other infrastructure components, organizations can catch degradation early and keep their IT systems running efficiently. This reduces downtime and helps keep applications and services consistently available to users.

Cost savings

Infrastructure monitoring can lead to significant cost savings by optimizing resource utilization and reducing the need for emergency repairs or downtime. By identifying underutilized resources, organizations can make informed decisions about downsizing or reallocating resources, reducing waste and lowering operational costs.

Scalability

Scalability is a critical benefit of any infrastructure monitoring solution, especially in the context of growing organizations. As a business expands, it naturally experiences an increase in the complexity and volume of its IT infrastructure. This growth can include adding new servers, network devices, applications, and cloud services, each introducing new challenges in monitoring and management. A scalable infrastructure monitoring solution is designed to handle this increasing complexity and volume without degrading in performance or becoming inefficient in resource utilization.

Future-proof your IT infrastructure

Investing in a scalable infrastructure monitoring solution is essentially an investment in the future readiness of an organization's IT landscape. It prepares the business to embrace growth opportunities without being constrained by its monitoring capabilities. This future-proofing aspect ensures that the organization can remain agile and responsive to market demands and technological advancements.

What can you monitor with an infrastructure monitoring solution?

An infrastructure monitoring solution allows you to monitor all parts of your application infrastructure. In the case of New Relic, you get the following by default once your infrastructure is instrumented:

The current state of the server, including CPU, memory, disk, and network.
The usage and capacity of a storage device associated with the server.
The usage data for each network device associated with the server.
Data on all Docker containers and Kubernetes clusters, including metrics about CPU, memory, and networking.
Any changes in a system’s live state, which is stored in an InfrastructureEvent.

In addition to instrumentation, you can also use integrations to analyze, visualize, and alert on data from other parts of your infrastructure. New Relic has two main categories of infrastructure integrations:

Cloud integrations with services such as AWS, Azure, and Google Cloud Platform.
On-host integrations with services such as NGINX, MySQL, Redis, Kafka, and Apache.

An infrastructure monitoring platform should also provide enough flexibility for your own custom solutions. You can even get creative and monitor the infrastructure in your home environment, too. Here’s how an engineer used New Relic to monitor his home solar array.

The next image shows an example of monitoring Kubernetes clusters in New Relic Explorer.

Infrastructure monitoring metrics

Infrastructure monitoring metrics shed light on the performance and reliability of your system. Here are some commonly monitored metrics:

CPU metrics

CPU metrics are critical indicators of your system's health and efficiency. These metrics offer a window into the processing power of your system, revealing how well it manages the computational demands placed upon it. Below are some examples of CPU metrics that you could monitor.

CPU usage
CPU load average
CPU idle time
CPU wait time
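As an example of where these numbers can come from, the sketch below derives usage, idle, and wait percentages from two samples of the aggregate "cpu" line in Linux's /proc/stat. It assumes a Linux host with at least the first eight counters present, and it treats I/O wait as separate from usage, which is one common convention rather than the only one. The function names are hypothetical.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line from /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Guest counters are skipped because guest time is already part of user.
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Compute percentages from the change between two counter samples."""
    delta = {name: after[name] - before[name] for name in before}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle_pct = 100.0 * delta["idle"] / total
    wait_pct = 100.0 * delta.get("iowait", 0) / total
    return {
        "usage_pct": 100.0 - idle_pct - wait_pct,
        "idle_pct": idle_pct,
        "wait_pct": wait_pct,
    }


if __name__ == "__main__":
    # Made-up samples; on Linux you would read the first line of
    # /proc/stat twice, a few seconds apart.
    first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
    second = parse_cpu_line("cpu  160 0 70 880 90 0 0 0 0 0")
    print(cpu_percentages(first, second))
```

The counters are cumulative since boot, which is why the calculation uses the difference between two samples rather than a single reading.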

Memory metrics

These metrics provide insights into how effectively your system utilizes its RAM, a crucial component in determining overall performance and responsiveness. Monitoring memory metrics ensures that your system maintains optimal performance levels and that applications have access to the memory resources they need to function efficiently. Understanding your system's memory usage patterns allows you to optimize performance and avoid issues that could lead to system slowdowns or instability. Let’s take a look at some examples of memory metrics.

Total memory
Used memory
Free memory
Memory page swaps
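To show how total, used, and free memory relate, this sketch parses text in the format of Linux's /proc/meminfo. It assumes a Linux host and defines used memory as MemTotal minus MemAvailable, which is one common convention; other tools subtract buffers and cache instead. The function names and sample values are made up.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {name: value_in_kib}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_summary(meminfo: dict) -> dict:
    """Summarize total, used, and free memory in KiB."""
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"]
    # Fall back to MemFree on old kernels that lack MemAvailable.
    available = meminfo.get("MemAvailable", free)
    used = total - available
    return {
        "total_kib": total,
        "used_kib": used,
        "free_kib": free,
        "used_pct": 100.0 * used / total,
    }


if __name__ == "__main__":
    # Made-up sample; on Linux you would read /proc/meminfo instead.
    sample = (
        "MemTotal:        8000000 kB\n"
        "MemFree:         1000000 kB\n"
        "MemAvailable:    6000000 kB\n"
    )
    print(memory_summary(parse_meminfo(sample)))
```

Free memory is often low on a healthy Linux system because the kernel uses spare RAM for caching, so the available figure is usually the more useful signal.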

Disk metrics

These metrics shed light on how data is read from and written to disk, offering a clear view of the efficiency and health of your storage subsystem. By closely monitoring disk metrics, IT professionals can ensure that storage systems are operating smoothly, data is accessed efficiently, and there is ample capacity for future data storage needs. Take a look at commonly monitored disk metrics.

Disk read/write rates
Disk I/O
Disk utilization
Disk capacity
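The sketch below illustrates two of these metrics. Utilization is read with Python's shutil.disk_usage, and a read or write rate is derived from two readings of a cumulative counter. The function names are hypothetical, and where the counter values come from (for example, operating system statistics) is left as an assumption.

```python
import shutil


def disk_utilization_pct(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # guard against filesystems that report no capacity
    return 100.0 * usage.used / usage.total


def rate_per_second(prev_count: int, curr_count: int, elapsed_s: float) -> float:
    """Turn two readings of a cumulative counter into a per-second rate."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return (curr_count - prev_count) / elapsed_s


if __name__ == "__main__":
    print(f"utilization: {disk_utilization_pct():.1f}%")
    # Made-up counter readings: bytes written, sampled ten seconds apart.
    print(f"write rate: {rate_per_second(1000, 4000, 10.0):.0f} bytes/s")
```

The same rate calculation applies to any cumulative counter, such as completed I/O operations or bytes read.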

Infrastructure health

Infrastructure health metrics provide a holistic view of the operational status and wellbeing of your entire IT ecosystem. By keeping a pulse on the health of your infrastructure, you can safeguard against potential failures, optimize system performance, and deliver a seamless experience to users. Infrastructure health metrics are the cornerstone of effective IT management, enabling organizations to maintain high service quality and operational excellence. Let’s take a look at which metrics you can monitor here.

Uptime/downtime
System availability
Hardware errors
Service/process status
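Availability is usually expressed as the share of a period during which a system was up. The sketch below shows that arithmetic; the function name and the 30-day example are assumptions for illustration.

```python
def availability_pct(period_minutes: float, downtime_minutes: float) -> float:
    """Availability as the percentage of the period the system was up."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must fall within the period")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


if __name__ == "__main__":
    # Hypothetical example: 43.2 minutes of downtime in a 30-day month.
    month_minutes = 30 * 24 * 60
    print(f"{availability_pct(month_minutes, 43.2):.3f}%")
```

Working the formula in reverse is also useful: a target such as 99.9% over 30 days leaves a budget of about 43 minutes of downtime.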

This list is not exhaustive, and metrics can vary depending on the exact nature of the infrastructure. Still, these provide a foundational understanding of the range of metrics that are essential to monitoring your infrastructure.

Infrastructure monitoring use cases

Infrastructure monitoring serves as the eyes and ears of IT teams, offering insights that extend across various operational scenarios. These include the following:

Proactive problem detection: Before a minor glitch escalates into a major outage, infrastructure monitoring tools can alert administrators to take action.
Monitoring website uptime and performance: Monitoring tools can oversee web server health, database responsiveness, and even end-user experience in real time.
Capacity planning: Analyze historical data to predict when infrastructure could potentially hit its limits.
Compliance: Continuous monitoring and logging can provide a detailed activity trail ensuring compliance standards are met.
Post-deployment feedback: For businesses adopting DevOps practices, monitoring provides feedback post-deployment, making it easier to spot any inefficiencies.
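To illustrate the capacity planning use case above, here is a minimal sketch that fits a straight line to historical usage and estimates how many days remain before a limit is reached. It assumes growth is roughly linear, which real workloads often are not; the function name and sample data are hypothetical.

```python
def days_until_limit(samples, limit):
    """Estimate days until usage reaches `limit` using a least-squares line.

    `samples` is a list of (day, usage) pairs. Returns None when usage is
    flat or shrinking, because the fitted line never reaches the limit.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit - intercept) / slope
    # Measure from the most recent sample; never return a negative value.
    return max(0.0, day_at_limit - max(xs))


if __name__ == "__main__":
    # Made-up history: disk usage (%) growing two points per day.
    history = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
    print(days_until_limit(history, limit=90.0))
```

A straight line is the simplest possible model; seasonal or bursty workloads call for a forecasting method that accounts for those patterns.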

Infrastructure monitoring best practices

Take a holistic approach: Go beyond monitoring isolated components and consider the entire infrastructure ecosystem, including servers, databases, networking equipment, and applications.
Set up comprehensive alerts: With the right alert system in place, teams can shift from reactive to proactive. Strategically choose what you’d like to be alerted on.
Regularly review metrics and data being collected: Ensure that your tools and monitoring parameters remain relevant as your infrastructure evolves.
Test, test, test: Testing your infrastructure under high load conditions reveals potential weak points and helps you avoid real-world disasters.
Create infrastructure monitoring dashboards for your team: Infrastructure monitoring dashboards are a centralized hub for understanding the state of your current system. Use them to discuss, analyze, and collaborate on issues while collectively understanding infrastructure performance.

Choose the right infrastructure monitoring tool: Select a tool that aligns with your organization's needs, scale, and objectives. Don’t forget to consider user experience, integration capabilities, reliability, and cost-effectiveness.
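One way to be strategic about alerts is to require a threshold to be breached for several consecutive samples before notifying anyone, which filters out brief spikes. The sketch below shows that idea; the function name, the threshold, and the default of three samples are assumptions for illustration, not settings from any particular tool.

```python
def should_alert(values, threshold, min_consecutive=3):
    """Return True if `values` exceeds `threshold` for enough samples in a row."""
    streak = 0
    for value in values:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Made-up CPU usage samples (%), checked against a 90% threshold.
    print(should_alert([95, 40, 96, 97], threshold=90))  # brief spikes only
    print(should_alert([91, 92, 93], threshold=90))      # sustained breach
```

Requiring a sustained breach trades a slightly later alert for fewer false alarms, so the window should reflect how quickly the underlying problem needs a response.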

What to look for in an infrastructure monitoring tool

When selecting an infrastructure monitoring tool, it's crucial to choose one that aligns with your current needs and has the flexibility to adapt to future changes and challenges. Here are the key features and capabilities to look for:

Comprehensive monitoring capabilities: Look for a tool that provides a holistic view of your infrastructure, including hardware, networks, servers, virtual environments, and applications. It should cover physical and virtual components across on-premises, cloud, and hybrid environments.

Supports a wide range of technologies: Ensure the tool supports a broad spectrum of technologies, platforms, and vendors, including newer and legacy systems, to avoid blind spots in your monitoring strategy.

Alerting: The tool should offer real-time monitoring capabilities with customizable alert thresholds, enabling you to respond to issues promptly before they impact users or business operations.

Historical data analysis: It's important for the tool to collect and store historical performance data, facilitating trend analysis and helping predict future infrastructure needs.

Transparent pricing: Understand the pricing model and ensure it aligns with your budget and the scale of your operations. Consider both upfront costs and ongoing expenses.

Why monitor infrastructure with New Relic?

Dive into the future of infrastructure monitoring and observability with New Relic. Our platform not only empowers every engineer with over 30 capabilities across APM, Infrastructure, and more, but it also comes with a consumption-based pricing model that eliminates per-user license fees. This means you can manage your operational expenses more efficiently while giving every engineer the tools they need.

Cost-effective and transparent pricing

Consolidate your toolset and manage costs effectively as you scale. With New Relic's consumption-based pricing, you can spend just a third of what you would with Datadog. For a detailed comparison, check out our Datadog vs New Relic comparison blog.

Break down data silos for rapid remediation

Say goodbye to data silos. New Relic connects your APM and infrastructure data, offering unrestricted visibility across your entire stack. This holistic view enables teams to remediate performance issues up to 80% faster, no matter which team they're on.

Seamless collaboration across teams

Our single observability platform serves as a unified source of truth, allowing engineers from all teams to collaborate efficiently when issues arise. No additional tools are required, and there's no need to go through procurement to add users or SKUs.

Get started today

Sign up for New Relic's free tier and take a deeper dive into our comprehensive infrastructure monitoring documentation and best practices. Get hands-on experience through our "Identify Root Cause Issues in Your Infrastructure" lab.

Experience the New Relic difference today and transform the way you monitor, observe, and optimize your infrastructure.

By Franz Knupfer, Senior Manager, Technical Content Team

Franz Knupfer manages the technical content team at New Relic. Prior to joining New Relic, he was Director of Curriculum at Epicodus code school in Portland, Oregon.

The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.
