State of observability

Monitoring is fragmented. Most organizations do not currently monitor their full tech stacks.

Current deployment

First, we look at the observability practice characteristics and capabilities respondents’ organizations had employed or deployed at the time of the survey, their strategy and organization, the benefits they experienced, the percentage of their IT budgets allocated for observability, their pricing and billing preferences, and the challenges preventing them from prioritizing/achieving full-stack observability.

Mature observability practice characteristics

We asked survey respondents which mature observability practice characteristics they think are the most important and—in a separate question later in the survey—which they had employed. We found that:

  • Only 2% indicated that their organizations had all 15 observability characteristics employed
  • Just 1% indicated that they had none employed
  • More than half (53%) had three to five employed (52% had one to four employed, 48% had five or more employed, and 10% had 10 or more employed)

Based on our definition of a mature observability practice, only 5% of survey respondents had a mature observability practice.

Those with mature observability practices also tended to have more observability practice characteristics employed: 97% had nine or more and 40% had all 15.

Of those who had mature observability practices, 100% indicated that observability improves revenue retention by deepening their understanding of customer behaviors, compared to 34% of those whose practices were less mature.

There was a gap between what respondents thought were the most important characteristics of a mature observability practice and what characteristics they actually were employing. Download the full report to learn more.

Number of mature observability practice characteristics employed

  • Regional insight: North American organizations were the most likely to have a mature observability practice (7%), while European organizations were the least likely (4%). Download the full report to view more regional highlights.
  • Role insight: Unsurprisingly, executives felt that observability improves revenue retention by deepening their understanding of customer behaviors (top pick). In contrast, this did not strike a chord with others—revenue retention placed a distant tenth for non-executive managers and sixth for practitioners.

Capabilities deployed

Capabilities, not to be confused with characteristics, are specific components of observability. We asked survey respondents to tell us which of 17 different observability capabilities they deployed. Below we review the results by capability and by the number of capabilities.

By capability

Deployment rates for individual observability capabilities ranged from a high of 57% (network monitoring) to a low of 34% (Kubernetes monitoring). We found that:

  • Just over half said they deploy environment monitoring capabilities and log management
  • Digital experience monitoring (DEM) and services-monitoring capabilities were in the 40% range
  • Monitoring capabilities for emerging technologies were among the least deployed, with each hovering in the low-30% range

Download the full report to view highlights for each capability.

By number of capabilities

When we looked at how many capabilities the survey respondents said their organizations deploy, we found:

  • Only 3% indicated that their organizations have all 17 observability capabilities deployed
  • Just 3% indicated that they had none deployed
  • Most (61%) had four to nine deployed (9% had one to three, 80% had five or more, and 28% had 10 or more)

These results show that most organizations do not currently monitor their full tech stacks. However, this is changing. View future deployment plans.

Deployed capabilities

Number of deployed capabilities

  • Regional insight: In general, Asia-Pacific organizations had the most capabilities deployed, while European organizations had the least. Download the full report to view more regional highlights.
  • Role insight: Executives were more likely to state that all observability capabilities are deployed (6%, compared to 2% for non-executive managers and 3% for practitioners), suggesting a gap between perceived and actual deployment.

Full-stack observability prevalence

Based on our definition of full-stack observability, only 27% of survey respondents’ organizations have achieved it. And an even smaller percentage—3%—said that their organization has already prioritized/achieved full-stack observability.

These results indicate that large parts of organizations’ tech stacks are not being monitored or fully observed today, creating ample opportunities to make rapid progress in achieving full-stack observability.

Notably, 84% of organizations that have achieved full-stack observability allocated at least 5% of their total IT budget for observability tools.

Percentage of organizations that do and don’t have full-stack observability

  • Regional insight: Asia-Pacific organizations were the most likely to have achieved full-stack observability (33%), while European organizations were the least likely (21%). Download the full report to view more regional highlights.
  • Organization size insight: Of those organizations that had achieved full-stack observability, only 7% were small, while 52% were midsize, and 42% were large.

On any given day, 33–35% of our infrastructure and our platform compute and storage cycles are in multiple cloud locations. Most of that is unmonitored because our application space is decentralized, which is very common in academia. From a security risk management angle, it represents one of our biggest areas of security risk.

Number of monitoring tools

When asked about the number of tools they use to monitor the health of their systems, survey respondents overwhelmingly reported using more than one.

  • Most (82%) used four or more tools (94% used two or more)
  • One in five used seven tools, the most common number reported
  • Only 2% used just one tool to satisfy their observability needs

So, the state of observability today is most often multi-tool—and therefore fragmented and complex to manage. In fact, 25% of survey respondents noted that having too many monitoring tools is a primary challenge preventing them from prioritizing/achieving full-stack observability.

  • Organization size insight: Small organizations were the most likely to use only one monitoring tool (6%, compared to 2% for midsize and only 1% for large).

Number of tools used for observability capabilities

Observability monitoring and information security can draw closer together and leverage common platforms—there’s a lot of overlap. To the extent that you can use one tool to provide everything, that’s going to become more important.

Unified telemetry

When survey respondents were asked how unified or siloed their organizations’ telemetry data is:

  • Almost half (49%) said more unified (they unify telemetry data in one place), but only 7% said entirely unified
  • A third said more siloed (they silo telemetry data in discrete data stores), including 8% who said entirely siloed
  • Less than one-fifth (17%) said roughly equally unified and siloed

Interestingly, among those whose telemetry data was more siloed, 47% indicated that they actually strongly prefer a single, consolidated platform. And 77% of those who had entirely siloed data indicated that they prefer a single, consolidated platform.

Given the use of disparate monitoring tools and open-source solutions—only 2% of respondents used a single tool for observability—these findings are not surprising. Because siloed and fragmented data make for a painful user experience (expensive, lacking context, and slow to troubleshoot), the more silos an organization has, the stronger its preference to consolidate. Perhaps the respondents who feel the most pain from juggling data across silos long for more simplicity in their observability solutions.

Unified versus siloed telemetry data

  • Regional insight: Organizations in Europe and North America were more likely to have unified telemetry data (51% and 56% respectively) and less likely to have siloed data (31% and 25% respectively). Organizations in Asia Pacific, meanwhile, were the least likely to have unified telemetry data (38%) and the most likely to have siloed data (45%)—in fact, 15% were entirely siloed. Download the full report to view more regional highlights.
  • Organization size insight: Large organizations were slightly more likely to have more unified telemetry data (54%) and less likely to have siloed data (30%). Conversely, small organizations were less likely to have unified telemetry data (40%) and more likely to have siloed data (45%).

Unified visualization and dashboarding

For the visualization and dashboarding of that data, it’s a similar story. When survey respondents were asked how unified or disparate the visualization/dashboarding of their organizations’ telemetry data is:

  • More than two-thirds (68%) said it is more unified (telemetry data is visualized in a single dashboarding solution)
  • Almost a quarter (23%) said it is more disparate (multiple visualization solutions are used without cross-communication)
  • Less than one-tenth (8%) said it is neither unified nor disparate

Despite relying on multiple tools to meet their observability needs, respondents mostly managed to unify the visualization of data from those tools. These findings point to a desire for a unified observability experience.

Unified versus disparate visualization/dashboarding of telemetry data

  • Regional insight: Respondents surveyed in North America were more likely to have telemetry data visualized in a single dashboarding solution (74%) and less likely to have multiple visualization solutions without cross-communication (18%). Those surveyed in Asia Pacific were the least likely to have telemetry data visualized in a single dashboarding solution (61%) and the most likely to have multiple visualization solutions without cross-communication (33%)—in fact, 11% were entirely disparate. Download the full report to view more regional highlights.
  • Role insight: ITDMs were more likely to think that their visualization and dashboarding of telemetry data was more unified (71%), while practitioners were less likely (66%).
  • Organization size insight: Respondents from midsize organizations were the most likely to think that their visualization and dashboarding of telemetry data is more unified (70%), while those from small organizations were the least likely (62%).

Strategy and organization

Next, we looked at the survey respondents’ observability strategies and team organization, including their preference for a single platform or multiple point solutions, how they learn about software and system interruptions, the trends driving the need for observability, advocacy for observability by role, the perceived purpose of observability, the stages of the software development lifecycle in which they use observability, and which teams are responsible for observability.

Single platform or multiple point solutions

For the past decade, observability vendors have created purpose-built tools to help specialty engineering teams monitor their part of the stack. For example, New Relic created and led the APM category for application developers. Others chose different specialty roles and created best-in-class tools that served those teams well. However, this practice increased complexity as each tool brings a disparate experience and data store.

To achieve the full power of observability, organizations require a unified, underlying data store for all types and sources of telemetry (like New Relic now offers with its observability platform). With a unified experience, engineering teams can see all their entities and dependencies in one place and collaborate more closely while eliminating team, tool, and data silos.
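To make the unified-data-store idea concrete, here is a minimal sketch using the open-source OpenTelemetry Python SDK to send both traces and metrics to a single OTLP endpoint instead of separate, siloed backends. The endpoint, service name, and instrument names are placeholders, and any OTLP-compatible backend could sit behind them.

```python
# Minimal sketch: route traces and metrics to one OTLP endpoint so all
# telemetry lands in a single data store. All names/URLs are placeholders.
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

OTLP_ENDPOINT = "https://otlp.example.com:4317"  # hypothetical unified backend
resource = Resource.create({"service.name": "checkout-service"})  # hypothetical service

# Traces and metrics share one destination instead of separate, siloed stores.
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint=OTLP_ENDPOINT))
)
trace.set_tracer_provider(tracer_provider)

metrics.set_meter_provider(
    MeterProvider(
        resource=resource,
        metric_readers=[
            PeriodicExportingMetricReader(OTLPMetricExporter(endpoint=OTLP_ENDPOINT))
        ],
    )
)

tracer = trace.get_tracer("checkout")
meter = metrics.get_meter("checkout")
orders = meter.create_counter("orders_processed")

with tracer.start_as_current_span("process-order"):
    orders.add(1)  # span and counter end up in the same backend
```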

But what are the strategic preferences of organizations when it comes to the number of tools they use for observability? Do they prefer a single, consolidated observability platform or multiple point/best-in-class solutions that are often cobbled together or used only for specific monitoring capabilities? We found that:

  • Almost half (47%) preferred a single, consolidated observability platform
  • A third preferred multiple point solutions
  • One in five had no preference

These results imply that many organizations desire a single-tool, all-in-one approach to their observability needs.

However, even though more respondents claimed that they prefer a single, consolidated platform, 94% used two or more monitoring tools—just 2% used a single tool for observability and only 7% indicated their organizations’ telemetry data is entirely unified.

What’s more, when asked what the primary challenges are that prevent prioritizing/achieving full-stack observability, a quarter said it was that they have too many monitoring tools.

Taken together, we see the current state of observability as multi-tool and fragmented, yet we see an increasing strategic preference for a single, consolidated observability platform with the knowledge that tool fragmentation is a significant hindrance to full-stack observability.

Preference for a single, consolidated platform versus multiple point solutions

  • Regional insight: In the Asia-Pacific region, 55% of respondents claimed that they prefer a single, consolidated platform. Download the full report to view more regional highlights.
  • Role insight: About a third (32%) of non-executive managers said that they strongly prefer a single, consolidated platform, compared to 17% of executives and practitioners.
  • Industry insight: In the financial/insurance and industrials/materials/manufacturing industries, more than half of respondents claimed that they prefer a single, consolidated platform (60% and 54% respectively). Download the full report to view more industry highlights.

Our stated strategy and the vision that we have is to use fewer providers and cover more territory. Where appropriate, we try to use one vendor to do many things as opposed to many vendors to do only one thing each.

Detection of software and system interruptions

How do respondents’ organizations learn about software and system interruptions? The survey results showed that:

  • Almost half (46%) primarily learn about interruptions through multiple monitoring tools
  • Only about one in five (21%) primarily learn about interruptions through one tool

So, about two-thirds of respondents (67%) indicated that they primarily learn about interruptions through one or more monitoring tools. That a higher percentage primarily learn about interruptions through multiple monitoring tools makes sense given what we know about the large number of monitoring tools respondents deployed for observability purposes.

But what is remarkable is how manual the process remains for so many organizations. We found that:

  • Almost a quarter (22%) primarily learn about interruptions through manual checks/tests that are performed on systems at specific times
  • About one in 10 (11%) primarily learn about interruptions through incident tickets and complaints from customers and employees.

In sum, a third of respondents still primarily learned about interruptions and outages through manual checks/tests or through incident tickets and complaints.

What’s more, there is a clear connection between how respondents primarily learned about interruptions and how unified their telemetry data was. Generally, when telemetry data was more unified, notice of interruptions came through one observability tool.
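To illustrate the difference between manual checks and tool-driven detection, the sketch below shows a bare-bones synthetic health check that polls an endpoint on a schedule and raises an alert when it fails. The URL, interval, and alert action are hypothetical stand-ins for what a monitoring tool automates.

```python
# Hypothetical synthetic health check: polls an endpoint on a schedule and
# raises an alert when it fails, instead of waiting for a ticket or manual test.
import time
import urllib.request

SERVICE_URL = "https://example.com/health"   # placeholder endpoint
CHECK_INTERVAL_SECONDS = 60

def service_is_healthy() -> bool:
    try:
        with urllib.request.urlopen(SERVICE_URL, timeout=5) as resp:
            return resp.status == 200
    except Exception:
        return False

while True:
    if not service_is_healthy():
        # In practice this would page on-call or open an incident via an
        # alerting integration; printing stands in for that here.
        print(f"ALERT: {SERVICE_URL} failed its health check at {time.ctime()}")
    time.sleep(CHECK_INTERVAL_SECONDS)
```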

How respondents detected software and system interruptions

  • Organization size insight: Large organizations were more likely to detect interruptions through multiple tools, while small organizations were more likely to use manual checks/tests and multiple tools.

Trends driving observability

So, what technology strategies and trends are driving the need for observability?

Modern applications typically run in the cloud and depend on hundreds of components, each introducing additional monitoring challenges and security risks. With cloud adoption, cloud-native application architectures, and cybersecurity threats on the rise, it’s not surprising that an increased focus on security, governance, risk, and compliance was the most frequently cited strategy or trend driving the need for observability at the organizations surveyed (49%).

Development of cloud-native front-end application architectures, an increased focus on customer experience management, and migration to a multi-cloud back-end environment were all mentioned more than 40% of the time as well.

While not the top responses, 39% of respondents said that they are adopting open-source technologies such as OpenTelemetry, 36% are adopting serverless computing, and 36% are containerizing applications and workloads—all trends where observability requires a unified approach.

Technology strategies and trends driving the need for observability

  • Regional insight: The development of cloud-native application architectures (front-end) was the number one strategy driving the need for observability in the Asia-Pacific region (53%). Download the full report to view more regional highlights.
  • Role insight: The development of cloud-native application architectures (front-end) was the number one driver for executives (50%) and the second driver for non-executive managers and practitioners (47% total).
  • Industry insight: Respondents from the energy/utilities and nonprofit industries were more likely to say that migration to a multi-cloud environment is their top driver, while those in government were more likely to say development of cloud-native applications architectures, and those in healthcare/pharma were more likely to say adoption of serverless computing. Those from the services/consulting industry were equally torn between security, cloud-native, and multi-cloud for their top driver. Download the full report to view more industry highlights.

As we’ve gone to the cloud, there’s a lot more we have to monitor and a lot of additional needs. Observability has come to encompass more than just standard on-premises monitoring and become the way we look at all the different aspects and view all of them.

Advocates of observability

Who were the biggest advocates of observability in respondent organizations? We asked survey participants to rate the varying levels of advocacy among several roles in their organizations.

In general, survey respondents indicated that all roles advocate for observability more than they resist it. At first glance, there seems to be little pattern in the varying levels of advocacy. But it’s intriguing that survey-takers thought the less technically focused C-suite executives have the highest levels of strong observability advocacy (39%), generally even higher than the more technically focused C-suite execs (31%). Other notable findings include:

  • There was low resistance to observability overall (less than 10%)
  • Those with full-stack observability or a mature observability practice (by our definitions) were notably more likely to have strong observability advocates than those without
  • Respondents whose organizations saw observability completely as an enabler of core business goals indicated notably higher levels of strong advocacy for observability in nearly every role

These findings tend to support future observability deployment plans and budget plans as organizations are more likely to expand deployment and increase budgets if individual roles and teams see the value in and advocate for observability.

Levels of observability advocacy by role

Purpose of observability

We were curious about the perceived purpose of observability—do practitioners and ITDMs see observability as more of a key enabler for achieving core business goals or more for incident response/insurance? We found that:

  • Half thought observability is more of a key enabler for achieving core business goals
  • More than a quarter (28%) indicated that observability serves core business goals and incident response equally in their organizations
  • Just over a fifth (21%) said observability is more for incident response/insurance

The fact that more than three-quarters of respondents (78%) saw observability as at least partly a key enabler for achieving core business goals implies that observability has become a board-level priority.

Observability as a key enabler for achieving core business goals or for incident response/insurance

  • Regional insight: Respondents surveyed in the Asia-Pacific region were the most likely to view observability as more of a key enabler for achieving core business goals (58%), compared to 48% surveyed in North America and Europe. Conversely, respondents surveyed in Asia Pacific were the least likely to say that observability is more for incident response/insurance (15%), compared to 22% surveyed in North America and 24% surveyed in Europe. Download the full report to view more regional highlights.
  • Role insight: Unsurprisingly, executives were the most likely to view observability as more of a key enabler for achieving core business goals (56%), compared to 51% for non-executive managers and 48% for practitioners. Conversely, executives were the least likely to say that observability is more for incident response/insurance (16%), compared to 17% for non-executive managers and 24% for practitioners.
  • Organization size insight: Respondents from midsize organizations were the most likely to view observability as more of a key enabler for achieving core business goals (54%), while small organizations were the least likely (42%). And midsize organizations were the least likely to say that observability is more for incident response/insurance (19%), while large organizations were the most likely (25%).
  • Industry insight: The most likely to view observability as more of a key enabler for achieving core business goals were respondents from the retail/consumer (57%), financial/insurance (54%), and IT/telco (52%) industries. Conversely, the most likely to view observability as more for incident response/insurance were those from the energy/utilities (33%), services/consulting (28%), and nonprofit/unspecified (26%) industries. Download the full report to view more industry highlights.

SDLC stages using observability

Originally, monitoring was focused on the operate (or run) stage of the software development lifecycle (SDLC). But there is potential for that data to span the entire SDLC, helping teams be more data-driven as they plan, build, deploy, operate, and then iterate. Historically, many engineers who work earlier in the SDLC (plan, build, and deploy stages) were not aware that observability could help them do their jobs better.

Even so, we found that most respondents used some level of data-driven observability insights in all stages of the SDLC. However, only about a third of respondents used full observability in each stage:

  • Plan: 34% used full observability in the plan stage
  • Build: 30% used full observability in the build stage
  • Deploy: 34% used full observability in the deploy stage
  • Operate: 37% used full observability in the operate stage

Those with a mature observability practice (by our definition) were notably more likely to use full observability in all SDLC stages (53% for plan, 46% for build, 51% for deploy, and 54% for operate) than those without.

Those who had full-stack observability (by our definition) were also more likely to use full observability in all SDLC stages (38% for plan, 36% for build, 42% for deploy, and 46% for operate) than those who didn’t have it.

Developers often spend too much time debugging as opposed to shipping new features. It’s crucial for them to have an all-in-one observability platform that can streamline the entire SDLC.

Software development lifecycle for DevSecOps

Degree of observability used for each stage of the SDLC

  • Regional insight: Respondents surveyed in Asia Pacific were the most likely to use extensive or full observability in the plan (72%) and build (75%) stages, while those surveyed in North America were the most likely to use it in the deploy (75%) and operate (81%) stages. Those surveyed in Europe were the least likely to use extensive or full observability across the board with 63% for plan, 65% for build, 67% for deploy, and 69% for operate. Download the full report to view more regional highlights.
  • Industry insight: Overall, respondents in the financial/insurance and retail/consumer industries were the most likely to use extensive or full observability in all stages of the SDLC, including 83% in the operate stage, followed by those from IT/telco. Government respondents were the least likely to use extensive or full observability in all stages of the SDLC, followed by those from education. Download the full report to view more industry highlights.

Teams responsible for observability

In asking survey respondents which teams are primarily responsible for the implementation, maintenance, and usage of observability at their organizations, we found that:

  • IT operations teams were the most likely to be responsible for observability, followed by network operations and DevOps teams
  • Application development and SRE teams were more likely to be responsible for the implementation of observability than the maintenance or usage of it
  • SecOps and DevSecOps teams were more likely to be responsible for the usage of observability than the implementation or maintenance of it

While most organizations had a dedicated IT operations team, they were less likely to have dedicated DevSecOps and SecOps teams primarily responsible for observability. This could indicate that security teams are using separate security-related observability tools. A comprehensive, all-in-one observability approach supports the cultural shift that brings development, security, and operations teams together more seamlessly (DevSecOps). As organizations prioritize security, it will be interesting to see how this dynamic changes in the next few years.

Teams primarily responsible for the implementation, maintenance, and usage of observability

  • Regional insight: In North America, the observability involvement of IT operations teams was slightly higher than in other regions, while in Asia Pacific, the involvement of DevSecOps teams was slightly higher. Download the full report to view more regional highlights.

Pricing, billing, and spending

We wanted to know about budget allocation as well as pricing and billing preferences when it comes to observability tools.

Budget allocation

When we asked survey-takers what percentage of their IT budget they were currently allocating for observability tools, we found that:

  • Most (69%) allocate more than 5% but less than 15%, while 14% allocate more than 15%
  • Just 3% allocate more than 20%
  • Only 16% allocate less than 5%

So, like last year, most organizations allocated less than 20% of IT budgets for observability tools.

Organizations with more mature observability practices (by our definition) tended to spend more on observability: more than a quarter (29%) of those that were mature allocated more than 15%, compared to 14% of those that were less mature.

And organizations that had the most capabilities deployed tended to have the biggest observability budgets: almost three-quarters (73%) of those that allocated more than 20%, and more than half (57%) of those that allocated more than 15% had nine or more capabilities deployed.

Learn about their budget plans for next year.

Percentage of IT budget allocated for observability tools

  • Regional insight: Respondents surveyed in Asia Pacific were more likely to say they allocate more than 10% of their IT budgets for observability tools (50%), while those surveyed in Europe and North America were more likely to say they allocate less than 10% (60% and 54% respectively). And 21% of those surveyed in Asia Pacific said they allocate 15% or more, compared to 14% of those surveyed in North America and 11% in Europe. Download the full report to view more regional highlights.
  • Industry insight: Respondents from the energy/utilities and industrials/materials/manufacturing industries indicated that they allocate more than 10% but less than 15% of their IT budgets for observability tools as their top choice, while those from all other industries were more likely to select more than 5% but less than 10%. This could be because the energy/utilities and industrial/materials/manufacturing industries tend to be more sensitive to downtime, are more regulated, and may be more likely to use technologies like AI, ML, and IoT. Download the full report to view more industry highlights.

Pricing features

In addition to pricing model preferences, we looked at what pricing features were the most important to respondents and their organizations for their observability tools/platform. We found that:

  • Budget-friendly pricing ranked the most important overall; transparent pricing, a single license metric across all telemetry, and having a low-cost entry point were also frequently cited
  • Hybrid pricing models ranked higher than user-, host-, and agent-based-only pricing models
  • The single-SKU and bundle-of-SKUs approaches were neck and neck

The two hybrid pricing models (hybrid user + data-ingestion pricing and hybrid host + data-ingestion pricing) are the dominant pricing models in the market, so it follows that these options ranked highly in the survey. Respondents clearly favored usage-based pricing for data ingest.

Pricing feature preferences

  • Regional insight: Respondents surveyed in North America were the most likely to select budget-friendly pricing; they also favored a single license metric across all telemetry more than those in other regions did. Those surveyed in Asia Pacific, meanwhile, were the most likely to favor hybrid pricing models. Download the full report to view more regional highlights.
  • Role insight: Interestingly, practitioners were the most likely to select budget-friendly pricing as their top answer. They were also more likely to select a low-cost entry point. Executives, however, pegged budget-friendly pricing as the sixth most important, and they instead favored hybrid pricing models the most. Non-executive managers were the least likely to favor a bundle-of-SKUs approach, a single license metric across all telemetry, and a host- or agent-based-only pricing model.
  • Organization size insight: Small organizations were slightly more likely to favor a single-SKU approach (34%, compared to 29% for midsize and large) and a low-cost entry point (31%, compared to 26% for midsize and large). Large organizations were slightly more likely to value transparent pricing (33%, compared to 31% for small and 29% for midsize).

Billing features

We also wanted to understand what billing models and features were the most important to survey respondents. We found that:

  • The flexibility to scale usage based on consumption with no monthly minimum was the top choice overall
  • Usage-based billing models (whether usage is based on monthly provisioned use or active use) were preferred over subscription-based
  • The ability to ingest any telemetry data type and autoscale with no penalties as well as predictable spending also ranked high

In supplemental interviews that ETR conducted with ITDMs during this study, interviewees most commonly desired predictability in pricing and billing. Whatever the technical design of a pricing or billing model, ITDMs wanted to be able to accurately predict what the bill would be in advance.
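As a back-of-envelope illustration of that kind of forecasting, a bill under a hybrid user + data-ingestion model can be estimated in advance from expected seats and ingest volume. The rates, allowance, and volumes below are purely hypothetical, not any vendor’s actual pricing.

```python
# Hypothetical forecast of a monthly bill under a hybrid user + data-ingestion
# pricing model. All rates and volumes are made-up illustrative numbers.
PRICE_PER_FULL_USER = 99.00      # USD per provisioned user per month (hypothetical)
PRICE_PER_GB_INGESTED = 0.30     # USD per GB of telemetry ingested (hypothetical)
FREE_INGEST_GB = 100             # free monthly ingest allowance (hypothetical)

def estimate_monthly_bill(users: int, ingest_gb: float) -> float:
    billable_gb = max(0.0, ingest_gb - FREE_INGEST_GB)
    return users * PRICE_PER_FULL_USER + billable_gb * PRICE_PER_GB_INGESTED

# e.g., 25 engineers and roughly 2 TB of telemetry per month
print(f"${estimate_monthly_bill(25, 2000):,.2f}")  # $3,045.00
```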

Billing feature preferences

  • Regional insight: Respondents surveyed in Europe were the least likely to care about the type of billing model and the flexibility to scale usage based on consumption with no monthly minimum. Those surveyed in Asia Pacific were the least likely to care about the ability to autoscale without penalty, while those surveyed in North America were more likely to care about the ability to pay with a credit card and to have no premium overage fees or shelfware. Download the full report to view more regional highlights.
  • Role insight: Practitioners cared the most about predictable spending, while non-executive managers were the least likely to prefer a subscription-based billing model, the ability to pay as they go, and having no shelfware.
  • Organization size insight: Small organizations were more likely to prefer a usage-based billing model where usage is based on monthly provisioned use instead of active provisioned use. Midsize organizations were slightly more likely to care about the ability to ingest any telemetry data type without penalties (34%, compared to 32% for small and 33% for large) and to pay as they go (34%, compared to 32% for small and 29% for large). Large organizations were the most likely to care about the flexibility to scale usage based on consumption with no monthly minimum (42%, compared to 35% for midsize and 33% for small) and predictable spending (36%, compared to 32% for midsize and 29% for small).

I often need to establish my budget 16–18 months ahead of where I'm actually going to spend it. And to the extent that I can predict accurately, that's obviously the preferred way to go.

Benefits of observability

Now for the good stuff. We were curious about the overall primary benefits of observability as well as what use cases it’s used for, whether it helps improve service-level metrics (like outage frequency, MTTD, and MTTR), and how it most helps improve the lives of software engineers and developers.

Benefits

We found that respondents saw clear benefits as a result of their current observability deployments. Observability continues to deliver a clear, positive business impact, such as improved:

  • Uptime, performance, and reliability
  • Operational efficiency
  • Customer experience
  • Innovation
  • Business and/or revenue growth

These results indicate how observability can transform an organization’s business, technology, and/or revenue.

Primary benefits enabled by observability deployment

  • Regional insight: Respondents surveyed in Europe were the least likely to select improved uptime and reliability (32%) and proactive detection of issues before they impact customers (28%) as benefits of observability. Those surveyed in Asia Pacific were the most likely to note proactive detection of issues before they impact customers (40%) as a benefit, but the least likely to note consolidation of IT tooling (25%) and decreased cloud hosting costs (22%). Download the full report to view more regional highlights.
  • Organization size insight: Respondents from small organizations were the most likely to note consolidation of IT tooling (38%) and business/revenue growth (32%) as benefits of observability, and the least likely to note proactive detection of issues before they impact customers (26%). Those from midsize organizations were the least likely to note improved uptime and reliability (33%) and business/revenue growth (23%). And those from large organizations were the most likely to note increased operational efficiency (39%), proactive detection of issues before they impact customers (38%), and improved customer experience (36%).
  • Role insight: Executives were the least likely to say their organizations increased operational efficiency (31%), while practitioners were the most likely (36%). Non-executive managers were the most likely to say their organizations benefited from proactive detection of issues before they impact customers (40%) and the least likely to cite business/revenue growth (19%).
  • Industry insight: Respondents from the energy/utilities industry were the most likely to note that developers having high confidence in the resilience of their apps/systems (51%) is a benefit of observability. Government respondents were the most likely to note a reduction in employee burnout (55%). Healthcare/pharma respondents were the most likely to note an improved customer experience (43%) and business/revenue growth (39%). IT/telco respondents were the most likely to note the ability to redirect resources to value-added tasks and/or accelerated innovation (35%). Nonprofit/unspecified respondents were most likely to note an increased operational efficiency (52%). And services/consulting respondents were the most likely to note improved uptime and reliability (49%). Download the full report to view more industry highlights.

Use cases

We investigated the technical use cases/purposes where observability was most important to respondents. Results showed a wide range of use cases with the most common being to:

  1. Optimize cloud resource usage and spend (31%)
  2. Support digital transformation efforts to improve the digital customer experience and gain a competitive advantage from it (31%)
  3. Manage containerized and serverless environments (29%)
  4. Increase speed to market for new products/services (29%)
  5. Support an organizational IT move to DevOps (29%)

Observability use cases/purposes

  • Regional insight: The top choice for respondents surveyed in Asia Pacific was to support digital transformation efforts, followed by increasing speed to market for new products/services, minimizing the risk of migrating core legacy applications to the cloud, and managing containerized and serverless environments. The second choice for those surveyed in North America was to support an organizational IT move to DevOps. European selections aligned closely with the average results for all regions. Download the full report to view more regional highlights.
  • Role insight: Executives were the most likely to say their organizations use observability to support an organizational IT move to DevOps (34%), connect IoT device monitoring into the full observability of their estate (30%), and support cost-cutting efforts (29%). Non-executive managers were the most likely to say their organizations use it to troubleshoot distributed systems (31%) and better deliver against SLOs/SLAs (27%). Practitioners were the most likely to say that their organizations use it to minimize the risk of migrating core legacy applications to the cloud (30%).
  • Organization size insight: Respondents from small organizations were the most likely to say they use observability to support cost-cutting efforts (34%), while those from large organizations were the least likely (26%). Respondents from large organizations were the most likely to use it to manage containerized and serverless environments (35%), support an organizational IT move to DevOps (33%), and automate software-release cycles (32%).
  • Industry insight: Education respondents were the most likely to say they use observability to optimize cloud resource usage and spend (63%) and support digital transformation efforts (47%). Energy/utilities respondents were the most likely to use it to support cost-cutting efforts (40%), increase speed to market for new products/services (40%), minimize the risk of migrating core legacy applications to the cloud (38%), expand observability because of a recent M&A event (36%), and manage containerized and serverless environments (34%; tied with services/consulting). Government respondents were the most likely to use it to troubleshoot distributed systems (50%). Industrial/materials/manufacturing respondents were the most likely to use it to support an organizational IT move to DevOps (35%). Services/consulting respondents were the most likely to use it to automate software release cycles (40%). Download the full report to view more industry highlights.

Outage frequency

So, how often are outages occurring that affect customers and end users? Survey results showed that:

  • Outages happen fairly frequently (52–72% noted once per week or more)
  • Low-business-impact outages happen the most frequently (72% noted once per week or more)
  • High-business-impact outages happen the least frequently, though more than half (52%) still experience them once per week or more (the rest experience them two to three times per month or fewer)

Given how frequently outages occur, it is noteworthy how often manual effort and incident tickets remain the way organizations learn about them.

There’s a clear connection between those who have achieved full-stack observability and a lower frequency of outages. Respondents from organizations that have achieved full-stack observability (based on our definition) were also more likely to experience the least frequent outages (two to three times per month or fewer) and less likely to experience the most frequent outages (once per week or more).

And respondents who indicated that they have already prioritized/achieved full-stack observability were also more likely to experience the least frequent outages (two to three times per month or fewer) and less likely to experience the most frequent outages (once per week or more).

The data supports a strong correlation between full-stack observability and less frequent outages.

Download the full report to view additional details.

Outage frequency by high, medium, and low business impact

  • Regional insight: Respondents surveyed in North America were more likely to say that their organizations experience outages less frequently (two to three times per month or fewer), while those surveyed in Europe were more likely to say more frequently (once per week or more). Download the full report to view more regional highlights.
  • Role insight: Executives were more likely to say that their organizations experience outages less frequently (two to three times per month or fewer), while practitioners were more likely to say more frequently (once per week or more).
  • Organization size insight: Small organizations were more likely to experience low-business-impact outages once per week or more and medium- and high-business-impact outages two to three times per month or fewer. Midsize and large organizations were more likely to experience outages once per week or more.

MTTD

When we looked at the mean time to detect an outage, a common service-level metric used in security and IT incident management, the survey results showed:

  • The majority had an MTTD of more than five but less than 60 minutes
  • In general, it took more time to detect high- compared to low-business-impact outages
  • More than one in five (22%) took more than an hour to detect high-business-impact outages
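For context on how these service-level metrics are calculated, MTTD is the average gap between when an outage begins and when it is detected, and MTTR extends that to resolution. A minimal sketch with hypothetical incident timestamps:

```python
# Minimal sketch of how MTTD and MTTR are computed from incident records.
# Timestamps are hypothetical; real values would come from an incident system.
# Note: some teams measure MTTR from detection rather than from outage start;
# this sketch measures both from the start of the outage.
from datetime import datetime
from statistics import mean

incidents = [
    # (started_at, detected_at, resolved_at)
    (datetime(2022, 5, 1, 9, 0), datetime(2022, 5, 1, 9, 12), datetime(2022, 5, 1, 9, 40)),
    (datetime(2022, 5, 8, 14, 5), datetime(2022, 5, 8, 14, 9), datetime(2022, 5, 8, 15, 2)),
]

mttd_minutes = mean((d - s).total_seconds() / 60 for s, d, _ in incidents)
mttr_minutes = mean((r - s).total_seconds() / 60 for s, _, r in incidents)

print(f"MTTD: {mttd_minutes:.1f} min, MTTR: {mttr_minutes:.1f} min")  # MTTD: 8.0 min, MTTR: 48.5 min
```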

Another interesting finding is that respondents from organizations that have achieved full-stack observability (based on our definition) and those who indicated that they have already prioritized/achieved full-stack observability were also more likely to experience the fastest MTTD (less than five minutes).

Respondents who indicated that they have already prioritized/achieved full-stack observability were also more likely to have a faster MTTD (less than 30 minutes) and less likely to have a slower MTTD (more than 30 minutes).

The data supports a strong correlation between full-stack observability and a faster MTTD.

Download the full report to view additional details.

MTTD by high-, medium-, and low-business-impact outages

  • Regional insight: For high-business-impact outages, respondents surveyed in Europe were the least likely to have an MTTD of more than 60 minutes (19%, compared to 26% for those surveyed in Asia Pacific and 24% for those surveyed in North America). For medium-business-impact outages, Asia-Pacific organizations were the most likely to have a faster MTTD (55%), while North American organizations were the least likely (53%). For low-business-impact outages, North American organizations were the most likely to have a faster MTTD (62%) and European organizations were the least likely (57%). Download the full report to view more regional highlights.
  • Role insight: For low-business-impact outages, executives and practitioners were more optimistic about the MTTD time than non-executive managers. And for high-business-impact outages, ITDMs were more optimistic than practitioners.
  • Organization size insight: Small organizations were the most likely to detect high-business-impact outages in 30 minutes or less (48%, compared to 44% for midsize and 45% for large).

MTTR

We see similar patterns with MTTR, another common service-level metric used in security and IT incident management:

  • The majority had an MTTR of more than five but less than 60 minutes
  • In general, it took more time to resolve high- and medium- compared to low-business-impact outages
  • Almost a third (29%) took more than an hour to resolve high-business-impact outages

Respondents from organizations that have achieved full-stack observability (based on our definition) and those who indicated that they have already prioritized/achieved full-stack observability were also more likely to experience the fastest MTTR (less than five minutes).

Respondents who indicated that they have already prioritized/achieved full-stack observability were also more likely to have a faster MTTR (less than 30 minutes), while those who had not prioritized/achieved full-stack observability were more likely to have a slower MTTR (more than 30 minutes).

The data supports a strong correlation between full-stack observability and a faster MTTR. Clearly, there’s a link between full-stack observability and the best outage frequency, MTTD, and MTTR performance indicators.

Download the full report to view additional details. Learn how they plan to reduce MTTR.

MTTR by high-, medium-, and low-business-impact outages

  • Regional insight: For high- and medium-business-impact outages, respondents surveyed in Europe were the most likely to have an MTTR of less than five minutes and the least likely to have an MTTR of more than an hour. Download the full report to view more regional highlights.
  • Role insight: For high- and medium-business-impact outages, non-executive managers were more likely to select an MTTR of more than an hour than executives and practitioners. And for low-business-impact outages, executives and practitioners were more optimistic about the MTTR time than non-executive managers.
  • Organization size insight: Large organizations were the most likely to take more than an hour to resolve high- and medium-business-impact outages.

Predictors of MTTD/MTTR by capability

In addition, the data shows a positive association between certain capabilities—including AIOps, distributed tracing, security monitoring, custom dashboards, synthetic monitoring, APM, database monitoring, alerts, and infrastructure monitoring—and a faster MTTD/MTTR (less than 30 minutes). Of those capabilities, AIOps is statistically significant at the 10% significance level.
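To make the nature of that claim concrete, one common way to test such an association is a logistic regression of a fast-MTTD/MTTR indicator on capability indicators. The sketch below assumes a hypothetical respondent-level CSV with 0/1 columns; it illustrates the method, not the report’s actual model or data.

```python
# Illustrative only: a logistic regression relating deployed capabilities to a
# fast MTTD/MTTR. The CSV and column names are hypothetical, not the study's data.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey_responses.csv")  # hypothetical respondent-level file

capabilities = [
    "aiops", "distributed_tracing", "security_monitoring", "custom_dashboards",
    "synthetic_monitoring", "apm", "database_monitoring", "alerts",
    "infrastructure_monitoring",
]

X = sm.add_constant(df[capabilities])  # 0/1 indicators for each capability
y = df["mttr_under_30_min"]            # 1 if MTTD/MTTR was under 30 minutes

model = sm.Logit(y, X).fit()
print(model.summary())  # a coefficient with p < 0.10 is significant at the 10% level
```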

Capabilities that predict an MTTD/MTTR of less than 30 minutes

Day-to-day life for developers and engineers

We asked the practitioners themselves, as well as ITDMs, how observability helps developers and engineers the most. We found:

  • At least 30% said it increases productivity, enables cross-team collaboration, and reduces guesswork when managing complicated and distributed tech stacks
  • About three out of 10 said it makes developer/engineer lives easier and improves work/life balance and skillset/hireability
  • Roughly a quarter felt that it helps them confirm or overcome assumptions and opinions and fill in gaps

These findings suggest that developers and engineers seek solutions that reduce toil, increase cross-team collaboration, and help them use their time proactively. A data-driven approach to engineering and an all-in-one observability platform make developer and engineer lives better and easier through:

  • Less guesswork managing complicated and distributed tech stacks involving containers, multi-clouds, and multiple tools
  • A better signal-to-noise ratio in the understanding of why incidents took place, not just what happened
  • Resolving issues faster and freeing up time to focus on the higher-priority, business-impacting, and creative coding they love

How observability helps improve the lives of developers/engineers the most

  • Regional insight: Respondents surveyed in Asia Pacific were the most likely to select increases productivity (41%, compared to 30% for those surveyed in Europe and 35% for those surveyed in North America). Those surveyed in Europe were the most likely to select making developer/engineer lives easier as their top choice (31%). And those surveyed in North America were the most likely to say it increases innovation (32%, compared to 24% for those surveyed in Asia Pacific and 25% for those surveyed in Europe) and frees up time to work on other projects (30%, compared to 27% for those surveyed in Asia Pacific and 24% for those surveyed in Europe). Download the full report to view more regional highlights.
  • Role insight: Executives were more likely to think that observability improves work/life balance for practitioners (32%) than the practitioners themselves (28%). Non-executive managers were the most likely to feel that it frees up time for practitioners to work on other projects (34%) and the least likely to feel that it makes their jobs easier (22%), increases innovation (23%), or enables cross-team collaboration (26%).
  • Industry insight: Respondents from the education industry were more likely to think that observability enables less guesswork (43%), makes jobs easier (40%), confirms assumptions (37%), and improves skillset/hireability (37%) than those from most other industries. Those from the energy/utilities industry were also more likely to say it enables less guesswork and improves work/life balance (42% for both). Government respondents also ranked enables less guesswork higher (42%). And services/consulting respondents were more partial to improving skillset/hireability (36%) and enabling time prioritization (38%) and cross-team collaboration (43%) than most others. Download the full report to view more industry highlights.

Challenges preventing full-stack observability

So, if full-stack observability provides so many benefits, what’s preventing organizations from prioritizing/achieving it? We found that:

  • A lack of understanding of the benefits of observability and a sense that current IT performance is adequate were the most frequently cited challenges (28% each)
  • More than a quarter of respondents (27%) noted they do not have the budget
  • A quarter said they have too many monitoring tools
  • Just under a quarter struggled with a lengthy sales cycle, un-instrumented systems, a lack of strategy, and a disparate tech stack
  • Nearly one in five (19%) indicated they do not have the skills

In addition, of those who said that their IT performance is adequate (no need to improve current performance), 51% said that they allocate more than 20% of their budget to observability tools.

Taken together, these results suggest a number of different hurdles and pain points when it comes to pursuing full-stack observability. To achieve full-stack observability, technology professionals should have a clearer rationale for its benefits, and large organizations, in particular, should weave this rationale into a clear business strategy.

Primary challenges preventing the prioritization or achievement of full-stack observability

  • Regional insight: In the Asia-Pacific region, where 55% claimed that they prefer a single, consolidated platform, the top barriers were that not enough of their systems are instrumented and too many monitoring tools (both 28%). Respondents surveyed in Europe were slightly more likely to cite a lack of budget (29%) and less likely to cite a lack of strategy (21%) or that not enough of their systems are instrumented (22%). And those surveyed in North America were more likely to cite a lack of understanding of benefits (32%) and adequate IT performance (31%). Download the full report to view more regional highlights.
  • Role insight: Many practitioners and executives felt that a lack of understanding of benefits was a primary challenge to adopting full-stack observability (29%), but non-executive managers were more likely to feel that their current IT performance is adequate (31%). Practitioners were the least likely to say that their IT performance is adequate (26%) or that they don’t have the skills (17%), while more likely to cite a disparate tech stack (26%) and siloed data (23%).
  • Organization size insight: Small organizations said the biggest challenge was that it was too expensive (33%), followed by a lack of budget (29%). Midsize organizations struggled the most with a lack of understanding of the benefits (30%). Large organizations were the most likely to say that their IT performance is adequate (32%), while the largest—those with 5,000 or more employees—were the most likely to note a lack of strategy (34%).