A version of this post previously ran on Diginomica.
The Twittersphere has been trying to define observability, but there’s a more important question all software teams and companies should be addressing: How do you know when you’ve achieved observability? And is observability even something you “achieve” and complete at a single point in time, with the heavens opening and cymbals clanging? Or is it something you practice every day?
To be clear, New Relic defines observability simply as “how well you can understand your complex systems.” In the days of mainframes and static operations, there were few known system failure modes, so monitoring tools were an effective approach to visualizing and troubleshooting system failures.
Fast-forward to today, and the complexity that we’ve created in the name of speed and scale forces you to adjust how you monitor these systems. It’s no longer enough to have a rear-view understanding of the “known unknowns” that traditional monitoring provides through metrics, dashboards, and alerts (i.e., alert me when my server CPU hits a specific threshold).
Because system change (versus stability) is the norm for distributed environments, you need to flexibly query the “unknown unknowns” of these dynamic systems. You need to be able to find answers to questions you couldn’t predict when your system was set up. In short, you need observability.
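To make the distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular observability product works: the event records, field names, and the 90% CPU threshold are all hypothetical. The first function is the kind of check you define in advance (a known unknown); the second lets you group any measure by any field at query time, which is closer to asking a question you could not have predicted.

```python
from collections import defaultdict

# Hypothetical request events; every field name and value here is illustrative.
EVENTS = [
    {"service": "checkout", "region": "us-east", "version": "1.4.2", "duration_ms": 120, "cpu_pct": 41},
    {"service": "checkout", "region": "us-east", "version": "1.5.0", "duration_ms": 2300, "cpu_pct": 43},
    {"service": "checkout", "region": "eu-west", "version": "1.5.0", "duration_ms": 2100, "cpu_pct": 39},
    {"service": "search", "region": "us-east", "version": "3.1.0", "duration_ms": 90, "cpu_pct": 47},
    {"service": "checkout", "region": "eu-west", "version": "1.4.2", "duration_ms": 110, "cpu_pct": 40},
]

# Assumed threshold, fixed when the system was set up.
CPU_ALERT_THRESHOLD = 90


def cpu_alerts(events, threshold=CPU_ALERT_THRESHOLD):
    """Known unknown: a predefined check that only fires on high CPU."""
    return [e for e in events if e["cpu_pct"] >= threshold]


def mean_by(events, dimension, measure):
    """Ad hoc question: average any measure grouped by any field chosen at query time."""
    groups = defaultdict(list)
    for e in events:
        groups[e[dimension]].append(e[measure])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# The predefined alert stays silent because CPU never crosses the threshold.
print(cpu_alerts(EVENTS))  # → []

# Slicing latency by deploy version points at the 1.5.0 release instead.
print(mean_by(EVENTS, "version", "duration_ms"))
# → {'1.4.2': 115.0, '1.5.0': 2200.0, '3.1.0': 90.0}
```

In this made-up data the CPU alert never fires, yet grouping latency by version shows that one release is roughly twenty times slower. The same `mean_by` call could just as easily slice by `region` or `service`, which is the flexibility the threshold check lacks.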
What you gain from observability
Observability lets you see how all of your applications and their underlying services and systems relate, so you can understand dependencies across organizational boundaries and troubleshoot and solve problems faster. Observability gives you context and helps you understand why an issue has occurred.
In a reality where your software’s health directly affects the health of your customers’ digital experiences and your business, observability gives you the confidence and visibility required to:
- Minimize the time to understand how systems are behaving
- Understand how system and code-level changes impact the business
- Reduce the time to surface, investigate, and resolve a problem’s root cause
In recent research, three-quarters of respondents (75%) said they agree or strongly agree that their “organization has a real-time view of how all systems are performing and interacting on a single platform (i.e., an observability platform).” Any reasonable person would interpret that to mean that 75% are practicing observability, right?
So why did other data indicate they’re not? For instance:
- Only 8% of respondents rated their ability to know why systems and software aren’t working as “very good.” Knowing “why” vs. “what” went wrong is an observability hallmark.
- Three-quarters are unhappy with the time it takes to detect and fix software and systems issues (and point to an overly complex IT environment as the key factor).
- Just 4% of firms have, to a great extent, integrated their data on software and systems performance with data on end-user browser and mobile performance. So they have blind spots: they’re unable to see the entire landscape or understand dependencies.
- The majority of firms use more than 10 tools to instrument their IT systems and, on average, have instrumented less than half of IT systems. Ten tools are nine too many screens to switch between—nine too many silos to manage.
Are you faking it?
Many companies claim to have observability but their practices show otherwise. No outcomes validate its presence. They’re faking it.
So what does true observability look like? And is it something you achieve or something you practice? It’s the latter, because change is constant. Software updates are pushed into production multiple times a day (depending on the company, daily deployments can number four, 50, or sometimes thousands).
And gaining observability over all the interrelated and interdependent processes, systems, and applications requires ongoing vigilance.
Those respondents who performed strongly across all software excellence markers in the research offer a clue to what true observability looks like, especially when you compare their results with the bottom 25% who performed poorly.
| Observability practice or outcome | Leaders | Laggards |
|---|---|---|
| They consider observability core to software development, not something bolted on afterward | 94% | 56% |
| They learn about service disruptions from observability technology vs. from customers or employees | 78% | 12% |
| They integrate frontend web and browser performance data with backend software & systems data | 100% | 20% |
| They experience fewer than five major outages each month | 83% | 3% |
| When they do have an incident or outage, they resolve it within 30 minutes | 75% | 1% |
| They’ve instrumented significantly more software and systems and are collecting more data | 58% | 42% |
| They agree that they can “quickly understand the results of changes to software” | 99% | 38% |
So, there you have it: clear indicators that you are practicing observability. And if you’re like the leaders in the study, your business is benefiting, because the leaders are outperforming other firms when it comes to software and reporting better performance across various metrics, including financial ones.
To read more about the research and its findings, see Deeper Than Digital: Why and how More Perfect Software drives business success.