Get instant Kubernetes observability—no agents required. Meet Pixie Auto-telemetry

Why You Should Rethink Your Microservices Monitoring Strategy

6 min read

Don’t take this wrong but moving to microservices will break your monitoring strategy and increase your mean time to resolution (MTTR). That’s because traditional monitoring alone can’t connect dependencies and performance across the entire customer journey through your distributed architecture.  

Microservices are a good thing, right?        

Pretty much everyone agrees that a distributed systems approach is ideal for many modern applications. That’s why a large majority of organizations—84% according to a survey from API platform company Kong—have adopted a microservices architecture.  

The Kong survey reports that organizations are adopting microservices for security enhancements, quicker development, faster integration of new technologies, improved infrastructure flexibility, and stronger collaboration across teams.   

In its "State of Microservices 2020" report, The Software House reports that organizations cite solving scalability and performance issues as the two most important reasons to choose a microservices architecture for applications. So really, what’s not to like?    

Here’s the trade-off 

The downside of all the microservices goodness is greater complexity for the software and operations teams that support them.  

Once the number of microservices becomes more than a handful, the ability to manage and maintain them becomes increasingly difficult. The Kong survey reported that organizations are running 184 microservices on average. Hence, the hit on your MTTR.  

As monolithic applications become distributed across microservices and software teams face pressure to rapidly and frequently ship new features and experiences, several things make it extremely challenging to understand performance and pinpoint issues and bottlenecks:  

  • Microservices are constantly changing, which introduces more risk 
  • The lifetime of a container in which a microservice is running may be measured in minutes or less
  • The complexity of scale increases with each new microservice or change that’s introduced
  • Teams might be responsible for services they didn’t develop

For all these reasons, it’s critical to find a way to cut through the complexity and ease the effort for your software teams.  

About that monitoring strategy  

Monitoring monolithic applications has traditionally focused on collecting data about the health and performance of the application to know when something goes wrong so that operators and engineers can respond quickly. Over the years, monitoring has evolved to deliver detailed metrics and alerting on performance and user experience across the entire technology stack, including the cloud.   

Monitoring is great, but modern application architectures require a new approach called observability. Observability lets you understand why something is wrong, compared to monitoring, which simply tells you when something is wrong.     

Observability is critical because it gives you the ability to see a connected view of all your performance data in one place, in real time. That way, you can pinpoint issues faster, understand what caused the issue, and ultimately deliver excellent customer experiences. Observability does this by letting you know how your distributed applications work together and how dependencies and individual services impact the health and performance of the entire application.    

The secret weapon for microservices success 

Observability delivers actionable insight by combining four essential types of observability data: metrics, events, logs, and traces. While all the telemetry data is important, the last data type—distributed traces—is essential for software teams using microservices.  

That’s because distributed tracing is the best way to quickly understand what happens to requests as they transit through the microservices that make up your distributed applications. Trace data helps you comprehend the flow of requests and pinpoint where in the system failures or performance issues are occurring and why.  

Relying solely on monitoring individual services can’t give you that insight. It may even lead you to believe that everything is fine with performance despite customers reporting latency issues.    

Instead, distributed tracing gives everybody involved—DevOps, operations, software, and site reliability engineers—quick answers to specific questions in distributed software environments, including:  

  • What is the health of the services that make up a distributed system?  
  • What is the root cause of errors and defects within a distributed system? 
  • Where are performance bottlenecks that could impact the customer experience?   
  • Which services have problematic or inefficient code that should be prioritized for optimization?  

Connecting the dots and reducing MTTR 

When you move to microservices, distributed tracing is the secret weapon that can help you keep your MTTR where it should be—low. Not only does tracing data help you identify and resolve issues faster, but it also lets you and your team measure overall system health and understand the effect of changes and prioritize high-value areas for improvement. You’ll be able to ask the right questions and get the right answers no matter how many microservices you deploy. 

To learn more about distributed tracing, including how it works and when you should use it, read “A Quick Introduction to Distributed Tracing: Gain Visibility and Reduce MTTR in Complex Application Environments.