Applications and services will always be in one of two states: doing something, or waiting for something. This is similar to driving a car on a trip: you are either moving or stopped, whether waiting at traffic lights or stuck in a traffic jam.

Ideally we want our applications to be doing things quickly and efficiently, keeping waiting to a minimum (in much the same way we want to drive to our destination without any delays). Whether we are troubleshooting errors, resolving performance issues, or undertaking performance optimization, we need to determine whether the application was doing something or waiting for something. If our application was waiting, we need to determine what it was waiting for; and if it was waiting for another application or service, we then need to investigate what that service was doing or waiting for.

What are we ‘Doing’?

When an application is doing something, it is using CPU. It is executing the code and processing the data held in memory: sorting data, working through cryptographic algorithms, parsing data to prepare the response that has been requested, and so on.

This work is held back only by how efficiently the code uses the CPU and by the speed of the CPU itself: the fewer instructions required to complete the process, the sooner it completes, and the faster the CPU can execute each instruction, the sooner the process finishes. On our driving trip, this would be finding a shorter alternative route, or getting a faster car.

Why are we ‘Waiting’?

Our applications are not always able to continue processing, and instead of using CPU they need to pause and wait. There are two different types of waits that our applications experience:

  • Direct waits: These are requests that our code makes of other systems in order to perform the next steps, e.g. reading a file from storage, making an HTTP request to another service for some data, or waiting for user input. The CPU is available for our application to use, but it needs data from elsewhere in order to continue. This would be an open road with no traffic, but we need to stop and fill the car with fuel before we can continue.
  • Indirect waits: This is where our application is ready to use the CPU, but the CPU is not available because something else is already using it. That could be another process running on the same host, leaving our application waiting for the CPU to become free. It could also be something in the application's environment, such as garbage collection within the runtime, which our code must wait for before it can continue. This is the dreaded traffic jam that prevents us from driving at the normal speed for the road.
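The difference between doing and waiting shows up in the gap between wall-clock time and CPU time. A minimal Java sketch, where the method names are illustrative and `Thread.sleep` stands in for a blocking call such as a disk read or an HTTP request:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class DoingVsWaiting {

    // "Doing": pure CPU work, here a simple summation loop.
    static long busySum(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += i;
        return sum;
    }

    // A direct wait: the thread blocks and uses (almost) no CPU.
    // Thread.sleep stands in for reading a file or calling another service.
    static void directWait(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();

        long sum = busySum(50_000_000L); // doing
        directWait(200);                 // waiting

        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
        long cpuMs = (threads.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;

        // Wall time minus CPU time is (roughly) the time spent waiting.
        System.out.println("sum=" + sum + " wall=" + wallMs
                + "ms cpu=" + cpuMs + "ms waiting=" + (wallMs - cpuMs) + "ms");
    }
}
```

From the thread's own point of view an indirect wait looks the same: wall time grows while CPU time does not, even though the thread was ready to run the whole time.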

A representation of the sequence of events within a simple transaction showing time spent Doing and Waiting

In the APM Summary view in New Relic, the instrumentation breaks the direct waits out into segments, such as MySQL, Web External, etc. If time is attributed to the language runtime itself, it could be either the application doing something, or indirect waits such as other processes blocking access to the CPU or garbage collection occurring within the runtime.

The image above shows direct wait time within MySQL being responsible for the majority of the response time of this application, and a very small amount of time within the Java runtime.

You can use the language runtime view to see if CPU usage and garbage collection correlate with the increase in response time.

The JVM view allows us to see the amount of CPU being used by our application, and the amount of that CPU time that is spent in garbage collection instead of executing our code.
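The GC time that the JVM view surfaces is also available programmatically through the JVM's own management beans. A small sketch of reading it, where the class and method names are illustrative:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTime {

    // Total milliseconds the JVM's collectors report having spent in GC
    // since the JVM started; this is time our code could not run.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector doesn't report it
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Time spent in GC so far: " + totalGcMillis() + " ms");
    }
}
```

Sampling this value at intervals and taking the difference gives the GC time within each window, which can then be compared against response times over the same period.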

In my previous blog post (Everything, Everywhere, All At Once) I gave an example of an issue a company had where users were experiencing increased response times when trying to retrieve their documents from a web portal:

  • The code in the web browser processed the user input and then issued a request to the backend application and waited on it (Direct wait).
  • The backend application processed the request from the browser and then made a request to the database which responded very quickly with a file location (Direct wait).
  • The file location was processed by the backend application which then requested the file from the file server, for which it then had to wait a lot longer than normal (Direct wait).
  • The file server was having to wait for another process (antivirus) that was using CPU and disk I/O before it could return the requested file (Indirect wait).

Reducing waits

Observability allows you to determine what your applications are waiting on, and if you have enough visibility (you are monitoring everything, right?) it will allow you to determine where the time is being spent. From this you can determine the changes needed to reduce the time spent doing and waiting. In the example above, the performance issue was caused by indirect waits, and the resolution was to modify the configuration of the antivirus software to prevent it from blocking the file server.

A real world example

In the days when most applications were monolithic, they typically ran queries against relational databases, and much of the waiting was seen to be in the database. When looking into why the queries were taking so long, you would find that they were waiting for blocks to be read from disk into the buffer cache memory of the database server. If a query required more blocks than the cache could hold, reading one block would evict another, which would then have to be read back in from disk if it was needed again later.
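The eviction pressure described above can be sketched with a tiny least-recently-used cache simulation. The class names, block size, and counts are all illustrative, not any real database's implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BufferCache {
    private final int capacity;
    private int diskReads = 0;
    private final LinkedHashMap<Integer, byte[]> cache;

    BufferCache(int capacity) {
        this.capacity = capacity;
        // An access-ordered LinkedHashMap that evicts the least-recently-used
        // block once the cache is full, like a simple database buffer cache.
        this.cache = new LinkedHashMap<>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > BufferCache.this.capacity;
            }
        };
    }

    // Read a block: served from memory if cached, otherwise a (simulated)
    // disk read that the query would have to wait for.
    byte[] readBlock(int blockId) {
        byte[] block = cache.get(blockId);
        if (block == null) {
            diskReads++;
            block = new byte[8192]; // simulated 8 KB block fetched from disk
            cache.put(blockId, block);
        }
        return block;
    }

    int diskReads() { return diskReads; }

    public static void main(String[] args) {
        // A query that scans 20 distinct blocks against a 10-block cache:
        // on the second pass, every block it needs has already been evicted.
        BufferCache cache = new BufferCache(10);
        for (int pass = 0; pass < 2; pass++)
            for (int block = 0; block < 20; block++)
                cache.readBlock(block);
        System.out.println("disk reads: " + cache.diskReads());
    }
}
```

With a cache larger than the working set, the second pass would be served entirely from memory; with the smaller cache, every read misses and goes back to disk, which is exactly the repeated waiting the monitoring tools observed.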

In these circumstances, some database monitoring tools would recommend adding more memory to the server and increasing the size of the buffer cache. This would allow more blocks to be held in memory and reduce the chance of needing to read them from disk again. Making this change would work, and the queries would complete in less time.

With the removal of the frequent waits to read blocks from disk, the queries no longer needed to pause. This would lead to the queries, and the database server as a whole, using more CPU as they processed the blocks in memory. The tools that recommended the extra memory would now recommend adding more CPUs to the server. In addition to the hardware cost of the extra CPUs (on top of the earlier memory), this could also increase software licensing costs.

When I investigated these queries I often found that the SQL was inefficient. Viewing the explain plans, and the data the query actually returned as its final result set, it was clear that blocks for tables and indexes that were not needed were being read from disk, or processed in the buffer cache, unnecessarily.

I would optimize the query to read fewer blocks but still return the same result set. Where the buffer cache had not been increased in size there would be fewer blocks needing to be read from disk and stored in the cache, reducing the wait for those reads. Where the blocks were already in memory, this would then also lead to the query not using as much CPU as there would be fewer blocks to be processed. These changes reduced both the doing and the waiting, improving the performance of the application, and reducing costs for the business.
