Application performance monitoring (APM) allows you to track key metrics and events in an application, giving you insights on everything from page load speed and performance bottlenecks to service outages and errors. With a good APM solution, you can proactively fix issues in your application, benefiting both end users and the bottom line.
Getting started with APM can seem overwhelming at first, so it’s important to break the process down into manageable steps and best practices. In this article, you’ll learn the basic steps involved in implementing an APM solution, from making a plan and preparing your teams to instrumenting your services and setting up incident alerts. If you want to learn more about APM and why it’s important, see What is APM?
You’ll also learn how to get started with the New Relic APM tool in minutes. Your free account includes 100 GB/month of free data ingest, one free full-access user, and unlimited free basic users. This image shows an APM dashboard in New Relic that visualizes the most time-consuming transactions in an application:
Make a plan
First, you need to determine what you plan to monitor. Do you want to start small and monitor a single service? Or is your goal to monitor everything in your application? There are advantages to both approaches, but you should work towards comprehensive monitoring of all your services to ensure you have complete observability of your systems.
With highly distributed applications, you need to consider all of the services you’re using, ranging from cloud providers to on-premises servers to APIs and much more. Applications that are smaller or use a monolithic architecture will be simpler to monitor.
Starting small allows you to try out an APM solution with minimum risk and cost. For instance, a New Relic account allows you to try APM and other product features for free, and with 100 GB/month of data ingest, you’ll be able to analyze a meaningful amount of telemetry data. You get the opportunity to learn how to use an industry-standard APM tool and decide if it’s the right solution for you. This approach can also be effective if you need buy-in from a manager or executive for the widespread implementation of application performance monitoring. Finally, starting small with an APM solution can be an effective way to monitor and debug a problematic service without worrying about large-scale provisioning.
While starting small has minimal short-term risk, in the long term you’ll want to prevent gaps in your monitoring coverage. Without complete coverage, you’ll deal with a longer mean time to detect (MTTD) and mean time to resolution (MTTR) for issues, which takes up valuable engineering time and can sap team morale. There are also greater risks to your bottom line and the potential for end users to stop using your application.
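To make MTTD and MTTR concrete, here is a minimal sketch of how you might calculate them from incident records. The incident data and field names are hypothetical, and the sketch assumes both metrics are measured from the moment the issue started; some teams measure MTTR from detection instead.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: when the issue began, was detected, and was resolved.
incidents = [
    {
        "started": datetime(2024, 1, 5, 10, 0),
        "detected": datetime(2024, 1, 5, 10, 12),
        "resolved": datetime(2024, 1, 5, 11, 0),
    },
    {
        "started": datetime(2024, 1, 9, 2, 30),
        "detected": datetime(2024, 1, 9, 2, 38),
        "resolved": datetime(2024, 1, 9, 3, 0),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average gap between two timestamps across all incidents, in minutes."""
    gaps = [(r[end_key] - r[start_key]).total_seconds() / 60 for r in records]
    return mean(gaps)


# Assumption: both metrics are measured from the start of the issue.
mttd = mean_minutes(incidents, "started", "detected")
mttr = mean_minutes(incidents, "started", "resolved")

print(f"MTTD: {mttd:.1f} min, MTTR: {mttr:.1f} min")
```

Tracking these two numbers before and after you expand monitoring coverage is one way to show whether the expansion is paying off.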
Audit your services
Regardless of whether you plan to start small or cover as many services as possible, the next step is to audit your services. That includes servers, infrastructure, cloud providers, applications, and more. Having a complete picture of your services can help you prioritize which services you want to monitor, or even better, help ensure that you have complete monitoring coverage for your applications.
Tools like New Relic can make this process easier by automatically discovering the applications, infrastructure, and log sources running in your environment. New Relic then makes recommendations on what should be instrumented. This makes it much easier to configure and deploy APM across your systems.
Instrument your application
After you’ve audited your services and know what you plan to monitor, it’s time to instrument your application. Instrumentation is the process of installing an agent in your application’s environment. An agent tracks the data flowing through your application and sends it back to the APM solution. This data is also known as telemetry.
You can instrument services in many different ways, depending on the APM solution you’re using and the services you’re instrumenting.
Some APM solutions provide guided installations so you can automatically instrument your application. You may also use custom instrumentation and SDKs to instrument services. Custom instrumentation can be used to monitor unsupported frameworks and also to add monitoring to transactions that your APM solution doesn’t track automatically.
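To illustrate what custom instrumentation does conceptually, the following sketch wraps a function in a decorator that records how long each call takes and whether it raised an error. This is not the New Relic agent or any vendor SDK. The `traced` decorator, the in-memory `telemetry` list, and the `apply_discount` function are all hypothetical stand-ins for illustration.

```python
import functools
import time

# In-memory stand-in for the telemetry an agent would send to an APM backend.
telemetry = []


def traced(name):
    """Record the duration and outcome of each call under a transaction name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            error = None
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                error = type(exc).__name__
                raise  # Re-raise so instrumentation never hides failures.
            finally:
                telemetry.append({
                    "transaction": name,
                    "duration_s": time.perf_counter() - start,
                    "error": error,
                })
        return wrapper
    return decorator


@traced("checkout/apply_discount")
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


apply_discount(200, 25)          # Successful call
try:
    apply_discount(200, 150)     # Failing call is still recorded
except ValueError:
    pass

for record in telemetry:
    print(record)
```

A real agent does far more, such as batching data, sampling, and tracing across services, but the core idea is the same: observe the transaction without changing its behavior.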
Sometimes, you won’t be able to instrument a service. When that happens, you can use log forwarding to forward logs from that service to an APM solution.
Finally, you can always choose not to instrument services. This is sometimes a concern with services that handle sensitive data. However, your APM solution should meet the highest standards for security, privacy, and compliance. If you have concerns about whether your APM solution provides the compliance you need, it’s time to consider another solution. Solutions like New Relic prioritize compliance and privacy. New Relic complies with standards for data protection laws around the world, and you can even request HIPAA account enablement.
To instrument your application with New Relic, see Install APM.
Choose your metrics and customize your dashboards
When you’ve instrumented your application, telemetry data will start flowing into your APM solution. A good APM solution provides some metrics automatically, such as response time, throughput, error rate, and CPU usage, usually in the form of dashboards and visualizations. These metrics are a good starting point, but you’ll likely have other metrics you want to track based on your team’s KPIs and goals. In the case of New Relic, you can report custom telemetry data using API calls.
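If you’re new to these metrics, the following sketch shows one way response time, throughput, and error rate could be derived from raw transaction records. The records, field names, and the choice to count only 5xx status codes as errors are assumptions for illustration. An APM solution computes these for you and may define them differently.

```python
from statistics import mean

# Hypothetical transaction records collected over a 60-second window.
window_seconds = 60
transactions = [
    {"duration_ms": 120, "status": 200},
    {"duration_ms": 340, "status": 200},
    {"duration_ms": 95, "status": 500},
    {"duration_ms": 210, "status": 200},
    {"duration_ms": 185, "status": 404},
]


def summarize(records, window):
    """Compute basic APM-style metrics for a window of transactions."""
    # Assumption: only server errors (5xx) count toward the error rate.
    errors = [r for r in records if r["status"] >= 500]
    return {
        "avg_response_ms": mean(r["duration_ms"] for r in records),
        "throughput_rpm": len(records) * 60 / window,  # requests per minute
        "error_rate": len(errors) / len(records),
    }


summary = summarize(transactions, window_seconds)
print(summary)
```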
You can also customize dashboards so they display the most relevant metrics. You can choose which metrics are showing and create custom visualizations that help you better understand how your application is performing.
This next screenshot shows the number of people in various cities who are viewing New Relic within an organization. The custom visualization uses the New Relic CLI and Treemap from the Recharts library.
Grant access to APM
After you have an APM solution in place, you need to determine who should have access. That means you need to consider the capabilities and restrictions of the teams and users that are monitoring your services. You might want DevOps and site reliability engineers (SREs) to have access to some dashboards while development teams and software managers have access to others. Teams should have access to dashboards related to their specific work, but you also need to foster collaboration between teams and avoid siloing because application issues can affect multiple services and teams.
Some APM solutions also have user types with different pricing and access to features. In the case of New Relic, you can have as many basic users as you need for free, while full and core users come with an additional cost.
The next video shows how to grant access roles to users in New Relic.
Set up alerts
After you’ve identified your key metrics, you should set up alerts that notify your teams when issues arise or certain critical thresholds are reached. To set up alerts, you need to answer the following questions:
- What conditions should trigger alerts? For example, you might want to trigger alerts when the average page load time for a specific product page rises above a certain threshold.
- What should the threshold be for each alert? If you make the threshold too high, your teams won’t be alerted during critical incidents. On the other hand, if the threshold is too low, your teams will get false alarms. This can lead to alert fatigue and can also result in too many alerts about minor incidents burying the critical alerts you need to address quickly. In the case of New Relic, you can also use applied intelligence to create dynamic thresholds. For instance, you might want to alert on different thresholds for throughput depending on whether your application is at a peak usage time versus lower usage times such as the middle of the night.
- Which teams should receive the alert? Do you have one team managing and triaging all alerts? Or do you have different teams that should be notified depending on which service is affected? You need to choose which teams will be alerted for each alert policy you set up.
- What channels are you using for alerts? APM solutions like New Relic offer multiple ways to alert teams, including Slack, PagerDuty, and email.
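The difference between a static threshold and a dynamic one can be sketched in a few lines. This is a simplified illustration, not the algorithm New Relic applied intelligence uses. The function names, the baseline data, and the "three standard deviations" rule are all assumptions.

```python
from statistics import mean, stdev


def static_alert(value, threshold):
    """Trigger when the value exceeds a fixed threshold."""
    return value > threshold


def dynamic_alert(value, baseline, sensitivity=3.0):
    """Trigger when the value deviates from the baseline mean by more than
    `sensitivity` sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return abs(value - center) > sensitivity * spread


# Static: alert when average page load time exceeds 3 seconds.
print(static_alert(3.2, threshold=3.0))

# Dynamic: hypothetical throughput baselines in requests per minute.
peak_baseline = [980, 1010, 1005, 995, 1020, 990]
night_baseline = [48, 52, 50, 47, 53, 50]

# The same throughput reading is evaluated against each baseline.
current_rpm = 50
print(dynamic_alert(current_rpm, peak_baseline))
print(dynamic_alert(current_rpm, night_baseline))
```

With these sample baselines, a reading of 50 requests per minute is far outside the peak-hours range but sits at the center of the overnight range, so only the peak-hours check triggers. A single static threshold could not treat the two cases differently.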
The next video shows how to navigate from an alert incident to a root cause in New Relic.
Lay the groundwork for collaboration between teams
When a critical incident occurs, you might have multiple teams scrambling to find the root cause. Is the issue coming from infrastructure, your code, a cloud provider, or something else? Identifying and fixing an issue often involves collaboration across teams, and if your teams are siloed, you’ll have slower MTTD and MTTR. Ideally, your APM solution should include features that allow your teams to better collaborate.
For example, the errors inbox in New Relic lets teams communicate directly in your APM solution. You can use Slack and the errors inbox to quickly share and discuss important context on issues as they arise.
The next image shows errors grouped together in errors inbox.
Streamline your workflow and implement best practices
You can lower your MTTD and MTTR further by continuing to streamline your workflow and implementing APM best practices. The benefits go beyond decreased MTTR: your teams will have more time to work on the projects they care about most, and removing friction from workflows can help improve morale and reduce burnout. Here are some best practices:
- Standardize your naming conventions. Your APM solution should include descriptive names for your instrumented applications. Otherwise, you’ll have a difficult time identifying your monitored services, especially if your applications grow and you need to monitor more services.
- Tag your data. You can also tag your data to make it easier to filter and sort data at a high level. Use key-value pairs to add important metadata such as region and environment.
- Combine your APM solution with CI/CD. If you’re using a continuous integration/continuous delivery process, you can use an APM tool to monitor your deployment pipeline. Tools like New Relic’s CircleCI quickstart give you visibility into analytical data about your CI jobs.
- Document your monitoring workflow. Documentation is an important part of making sure all of the teams using APM understand how the product works. It’s also helpful for onboarding new engineers to your processes.
- Reduce context switching between tools. For example, you might need to switch between your APM dashboards, your code editor, and multiple other tools for communication, version control, and documentation. This context switching can be time-consuming and lead to additional mental load. In the case of New Relic, you can use the New Relic CodeStream integration to easily switch between APM and your IDE. You can select an error in a New Relic dashboard and jump directly to the line that is causing the error in your code editor. With CodeStream, you can also plan, review, and debug code with collaborators directly in your IDE, ensuring that you have the additional reviewers you need to update or roll back your code as needed.
The next image shows how you can use New Relic CodeStream to communicate with others about your code.
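The naming and tagging practices above can also be checked programmatically. The following sketch validates service names against a convention and filters services by tag. The `<team>-<service>-<environment>` pattern, the service list, and the tag keys are hypothetical. Substitute whatever convention your organization agrees on.

```python
import re

# Hypothetical convention: <team>-<service>-<environment>, all lowercase.
NAME_PATTERN = re.compile(r"^[a-z0-9]+-[a-z0-9]+-(prod|staging|dev)$")

# Hypothetical inventory of monitored services with key-value tags.
services = [
    {"name": "payments-api-prod",
     "tags": {"region": "us-east", "environment": "production"}},
    {"name": "payments-api-staging",
     "tags": {"region": "us-east", "environment": "staging"}},
    {"name": "Search Service",
     "tags": {"region": "eu-west", "environment": "production"}},
]


def nonconforming(inventory):
    """Return the names that don't follow the naming convention."""
    return [s["name"] for s in inventory if not NAME_PATTERN.match(s["name"])]


def filter_by_tags(inventory, **wanted):
    """Return the names of services whose tags match every requested key-value pair."""
    return [
        s["name"]
        for s in inventory
        if all(s["tags"].get(key) == value for key, value in wanted.items())
    ]


print(nonconforming(services))
print(filter_by_tags(services, environment="production"))
print(filter_by_tags(services, region="us-east", environment="staging"))
```

A check like this could run as part of a deployment pipeline so that badly named or untagged services are caught before they reach your dashboards.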