Application performance monitoring (APM) allows you to track key metrics and events in your software, giving you insights on everything from page load speed and performance bottlenecks to service outages and errors. With a modern APM solution, you can proactively fix issues in your application by shortening the analysis loop, so you identify and resolve issues quickly while lowering security risks. That benefits both end users and your bottom line.
Making APM a daily practice can seem overwhelming at first, so it’s important to break the process down into manageable steps and focus on effective monitoring strategies. In this article, you’ll learn the basic steps involved in implementing a modern APM solution, from making a plan and preparing your engineering teams to instrument your services to setting up incident alerts and creating dashboards to monitor application performance. If you want to know more about APM and why you need it to monitor the performance of your applications, see What is APM?
You’ll also learn how to get started with the New Relic APM tool in minutes. Your free account includes 100 GB/month of free data ingest, one free full-access user, and unlimited free basic users. This image shows an APM dashboard in New Relic that visualizes the most time-consuming transactions in an application:
Step 1: Make a plan for your monitoring strategy
First, you need to determine what you plan to monitor. Do you want to start small and monitor a single service? Or is your goal to monitor everything in your application? There are advantages to both approaches, but you should work towards comprehensive monitoring of all your services to ensure you achieve complete observability of your systems.
With highly distributed applications, you need to consider all of the services you’re using, ranging from cloud providers to on-premises servers to APIs and much more. Applications that are smaller or use a monolithic architecture will be simpler to instrument and monitor.
Start small: monitoring a single service
Monitoring a single service allows you to try out an APM solution with minimum risk and cost. For instance, a New Relic account allows you to try APM and other product features for free, and with 100 GB/month of data ingest, you’ll be able to analyze a meaningful amount of telemetry data. You get the opportunity to learn how to use an industry-leading APM tool and decide where to go next in order to integrate data from a wide variety of telemetry sources for a more holistic view of your tech stack. This approach can also be effective if you need buy-in from a manager or executive for the widespread implementation of application performance monitoring. Finally, starting small with an APM solution can be an effective way to monitor and debug a problematic service without worrying about large-scale provisioning.
Step 2: Instrument your application
Once you know what you plan to monitor, it’s time to instrument your application. Instrumentation is the process of installing an agent in your application’s environment. An agent tracks the data flowing through your application and sends it back to the APM solution. This data is also known as telemetry.
You can instrument services in many different ways, depending on the APM solution you’re using and the services you’re instrumenting.
Guided installation of an APM solution
Some APM solutions provide guided installations so you can automatically instrument your application. These installations typically offer step-by-step assistance in configuring and deploying APM agents within an application, making the process more accessible to both technical and non-technical stakeholders.
Guided installations also help teams define and set up Service Level Indicators (SLIs) and Service Level Objectives (SLOs), ensuring that APM aligns with the specific performance metrics that matter most to their business.
By following guided installation procedures, teams can track and visualize performance data, identify bottlenecks, and proactively address issues that could impact user experience.
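To make the SLI and SLO idea concrete, here is a minimal sketch in Python. It is not tied to any APM product: the `Slo` class, the 200 ms latency threshold, and the sample durations are all assumptions made up for illustration. It measures an SLI as the fraction of requests that finish within a latency threshold, then reports how much of the error budget implied by the SLO is left.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """A hypothetical SLO: the fraction of requests that must be 'good'."""
    name: str
    target: float  # e.g. 0.99 means 99% of requests must meet the SLI; must be < 1


def latency_sli(durations_ms: list[float], threshold_ms: float) -> float:
    """SLI: fraction of requests that completed within threshold_ms."""
    if not durations_ms:
        return 1.0  # no traffic, so nothing violated the objective
    good = sum(1 for d in durations_ms if d <= threshold_ms)
    return good / len(durations_ms)


def error_budget_remaining(sli: float, slo: Slo) -> float:
    """Share of the error budget still unspent (negative if overspent)."""
    budget = 1.0 - slo.target  # allowed fraction of 'bad' requests
    spent = 1.0 - sli          # observed fraction of 'bad' requests
    return (budget - spent) / budget


# Made-up sample data: ten request durations in milliseconds.
durations = [120, 95, 310, 88, 101, 450, 99, 130, 115, 90]
slo = Slo(name="checkout latency", target=0.75)
sli = latency_sli(durations, threshold_ms=200)
print(f"SLI={sli:.2f}, budget remaining={error_budget_remaining(sli, slo):.0%}")
```

In practice, your APM solution computes these values from real telemetry. The sketch only shows the arithmetic behind them.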
You may also use custom instrumentation and SDKs to instrument services. Custom instrumentation can be used to monitor unsupported frameworks and also to add monitoring to transactions that your APM solution doesn’t track automatically.
Unlike out-of-the-box solutions, custom instrumentation offers the flexibility to define and capture metrics that are unique to an application's architecture and requirements. This process often involves adding code snippets or APM agent configurations to monitor critical business transactions, user interactions, or other application-specific events. Custom instrumentation is helpful when off-the-shelf solutions might not cover all the nuances of a complex or highly specialized application.
By investing in custom instrumentation, you’ll gain granular insights into the performance of specific features, detect issues early, and optimize the user experience based on your own objectives and priorities. This flexibility helps teams make data-driven decisions and proactively improve their application's performance and reliability.
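As a rough illustration of what custom instrumentation can look like in code, the following Python sketch wraps a business function in a timing decorator. It does not use a real agent API: the `custom_metrics` store, the `timed` decorator, and the metric name are hypothetical stand-ins for whatever your APM agent or SDK provides.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store standing in for an APM agent's metric buffer.
custom_metrics: dict[str, list[float]] = defaultdict(list)


def timed(metric_name: str):
    """Record the wall-clock duration (in ms) of each call under metric_name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even when func raises, so failed calls are still timed.
                elapsed_ms = (time.perf_counter() - start) * 1000
                custom_metrics[metric_name].append(elapsed_ms)
        return wrapper
    return decorator


# Illustrative business transaction that automatic instrumentation might miss.
@timed("Custom/checkout/apply_discount")
def apply_discount(total: float, percent: float) -> float:
    return round(total * (1 - percent / 100), 2)


apply_discount(80.0, 25)
print(dict(custom_metrics))
```

A real agent would send these measurements to your APM solution instead of keeping them in a dictionary, but the pattern of wrapping the code you care about stays the same.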
When neither one works, try log forwarding
Sometimes, you won’t be able to instrument a service. When that happens, you can use log forwarding to forward logs from that service to an APM solution.
By forwarding application logs to an APM system, your team can monitor specific application events and errors, even in distributed or containerized environments.
We recommend this method when integrating APM with legacy applications or systems that may not support native instrumentation. Through log forwarding, important context can be extracted from logs, allowing for performance analysis and troubleshooting.
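The sketch below shows one way context could be extracted from forwarded logs. It assumes a simple, made-up log layout (`<timestamp> <LEVEL> <service> <message>`) and an optional `duration_ms=` field. Your services’ real log formats will differ, so treat the patterns as placeholders.

```python
import re
from typing import Optional

# Assumed log layout: "<ISO timestamp> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)
# Assumed optional field inside the message, e.g. "duration_ms=182.5"
DURATION_PATTERN = re.compile(r"duration_ms=(?P<ms>\d+(?:\.\d+)?)")


def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured event, or None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    duration = DURATION_PATTERN.search(event["message"])
    if duration:
        event["duration_ms"] = float(duration.group("ms"))
    return event


# Made-up sample lines from a legacy "billing" service.
lines = [
    "2024-05-01T12:00:00Z INFO billing invoice generated duration_ms=182.5",
    "2024-05-01T12:00:03Z ERROR billing payment gateway timeout",
    "not a structured line",
]
events = [e for e in map(parse_log_line, lines) if e is not None]
errors = [e for e in events if e["level"] == "ERROR"]
print(f"{len(events)} events parsed, {len(errors)} errors")
```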
Skipping instrumentation altogether
Finally, you can always choose not to instrument services. This is sometimes a concern with services that handle sensitive data. However, your APM solution should meet the highest standards for security, privacy, and compliance.
If you have concerns about whether your APM solution provides the compliance you need, it’s time to consider another solution. Solutions like New Relic prioritize compliance and privacy. New Relic complies with data protection laws and standards around the world, and you can even request HIPAA account enablement.
To instrument your application with New Relic, see Install APM.
Step 3: Audit your services
Regardless of whether you plan to start small or cover as many services as possible, the next step is to audit the health of your services. That includes deployment changes, key transactions, Service Level Objectives (SLOs), infrastructure status, availability of cloud providers, application response time, and more. Having a complete picture of your tech stack can help you prioritize which services you want to monitor, and help ensure that you have complete monitoring coverage for your applications.
Modern APM platforms like New Relic can make this process easier by automatically discovering the applications, infrastructure, and log sources running in your environment. New Relic also makes it easy to close instrumentation gaps by making recommendations on what should be instrumented. This makes it much easier to configure and deploy APM across all of your systems.
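If you want to reason about instrumentation gaps yourself, the audit boils down to comparing an inventory of services against the set that already reports telemetry. Here is a minimal Python sketch of that comparison. The service names and the `customer_facing` flag are invented for illustration, and this is not how any particular platform performs discovery.

```python
# Hypothetical inventory: every service you run, with a simple priority hint.
inventory = {
    "checkout-api": {"customer_facing": True},
    "search-api": {"customer_facing": True},
    "report-batch": {"customer_facing": False},
    "auth-service": {"customer_facing": True},
}

# Hypothetical set of services that already send telemetry.
instrumented = {"checkout-api", "search-api"}


def coverage_gaps(inventory: dict, instrumented: set) -> list[str]:
    """List uninstrumented services, customer-facing ones first, then by name."""
    gaps = [name for name in inventory if name not in instrumented]
    return sorted(gaps, key=lambda n: (not inventory[n]["customer_facing"], n))


gaps = coverage_gaps(inventory, instrumented)
coverage = 1 - len(gaps) / len(inventory)
print(f"Coverage: {coverage:.0%}, gaps to close: {gaps}")
```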
Step 4: Monitoring your entire application for daily insights
While starting small has minimal short-term risk, in the long term, you’ll want to close gaps in your monitoring coverage to understand the upstream and downstream impact of issues, discover emerging trends, and get the right insights to prevent potential issues. If you don’t have complete coverage, you’ll deal with a longer mean time to detect (MTTD) and mean time to resolution (MTTR) of issues, which takes up valuable engineering resources and can sap team morale and slow innovation. There are also greater risks to your bottom line and the potential for end users to stop using your application. Identify key systems and prioritize efforts to address gaps in your instrumentation that move you closer to a complete view of your entire stack.
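For reference, MTTD and MTTR are simple averages over your incidents. The Python sketch below computes both from a made-up incident list. It assumes both are measured from the moment the issue started, and definitions vary between teams, so adjust the keys to match your own.

```python
from datetime import datetime
from statistics import mean

# Made-up incident records; the field names are assumptions for this sketch.
incidents = [
    {
        "started": datetime(2024, 5, 1, 10, 0),
        "detected": datetime(2024, 5, 1, 10, 12),
        "resolved": datetime(2024, 5, 1, 11, 0),
    },
    {
        "started": datetime(2024, 5, 2, 14, 0),
        "detected": datetime(2024, 5, 2, 14, 4),
        "resolved": datetime(2024, 5, 2, 14, 30),
    },
]


def mean_minutes(incidents: list[dict], start_key: str, end_key: str) -> float:
    """Average elapsed minutes between two timestamps across all incidents."""
    return mean(
        (i[end_key] - i[start_key]).total_seconds() / 60 for i in incidents
    )


mttd = mean_minutes(incidents, "started", "detected")
mttr = mean_minutes(incidents, "started", "resolved")
print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")
```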
Step 5: Choose your metrics and customize your dashboards
When you’ve instrumented your application, telemetry data will start flowing into your APM solution. A good APM solution provides some metrics automatically, usually in the form of dashboards and visualizations, like response time, throughput, error rate, and CPU usage, among others. These metrics are a good starting point, but you’ll likely have other metrics you want to track based on your team’s KPIs and goals. In the case of New Relic, you can report custom telemetry data using API calls.
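To show what sits behind those default charts, here is a small Python sketch that derives average response time, throughput, and error rate from a list of transactions. The `Transaction` record and the sample values are assumptions for illustration. A real APM solution collects and aggregates this data for you.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """A hypothetical record of one handled request."""
    duration_ms: float
    is_error: bool


def summarize(transactions: list[Transaction], window_s: float) -> dict:
    """Compute basic APM metrics for transactions observed over window_s seconds."""
    count = len(transactions)
    if count == 0:
        return {"avg_response_ms": 0.0, "throughput_rpm": 0.0, "error_rate": 0.0}
    return {
        "avg_response_ms": sum(t.duration_ms for t in transactions) / count,
        "throughput_rpm": count / (window_s / 60),  # requests per minute
        "error_rate": sum(1 for t in transactions if t.is_error) / count,
    }


# Made-up sample: four requests observed in a 60-second window.
sample = [
    Transaction(100, False),
    Transaction(200, False),
    Transaction(300, True),
    Transaction(400, False),
]
print(summarize(sample, window_s=60))
```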
Customizing your APM dashboards
You can also customize dashboards so they display the most relevant metrics. You can choose which metrics are showing and create custom visualizations that help you better understand how your application is performing.
This next screenshot shows the number of people in various cities who are viewing New Relic within an organization. The custom visualization uses the New Relic CLI and Treemap from the Recharts library.
Step 6: Create a shared understanding of system health
After you have an APM solution in place, you need to consider the capabilities and restrictions of the teams and users that are monitoring your services. Wide adoption is what gets you the best value from your APM practice. You might want DevOps and site reliability engineers (SREs) to have access to some dashboards while development teams and software managers have access to others. Teams should have access to dashboards related to their specific work, but you also need to foster collaboration between teams and avoid siloing because application issues can affect multiple services and teams.
Some APM solutions also have user types with different pricing and access to features. In the case of New Relic, you can have as many basic users as you need for free. However, there is an additional cost of $49/user on New Relic Standard, Pro, and Enterprise plans.
The next video shows how to grant access roles to users in New Relic.
Step 7: Set up alerts
After you’ve identified your key metrics, you should set up alerts that notify your teams when issues arise or certain critical thresholds are reached. To set up alerts, you need to answer the following questions:
- What conditions should trigger alerts? For example, you might want to trigger alerts when the average page load time for a specific product page exceeds a certain threshold.
- What should the threshold be for each alert? If you make the threshold too high, your teams won’t be alerted during critical incidents. On the other hand, if the threshold is too low, your teams will get false alarms. This can lead to alert fatigue and can also result in too many alerts about minor incidents burying the critical alerts you need to address quickly. In the case of New Relic, you can also use applied intelligence to create dynamic thresholds. For instance, you might want to alert on different thresholds for throughput depending on whether your application is at a peak usage time versus lower usage times such as the middle of the night.
- Which teams should receive the alert? Do you have one team managing and triaging all alerts? Or do you have different teams that should be notified depending on which service is affected? You need to choose which teams will be alerted for each alert policy you set up.
- What channels are you using for alerts? APM solutions like New Relic offer multiple ways to alert teams including Slack, PagerDuty, and email.
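The questions above map onto a small amount of logic: a condition, a threshold, and a route. The following Python sketch is one way to picture it. The dynamic threshold here is a plain "mean plus a few standard deviations" baseline chosen for simplicity, not the algorithm behind New Relic’s applied intelligence, and the service names and channel names are hypothetical.

```python
from statistics import mean, stdev
from typing import Optional

# Hypothetical routing table: which channels to notify for each service.
ROUTES = {"checkout-api": ["#payments-oncall", "email:payments@example.com"]}
DEFAULT_ROUTE = ["#platform-oncall"]


def dynamic_threshold(history: list[float], sensitivity: float = 3.0) -> float:
    """Baseline threshold: mean of recent values plus sensitivity * std deviation.

    Needs at least two data points. A lower sensitivity alerts sooner but
    produces more false alarms; a higher one risks missing real incidents.
    """
    return mean(history) + sensitivity * stdev(history)


def evaluate(service: str, value: float, history: list[float]) -> Optional[dict]:
    """Return an alert describing the breach, or None if value is within bounds."""
    threshold = dynamic_threshold(history)
    if value <= threshold:
        return None
    return {
        "service": service,
        "value": value,
        "threshold": round(threshold, 2),
        "notify": ROUTES.get(service, DEFAULT_ROUTE),
    }


# Made-up recent page load times (ms) for the same time of day.
recent_load_ms = [200, 210, 190, 205, 195]
print(evaluate("checkout-api", 230, recent_load_ms))  # breach: returns an alert
print(evaluate("checkout-api", 220, recent_load_ms))  # within bounds: None
```

Keeping separate histories for peak and off-peak hours is one simple way to get different thresholds at different times of day.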
The next video shows how to navigate from an alert incident to a root cause in New Relic.
Step 8: Lay the groundwork for collaboration between teams
When a critical incident occurs, you might have multiple teams scrambling to find the root cause. Is the issue coming from infrastructure, your code, a cloud provider, or something else? Identifying and fixing an issue often involves collaboration across teams, and if your teams are siloed, you’ll have slower MTTD and MTTR. Ideally, your APM solution should include features that allow your teams to better collaborate.
For example, errors inbox in New Relic allows teams to communicate directly in your APM solution. You can use Slack and errors inbox to quickly share and discuss important context on issues as they arise.
The next image shows errors grouped together in the errors inbox.
Step 9: Streamline your workflow and implement best practices
You can lower your MTTD and MTTR further by continuing to streamline your workflow and implementing APM best practices. This has multiple benefits beyond decreased MTTR—your teams will have more time to work on the projects they care about most, and removing friction from workflows can help reduce low morale and burnout. Here are some best practices:
- Standardize your naming conventions. Your APM solution should include descriptive names for your instrumented applications. Otherwise, you’ll have a difficult time identifying your monitored services, especially as your applications grow and you need to monitor more services.
- Tag your data. You can also tag your data to make it easier to filter and sort data at a high level. Use key-value pairs to add important metadata such as region and environment.
- Combine your APM solution with CI/CD. If you’re using a continuous integration/continuous delivery process, you can use an APM tool to monitor your deployment pipeline. Tools like New Relic’s CircleCI quickstart give you visibility into analytical data about your CI jobs.
- Document your monitoring workflow. Documentation is an important part of making sure all of the teams using APM understand how the product works. It’s also helpful for onboarding new engineers to your processes.
- Reduce context switching between tools. For example, you might need to switch between your APM dashboards, your code editor, and multiple other tools for communication, version control, and documentation. This context switching can be time-consuming and lead to additional mental load. In the case of New Relic, you can use the New Relic CodeStream integration to easily switch between APM and your IDE. You can select an error in a New Relic dashboard and jump directly to the line that is causing the error in your code editor. With CodeStream, you can also plan, review, and debug code with collaborators directly in your IDE, ensuring that you have the additional reviewers you need to update or roll back your code as needed.
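The first two practices, naming conventions and tags, are easy to enforce with a little code. This Python sketch checks service names against an assumed `<name>-<environment>` convention and filters services by key-value tags. The convention, the service names, and the tag values are all examples rather than a recommendation from any specific tool.

```python
import re

# Assumed convention: lowercase words joined by hyphens, ending in an environment.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*-(?:prod|staging|dev)$")


def is_valid_name(name: str) -> bool:
    """Check a service name against the assumed naming convention."""
    return NAME_PATTERN.fullmatch(name) is not None


# Made-up services with key-value tags such as region and environment.
services = [
    {"name": "checkout-api-prod", "tags": {"region": "us-east", "environment": "prod"}},
    {"name": "checkout-api-staging", "tags": {"region": "us-east", "environment": "staging"}},
    {"name": "search-api-prod", "tags": {"region": "eu-west", "environment": "prod"}},
    {"name": "My App", "tags": {"region": "eu-west", "environment": "prod"}},
]


def filter_by_tags(services: list[dict], **wanted: str) -> list[dict]:
    """Return services whose tags match every requested key-value pair."""
    return [
        s for s in services
        if all(s["tags"].get(key) == value for key, value in wanted.items())
    ]


badly_named = [s["name"] for s in services if not is_valid_name(s["name"])]
eu_prod = [s["name"] for s in filter_by_tags(services, region="eu-west", environment="prod")]
print(f"Rename: {badly_named}; eu-west prod services: {eu_prod}")
```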
The next image shows how you can use New Relic CodeStream to communicate with others about your code.
The views expressed on this blog are those of the author and do not necessarily reflect the official views of New Relic K.K. This blog may also contain links to external sites. New Relic makes no guarantees about the content of those linked sites.