Alerts help you identify and address issues before they impact your customers. But too many alerts can overwhelm and desensitize the team handling them, and can even slow the team’s incident response. An optimized alert strategy is a cornerstone of observability. It helps your team focus on the right things at the right time to increase uptime, availability, and performance. Alert quality management (AQM) helps you optimize your alert strategy to create fewer, more valuable alerts that pinpoint incidents and minimize alert fatigue.
Before delving into optimizing your strategy, make sure you’re following best practices for alerts and notifications. Read an overview of alerts, issues, incidents, and anomalies.
1. Create alerts that matter to your business
With alerts in New Relic, you can set up robust and customizable alert policies for anything that you can instrument. But that doesn’t mean you should create alert policies for anything and everything. Choose your alert conditions carefully to avoid overloading your team with noise. If your customers aren’t affected, do you really need to wake someone up with an alert?
Mature organizations tend to set fewer alerts. They focus alerts on a core set of metrics that tell them when their customers’ experience is affected. For example, teams often focus on service level management (SLM) metrics such as response time and error rate.
2. Take advantage of automatic anomaly detection
An anomaly is a behavioral trend that doesn't match the historical data for your system. Make sure you’re getting notified about important issues by taking advantage of anomaly detection in New Relic. Part of our AIOps functionality, New Relic anomaly detection automatically spots unusual changes across all your applications, services, and log data. These automated alerts are based on golden signals such as throughput, errors, and latency.
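To make the idea of "behavior that doesn't match the historical data" concrete, here is a minimal sketch that flags a data point when it deviates strongly from a trailing baseline. It is an illustration only and is not how New Relic anomaly detection works internally; the function name `find_anomalies`, the window size, the threshold, and the sample latency values are all assumptions made for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of points that deviate from a trailing baseline.

    Hypothetical helper: each point is compared with the mean and standard
    deviation of the `window` points that precede it (a simple z-score test).
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds, with one obvious spike.
latency_ms = [120, 118, 122, 121, 119, 120, 123, 480, 121, 119]
print(find_anomalies(latency_ms))
```

A fixed threshold like this still has to be tuned by hand, which is part of the appeal of automated detection that learns a baseline for each signal.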
3. Configure notification workflows to notify the right people at the right time
To streamline your workflow, have alerts sent automatically to Slack, email, or other services such as Atlassian Jira, ServiceNow, and PagerDuty when systems need attention. You can also use webhooks to send your data to any compatible third-party service, known as a destination in New Relic. Here’s a list of currently supported destination platforms in New Relic.
To avoid alert fatigue, consider how and when you want notifications to be sent. Does your team want to be notified every time something goes wrong? Should similar notifications be grouped together in one notification? Will everyone on the team receive notifications?
Workflows in New Relic let you control when and where you want to receive notifications about problems in your system. For example, filter the issues you want to send to a destination, making sure that notifications are only delivered to specific people and roles based on the type of issue, the violation, affected services, and other variables.
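The filtering logic described above can be sketched as a small routing table. This is a conceptual model rather than the New Relic workflows API: the `Issue` and `Route` classes, the attribute names, and the destination names are hypothetical and exist only to show how issue attributes can decide who gets notified.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    priority: str  # for example "critical", "high", or "low"
    service: str   # the affected service


@dataclass
class Route:
    destination: str  # hypothetical destination name
    match: dict       # attribute name -> set of accepted values

    def accepts(self, issue):
        # A route accepts an issue only if every listed attribute matches.
        return all(
            getattr(issue, attr, None) in allowed
            for attr, allowed in self.match.items()
        )


def destinations_for(issue, routes):
    """Return the destinations that should be notified about this issue."""
    return [route.destination for route in routes if route.accepts(issue)]


ROUTES = [
    Route("pagerduty-checkout-oncall",
          {"priority": {"critical"}, "service": {"checkout"}}),
    Route("slack-platform-team", {"priority": {"critical", "high"}}),
    Route("jira-backlog", {"priority": {"low"}}),
]

issue = Issue("Error rate above threshold", "critical", "checkout")
print(destinations_for(issue, ROUTES))
```

In this sketch only a critical checkout issue pages the on-call engineer, while low-priority issues go to a backlog instead of interrupting anyone.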
4. Set up and track alert metrics
Alerts are a great way to quickly identify when something goes wrong, but too much of a good thing can lead to alert fatigue. Alerts might trigger too frequently, thresholds could be too sensitive, and some alerts might not be relevant.
Keeping alert quality high over time starts with tracking metrics. Look for metrics and KPIs that reveal the noisiest and least valuable alerts so you can improve their value or eliminate them. For example, use your AQM data to review these metrics and adjust your alert policies to bring alert volume down to acceptable levels while still meeting your goals for reliability and stability.
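As a rough illustration of finding the noisiest alerts, the sketch below summarizes incident records per alert condition and ranks the conditions by volume. The record fields (`condition`, `duration_seconds`), the five-minute "short-lived" cutoff, and the sample data are assumptions for this example and do not reflect a New Relic export format.

```python
from collections import defaultdict
from statistics import median


def summarize_conditions(incidents, short_seconds=300):
    """Group incidents by condition and rank conditions by incident count.

    Hypothetical helper: a high count combined with a high share of
    short-lived incidents suggests a noisy, overly sensitive condition.
    """
    by_condition = defaultdict(list)
    for incident in incidents:
        by_condition[incident["condition"]].append(incident["duration_seconds"])

    summary = []
    for condition, durations in by_condition.items():
        short = sum(1 for d in durations if d < short_seconds)
        summary.append({
            "condition": condition,
            "incidents": len(durations),
            "median_duration_seconds": median(durations),
            "short_lived_ratio": short / len(durations),
        })
    # Noisiest conditions first.
    summary.sort(key=lambda row: row["incidents"], reverse=True)
    return summary


# Made-up incident records for illustration.
incidents = [
    {"condition": "High CPU", "duration_seconds": 60},
    {"condition": "High CPU", "duration_seconds": 90},
    {"condition": "High CPU", "duration_seconds": 120},
    {"condition": "High CPU", "duration_seconds": 45},
    {"condition": "Checkout error rate", "duration_seconds": 1800},
]

for row in summarize_conditions(incidents):
    print(row)
```

In this made-up data, "High CPU" fires often and always clears within minutes, which makes it a candidate for a higher threshold or a longer evaluation window; the single long-running checkout incident is more likely a signal worth keeping.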
- Read observability best practices around operational efficiency, development code quality, and digital customer experience.
- Explore our uptime, performance, and reliability implementation guide.
- Set up an alert webhook and dashboard via GitHub.
- See our alerts tutorial for step-by-step instructions on setting up your first alert.