
Alerts help you identify and address issues before they impact your customers. But too many alerts can overwhelm and desensitize the team handling them, and can even slow down incident response. An optimized alert strategy is a cornerstone of observability. It helps your team focus on the right things at the right time to increase uptime, availability, and performance. Alert quality management (AQM) helps you optimize your alert strategy to create fewer, more valuable alerts that pinpoint incidents and minimize alert fatigue.

1. Create alerts that matter to your business

With alerts in New Relic, you can set up robust and customizable alert policies for anything that you can instrument. But that doesn’t mean you should create alert policies for anything and everything. Choose your alert conditions carefully to avoid overloading your team with noise. If your customers aren’t affected, do you really need to wake someone up with an alert?

Mature organizations tend to set fewer alerts. They focus alerts on a core set of metrics that tell them when their customers’ experience is affected. For example, teams often focus on service level management (SLM) metrics such as response time and error rate.     
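To make the idea of a small, customer-focused metric set concrete, here is a minimal sketch that computes error rate and 95th-percentile response time over a window of requests and alerts only when one of them crosses a threshold. The `Request` type, the function names, and the threshold values are hypothetical illustrations, not part of New Relic or its API.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One request in the evaluation window (hypothetical shape)."""
    duration_ms: float
    is_error: bool


def error_rate(requests):
    """Fraction of requests in the window that failed."""
    if not requests:
        return 0.0
    return sum(r.is_error for r in requests) / len(requests)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_alert(requests, max_error_rate=0.05, max_p95_ms=1000.0):
    """Alert only when a customer-facing metric crosses its threshold.

    The default thresholds are placeholder assumptions; pick values that
    match your own service level objectives.
    """
    if not requests:
        return False
    p95 = percentile([r.duration_ms for r in requests], 95)
    return error_rate(requests) > max_error_rate or p95 > max_p95_ms
```

Keeping the check to two customer-facing signals means that a spike in an internal metric pages no one unless customers actually see slower or failing requests.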

2. Take advantage of automatic anomaly detection 

An anomaly is a behavioral trend that doesn't match the historical data for your system. Make sure you’re getting notified about important issues by taking advantage of anomaly detection in New Relic. Part of our AIOps functionality, New Relic anomaly detection automatically spots unusual changes across all your applications, services, and log data. These automated alerts are based on golden signals such as throughput, errors, and latency. 
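As a rough illustration of what "doesn't match the historical data" can mean, the sketch below flags a value as anomalous when it sits more than a few standard deviations away from the historical mean. This is a simple z-score check chosen for illustration; it is not New Relic's detection algorithm, and the function name and default threshold are assumptions.

```python
from statistics import mean, stdev


def is_anomaly(history, current, z_threshold=3.0):
    """Return True if `current` deviates strongly from `history`.

    Illustrative z-score check only; the threshold of 3.0 is an assumed
    default, not a recommended setting.
    """
    if len(history) < 2:
        # Not enough history to estimate a baseline.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is unusual.
        return current != baseline
    return abs(current - baseline) / spread > z_threshold
```

For example, with a throughput history of `[100, 102, 98, 101, 99]`, a new reading of 101 stays within the normal range, while a reading of 150 is flagged.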

 

In this video, we see how to track alert data in order to make better alerting decisions.

3. Configure notification workflows to notify the right people at the right time  

To streamline your workflow, have alerts sent automatically to Slack or other third-party services such as email, Atlassian Jira, ServiceNow, and PagerDuty when systems need attention. You can also use webhooks to send your data to any compatible third-party service, known as a destination in New Relic. See the New Relic documentation for the list of currently supported destination platforms.

To avoid alert fatigue, consider how and when you want notifications to be sent. Does your team want to be notified every time something goes wrong? Should similar notifications be grouped together into one? Should everyone on the team receive notifications?

Workflows in New Relic let you control when and where you receive notifications about problems in your system. For example, you can filter the issues you send to a destination so that notifications are delivered only to specific people and roles, based on the type of issue, the violation, affected services, and other variables.
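The filtering idea can be sketched in a few lines. The example below models a workflow as a set of attribute filters plus a destination, and routes an issue to every destination whose filters all match. The `Issue` and `Workflow` classes, their fields, and the destination labels are hypothetical and do not reflect New Relic's actual workflow configuration or API.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """A problem to notify about (hypothetical shape)."""
    title: str
    priority: str
    service: str


@dataclass
class Workflow:
    """Sends matching issues to one destination (hypothetical shape)."""
    name: str
    destination: str
    filters: dict  # attribute name -> set of accepted values

    def matches(self, issue):
        # An issue matches only if every filtered attribute has an accepted value.
        return all(
            getattr(issue, attribute) in accepted
            for attribute, accepted in self.filters.items()
        )


def route(issue, workflows):
    """Return the destinations that should be notified about this issue."""
    return [w.destination for w in workflows if w.matches(issue)]


workflows = [
    Workflow("page-on-call", "pagerduty",
             {"priority": {"critical"}, "service": {"checkout"}}),
    Workflow("team-channel", "slack",
             {"service": {"checkout", "search"}}),
]
```

With these two workflows, a critical checkout issue reaches both the on-call pager and the team channel, a low-priority search issue reaches only the team channel, and an issue for an unlisted service notifies no one.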

4. Set up and track alert metrics

Alerts are a great way to quickly identify when something goes wrong, but too much of a good thing can lead to alert fatigue. Alerts might trigger too frequently, thresholds could be too sensitive, and some alerts might not be relevant. 

Keeping alert quality high over time starts with tracking metrics. Look to metrics and KPIs that reveal the noisiest and least valuable alerts so you can improve their value or eliminate them. For example, use your AQM data to review these metrics and adjust your alert policies, reducing alert volume to an acceptable level while still meeting your reliability and stability goals.
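As one possible way to find the noisiest and least valuable alerts, the sketch below counts incidents per alert condition and measures how often incidents were acknowledged. The `Incident` record and both metric functions are hypothetical; they assume you have exported incident data in some form and are not a description of how AQM computes its own metrics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    """One alert incident from an exported history (hypothetical shape)."""
    condition: str
    duration_minutes: float
    acknowledged: bool


def noisiest_conditions(incidents, top_n=3):
    """Return (condition, incident_count) pairs, most frequent first."""
    counts = Counter(i.condition for i in incidents)
    return counts.most_common(top_n)


def engagement_rate(incidents):
    """Fraction of incidents that someone acknowledged.

    A low value suggests the alerts are being ignored and may not be
    worth sending.
    """
    if not incidents:
        return 0.0
    return sum(i.acknowledged for i in incidents) / len(incidents)
```

A condition that tops the `noisiest_conditions` list while its incidents are rarely acknowledged is a natural first candidate for a looser threshold or removal.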