Alerts help you identify and address issues before they impact your customers. But too many alerts can overwhelm and desensitize the team dealing with them, and can even slow down the team’s incident response. An optimized alert strategy is a cornerstone of observability. It helps your team focus on the right things at the right time to increase uptime, availability, and performance. Alert quality management (AQM) helps you optimize your alert strategy to create fewer, more valuable alerts that pinpoint incidents and minimize alert fatigue.
Before delving into optimizing your strategy, make sure you’re following best practices for alerts and notifications. Read an overview of alerts, issues, incidents, and anomalies.
1. Create alerts that matter to your business
With alerts in New Relic, you can set up robust and customizable alert policies for anything that you can instrument. But that doesn’t mean you should create alert policies for anything and everything. Choose your alert conditions carefully to avoid overloading your team with noise. If your customers aren’t affected, do you really need to wake someone up with an alert?
Mature organizations tend to set fewer alerts. They focus alerts on a core set of metrics that tell them when their customers’ experience is affected. For example, teams often focus on service level management (SLM) metrics such as response time and error rate.
2. Take advantage of automatic anomaly detection
An anomaly is a behavioral trend that doesn't match the historical data for your system. Make sure you’re getting notified about important issues by taking advantage of anomaly detection in New Relic. Part of our AIOps functionality, New Relic anomaly detection automatically spots unusual changes across all your applications, services, and log data. These automated alerts are based on golden signals such as throughput, errors, and latency.
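To make the idea of "a trend that doesn't match the historical data" concrete, here is a minimal sketch that flags a new data point when it sits far from the historical mean. It is not New Relic's anomaly detection algorithm; the function name `is_anomalous`, the z-score technique, the threshold of 3 standard deviations, and the sample error counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` deviates from `history` by more than
    `threshold` standard deviations (a simple z-score check)."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical per-minute error counts for one service.
baseline_errors = [4, 5, 6, 5, 4, 5, 6, 5]

print(is_anomalous(baseline_errors, 5))   # a typical minute
print(is_anomalous(baseline_errors, 40))  # a sudden spike
```

A production detector would also need to handle seasonality and trends, which a plain z-score ignores.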
3. Configure notification workflows to notify the right people at the right time
To streamline your workflow, have alerts sent automatically to Slack, email, or third-party services such as Atlassian Jira, ServiceNow, and PagerDuty when systems need attention. You can also use webhooks to send your data to any compatible third-party service, known as a destination in New Relic. New Relic maintains a list of currently supported destination platforms.
To avoid alert fatigue, consider how and when you want notifications to be sent. Does your team want to be notified every time something goes wrong? Should similar notifications be grouped together in one notification? Will everyone on the team receive notifications?
Workflows in New Relic let you control when and where you want to receive notifications about problems in your system. For example, filter the issues you want to send to a destination, making sure that notifications are only delivered to specific people and roles based on the type of issue, the violation, affected services, and other variables.
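The filtering idea can be sketched as a small set of routing rules. This is a conceptual illustration only, not the New Relic workflows API: the `Issue` and `Route` classes, the attribute names, and the destination strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    priority: str
    service: str
    issue_type: str


@dataclass
class Route:
    destination: str
    match: dict  # attribute name -> set of accepted values


def destinations_for(issue, routes):
    """Return every destination whose filter matches all listed attributes."""
    matched = []
    for route in routes:
        if all(getattr(issue, attr) in accepted
               for attr, accepted in route.match.items()):
            matched.append(route.destination)
    return matched


# Hypothetical routing rules: page on-call only for critical checkout issues.
ROUTES = [
    Route("pagerduty:checkout-oncall",
          {"priority": {"critical"}, "service": {"checkout"}}),
    Route("slack:#checkout-alerts", {"service": {"checkout"}}),
    Route("email:platform-team", {"issue_type": {"infrastructure"}}),
]

issue = Issue(priority="critical", service="checkout", issue_type="error_rate")
print(destinations_for(issue, ROUTES))
```

Keeping the paging rule narrower than the chat rule is one way to make sure only the people who need to act are interrupted.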
4. Set up and track alert metrics
Alerts are a great way to quickly identify when something goes wrong, but too much of a good thing can lead to alert fatigue. Alerts might trigger too frequently, thresholds could be too sensitive, and some alerts might not be relevant.
Keeping your alert quality high over time starts with tracking metrics. Look to metrics and KPIs that reveal the noisiest and least valuable alerts so you can improve their value or eliminate them. For example, use your AQM data to review metrics and adjust your alert policies, reducing alert volume to acceptable levels while still meeting your goals for reliability and stability.
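As a rough illustration of finding the noisiest and least valuable alerts, the sketch below ranks alert conditions by how often they fire and how rarely anyone acknowledges them. The record layout (`condition`, `acknowledged`), the function name, and the sample data are assumptions; real AQM data would come from your own incident history.

```python
from collections import defaultdict


def summarize_alert_quality(incidents):
    """Aggregate incidents per condition and rank the noisiest first."""
    stats = defaultdict(lambda: {"count": 0, "acknowledged": 0})
    for incident in incidents:
        entry = stats[incident["condition"]]
        entry["count"] += 1
        if incident["acknowledged"]:
            entry["acknowledged"] += 1

    summary = [
        {
            "condition": name,
            "count": entry["count"],
            "ack_rate": entry["acknowledged"] / entry["count"],
        }
        for name, entry in stats.items()
    ]
    # Most incidents first; ties broken by lowest acknowledgment rate.
    summary.sort(key=lambda row: (-row["count"], row["ack_rate"]))
    return summary


# Hypothetical incident history.
sample_incidents = [
    {"condition": "High CPU", "acknowledged": False},
    {"condition": "High CPU", "acknowledged": False},
    {"condition": "High CPU", "acknowledged": True},
    {"condition": "Checkout error rate", "acknowledged": True},
]

for row in summarize_alert_quality(sample_incidents):
    print(row["condition"], row["count"], round(row["ack_rate"], 2))
```

A condition that fires often but is seldom acknowledged is a candidate for a looser threshold or for removal.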
Next steps
- Read observability best practices around operational efficiency, development code quality, and digital customer experience.
- Explore our uptime, performance, and reliability implementation guide.
- Set up an alert webhook and dashboard via GitHub.
- See our alerts tutorial for step-by-step instructions on setting up your first alert.
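The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. For questions and support related to this blog post, please join us exclusively at the Explorers Hub (discuss.newrelic.com). This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.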