For successful DevOps teams, alerting is an indispensable practice. You can’t possibly watch every service in your application every second of every day, yet you must be ready to take action immediately, should any service hit a snag. With New Relic Alerts, you can ensure that the right members of your team get the alerts they need as quickly as possible. If a monitored application, host, or other entity triggers a predefined alert condition, New Relic Alerts notifies you automatically.
At the same time, though, your team needs to minimize alert fatigue, which too often leads to mistakes and miscommunication in your incident-response process. With New Relic Alerts, you can easily manage alert policies and conditions that focus on key metrics, while filtering out expected behavior.
To help you get started, we created a list of suggestions, based on best practices from the field, for setting alert conditions for apps instrumented with New Relic Browser and New Relic APM, and for hosts monitored with New Relic Infrastructure. These suggestions serve as a great starting point for teams looking to get up and running with New Relic Alerts, or for teams looking to improve their workflows.
Note: This post covers alert conditions only. You should create alert policies based on how your organization is structured and on your incident response workflow. In some cases, a policy might contain a condition that spans your entire New Relic account; in others, you’ll need to scope a condition to one or more apps or hosts. Mature DevOps teams may group Browser, APM, and Infrastructure conditions into the same policy, segmented by app or product, while more traditionally structured teams may prefer to keep Infrastructure, APM, and Browser conditions in separate policies.
If you’re not already familiar with New Relic Alerts, be sure to review the following before getting started:
- The Alerts documentation (including NRQL Alert Conditions, Baseline Alerts, and Outlier Detection)
- Getting Started with New Relic Alerts: Best Practices That Set You Up for Success
You should also be familiar with these two terms:
- Thresholds: These are alert condition settings that define what counts as a violation. A threshold combines the value a data source must cross to trigger a violation with the time-related settings (how long, or how often, the value must cross it) that turn a crossing into a violation; for example:
- An application’s average web response time is greater than 5 seconds for 15 minutes.
- An application’s error rate per minute hits 10% or higher at least once in an hour.
- An application’s AJAX response time deviates a certain amount from its expected baseline behavior.
For more information, see the New Relic documentation for setting thresholds for alert conditions.
- Baselines: You can use baseline alert conditions to define thresholds that adjust to the behavior of your data. Baselines are useful for creating alert conditions that:
- Notify you only when data is behaving abnormally.
- Dynamically adjust to changing data and trends, including daily or weekly trends.
- Work well out of the box for new applications whose behavior isn’t yet known.
For more information, see the New Relic documentation for creating baseline alert conditions.
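The threshold examples above pair a value with one of two time settings: the value must stay past the threshold for a sustained window ("for 15 minutes"), or cross it at least once within a window ("at least once in an hour"). The minimal Python sketch below models both modes over one-minute samples, plus a deliberately simplified baseline check. The function names, the one-sample-per-minute assumption, and the mean-plus-k-standard-deviations baseline are all hypothetical illustrations; New Relic’s own baseline algorithm is more sophisticated and is not reproduced here.

```python
from statistics import mean, stdev
from typing import Sequence


def violates_for_duration(values: Sequence[float], threshold: float, minutes: int) -> bool:
    """Sustained mode: every one-minute sample in the most recent `minutes`
    window is above `threshold` (e.g. response time > 5 s for 15 minutes)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    window = values[-minutes:]
    return len(window) == minutes and all(v > threshold for v in window)


def violates_at_least_once(values: Sequence[float], threshold: float, minutes: int) -> bool:
    """Frequency mode: at least one sample in the most recent `minutes` window
    reaches `threshold` (e.g. error rate >= 10% at least once in an hour)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return any(v >= threshold for v in values[-minutes:])


def deviates_from_baseline(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Hypothetical, simplified baseline stand-in (not New Relic's algorithm):
    flag `current` if it lies more than `k` sample standard deviations from
    the mean of `history`. Requires at least two history points."""
    mu, sigma = mean(history), stdev(history)
    return abs(current - mu) > k * sigma


# One sample per minute: 5 normal minutes, then 15 slow ones.
response_times = [4.2] * 5 + [5.5] * 15
print(violates_for_duration(response_times, threshold=5.0, minutes=15))  # True

# A single 12% spike in the last hour is enough for "at least once".
error_rates = [0.02] * 59 + [0.12]
print(violates_at_least_once(error_rates, threshold=0.10, minutes=60))  # True

# Throughput history near 100 rpm; a sudden drop to 40 stands out.
history = [100, 104, 98, 102, 96, 100]
print(deviates_from_baseline(history, 40))   # True
print(deviates_from_baseline(history, 103))  # False
```

The two threshold modes trade off noise differently in this sketch: the sustained mode ignores brief blips, while the at-least-once mode catches short spikes that a sustained window would miss.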
Configuring alert conditions for Browser applications
Use the following examples as best practices for getting started with alert conditions for frontend applications you’re monitoring with New Relic Browser.
|Alert condition|What it does|
|---|---|
|Threshold condition on Pageview load time|Triggers an alert if page load times spike above the accepted threshold.|
|Baseline condition on Pageview throughput|Triggers an alert only for sudden traffic drops or spikes. (Baseline conditions reduce noise by accounting for expected traffic fluctuations.)|
|Baseline condition on AJAX request response time|Triggers an alert when network latency affects the AJAX requests that contact your backend services.|
|Baseline throughput conditions on key page actions (e.g., button clicks)|Triggers an alert for user behavior changes that don’t set off other alerts; for example, if a CSS change moves the “checkout” button off the visible screen, it won’t cause a spike in errors or response times, but it will affect the customer experience.|
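One way to see how these rows translate into alert conditions is to pair each one with the NRQL query it might evaluate. The sketch below is hypothetical: `ConditionSketch` is an illustrative Python type, not part of any New Relic API; the queries assume the standard Browser event types `PageView`, `AjaxRequest`, and `PageAction`; and the `timeToLoadEventStart` attribute and the `checkout` action name are assumptions you should check against your own account’s data before building NRQL alert conditions from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionSketch:
    """Hypothetical description of one alert condition; not a New Relic API type."""
    name: str
    kind: str   # "static" threshold or "baseline"
    nrql: str   # query whose result the condition would evaluate


BROWSER_CONDITIONS = [
    ConditionSketch(
        name="Pageview load time",
        kind="static",
        # PageView.duration: total page load time, in seconds
        nrql="SELECT average(duration) FROM PageView",
    ),
    ConditionSketch(
        name="Pageview throughput",
        kind="baseline",
        nrql="SELECT count(*) FROM PageView",
    ),
    ConditionSketch(
        name="AJAX request response time",
        kind="baseline",
        # Assumed attribute; confirm it exists in your AjaxRequest data.
        nrql="SELECT average(timeToLoadEventStart) FROM AjaxRequest",
    ),
    ConditionSketch(
        name="Checkout button clicks",
        kind="baseline",
        # 'checkout' is a hypothetical custom page action name.
        nrql="SELECT count(*) FROM PageAction WHERE actionName = 'checkout'",
    ),
]

# Print a quick summary of each condition and the query behind it.
for c in BROWSER_CONDITIONS:
    print(f"{c.kind:8} {c.name}: {c.nrql}")
```

Only load time uses a static threshold here, because an absolute limit on how long a page may take to load is meaningful on its own; the other rows use baselines so that normal daily and weekly swings in traffic and usage don’t trigger alerts.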
Configuring alert conditions for APM applications
Use the following examples as best practices for getting started with alert conditions for applications you’ve instrumented with New Relic APM.
[table id=37 /]
Configuring alert conditions for Infrastructure hosts
Use the following examples as best practices for getting started with alert conditions for hosts you’re monitoring with New Relic Infrastructure.
[table id=38 /]
Next steps: An effectively implemented alerting strategy is one of the most important parts of any successful DevOps practice. Check out Effective Alerting In Practice to learn:
- How shifts in modern technology stacks are leading to changes in alerting strategies
- Some alerting best practices for dynamic and scaled environments
- How to design and maintain an alerting system useful to your organization and teams