Your home almost certainly has smoke detectors to alert you in case of a fire. Your production applications should have alerts too, so you know when something goes wrong. However, unlike with smoke detectors, which come with predefined thresholds, it’s not so simple to set up accurate alert conditions in a production application. You need to figure out what to monitor and you also need to set up thresholds that are sensitive enough to catch anomalies but not so sensitive that you get distracting false alarms.
Good news: New Relic One has the data and tools to help. We have developed an Alert Condition Recommendation service that uses AI and machine learning (ML) to recommend specific metrics and signals to monitor for your specific entities. You can use the provided recommendations or modify them to fit your specific needs.
Using recommended conditions
You can easily add recommended alerts to APM entities that do not currently have alert coverage. Here’s how.
In Services - APM in the left-hand pane of New Relic Navigator, there is a high-density view of the health of your system. With the traffic-light visual, it’s easy to see which entities are healthy, which have violations, and which don’t have any alert coverage. If an entity doesn’t have alert coverage, its hexagon is gray, as shown in the image above. Recommended conditions help you automatically add alerts to entities that do not have alert coverage.
- Click an entity that isn’t covered yet (a gray hexagon). A new option will appear on the right-hand side of the screen. Click Create alert condition as shown in the image below.
- You can choose several recommended conditions, as shown in the next image. The recommendations depend on the quality of the tags associated with an entity. In New Relic One, each data source is an entity associated with a set of tags that provide metadata on important attributes (such as language and application name). Some tags are added automatically, and you can add custom tags yourself. The more accurate and informative your tags are, the more precise the recommendations will be. The image below shows a few possible recommendations based on error percentage, Apdex, and response time.
How alert recommendations work
You might be wondering how New Relic One uses machine learning to generate recommended conditions. It’s all dependent on the tags. The machine-learning model is trained to consume information about all of the different entities created across New Relic One, along with their associated tags. The model divides the entities into clusters based on tag similarity and then finds the most common conditions that are defined on the data sources assigned to that cluster. Finally, the model recommends those conditions to you.
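To make the idea concrete, here is a minimal sketch of tag-based clustering followed by a "most common condition" lookup. It is not New Relic's actual model: the Jaccard similarity measure, the greedy clustering, the 0.5 cutoff, and all entity names, tags, and conditions below are assumptions chosen purely for illustration.

```python
from __future__ import annotations
from collections import Counter

# Hypothetical entities and their tags (illustrative data only).
ENTITIES = {
    "checkout": frozenset({"language:java", "team:payments", "env:prod"}),
    "billing": frozenset({"language:java", "team:payments", "env:staging"}),
    "invoices": frozenset({"language:java", "team:payments", "env:prod"}),
    "frontend": frozenset({"language:nodejs", "team:web", "env:prod"}),
    "new-service": frozenset({"language:java", "team:payments", "env:prod"}),
}

# Hypothetical alert conditions already defined on some entities.
CONDITIONS = {
    "checkout": ["error percentage", "response time"],
    "billing": ["error percentage", "apdex"],
    "invoices": ["error percentage", "response time"],
    "frontend": ["throughput"],
}


def jaccard(a: frozenset, b: frozenset) -> float:
    """Tag-set similarity: shared tags divided by all distinct tags."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_tags(entities: dict, min_similarity: float = 0.5) -> list:
    """Greedy clustering: an entity joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for name, tags in entities.items():
        for cluster in clusters:
            if jaccard(tags, entities[cluster[0]]) >= min_similarity:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def recommend(target: str, entities: dict, conditions: dict, top_n: int = 2) -> list:
    """Return the most common conditions among the target's cluster peers."""
    cluster = next(c for c in cluster_by_tags(entities) if target in c)
    counts = Counter(
        cond
        for peer in cluster
        if peer != target
        for cond in conditions.get(peer, [])
    )
    return [cond for cond, _ in counts.most_common(top_n)]


print(recommend("new-service", ENTITIES, CONDITIONS))
```

In this toy data set, the uncovered entity lands in the same cluster as the three Java payments services, so it inherits the conditions that are most frequently defined on them.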
What if you are working with a new entity type and the model doesn’t have enough representative data to give you a recommendation? The model will then make recommendations based on the community-curated golden signals data set. This data set includes the most common conditions across all entities that belong to the same product, regardless of their specific tags.
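The fallback described above can be sketched as a simple size check. This is an assumption-laden illustration rather than the service's real logic: the `MIN_CLUSTER_SIZE` cutoff, the function names, and the sample conditions are all hypothetical.

```python
from __future__ import annotations
from collections import Counter

# Hypothetical cutoff: fewer peers than this means "not enough data".
MIN_CLUSTER_SIZE = 3


def most_common_conditions(condition_lists: list, top_n: int) -> list:
    """Count conditions across entities and return the most frequent ones."""
    counts = Counter(cond for conds in condition_lists for cond in conds)
    return [cond for cond, _ in counts.most_common(top_n)]


def recommend_with_fallback(cluster_conditions: list, product_conditions: list, top_n: int = 2) -> list:
    """Use the tag-based cluster when it is large enough; otherwise fall back
    to the conditions most common across every entity of the same product."""
    if len(cluster_conditions) >= MIN_CLUSTER_SIZE:
        return most_common_conditions(cluster_conditions, top_n)
    return most_common_conditions(product_conditions, top_n)


# Illustrative data: one peer in the cluster, four entities product-wide.
SPARSE_CLUSTER = [["apdex"]]
PRODUCT_WIDE = [
    ["error percentage", "response time"],
    ["error percentage", "apdex"],
    ["error percentage", "response time"],
    ["error percentage"],
]

print(recommend_with_fallback(SPARSE_CLUSTER, PRODUCT_WIDE))
```

Because the cluster here has only one peer, the sketch ignores it and recommends from the product-wide list instead.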
Defining a threshold
The model doesn’t just recommend a metric to monitor. It also recommends thresholds for when violations should be triggered. The condition recommendation model takes the recommended metric and applies dynamic baseline alerts, which automatically set alert thresholds on key application performance metrics using baselines modeled from historical data. Choosing the right threshold matters: set it too loosely and you miss real problems, set it too tightly and you drown in alerts. To ease that tradeoff, the model picks a default threshold value based on the most common threshold used for other entities with the same metric. You can decide at any time whether to keep the default recommended condition or set custom thresholds.
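As a rough illustration of both ideas, the sketch below picks the most common existing threshold for a metric and checks a value against a very simple baseline band. Everything here is an assumption: the function names, the sample data, and the mean plus or minus `k` standard deviations band, which is far simpler than a production baseline model.

```python
from __future__ import annotations
from collections import Counter
from statistics import mean, stdev


def default_threshold(existing: list, metric: str):
    """Return the most common threshold already used for this metric,
    or None if no entity has a condition on it yet."""
    values = [threshold for name, threshold in existing if name == metric]
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


def outside_baseline(history: list, value: float, k: float = 3.0) -> bool:
    """Toy baseline check: flag a value more than k sample standard
    deviations away from the historical mean."""
    center = mean(history)
    band = k * stdev(history)
    return abs(value - center) > band


# Hypothetical (metric, threshold) pairs taken from other entities.
EXISTING = [
    ("error percentage", 5.0),
    ("error percentage", 5.0),
    ("error percentage", 10.0),
    ("response time", 0.5),
]

# Hypothetical recent response times in milliseconds.
HISTORY = [100, 102, 98, 101, 99]

print(default_threshold(EXISTING, "error percentage"))
print(outside_baseline(HISTORY, 110))
```

A wider band (larger `k`) means fewer alerts and more risk of missing an anomaly, while a narrower band does the opposite, which is the tradeoff a default threshold is meant to settle for you.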
Add recommended conditions today
The alert recommendation service is designed to increase alert coverage in a fast and simple way. Maintaining a high level of alert coverage is crucial for avoiding blind spots and keeping a clear view of your system’s health. Just as you should have working smoke detectors throughout your home, you should have alerts in place across your system to keep it well monitored. To learn more about AI, take a look at our infographic on how AI is driving digital transformation.