Despite the hype, many DevOps and SRE teams have struggled to make the promise of AIOps a reality. Steep learning curves, long implementation and training times, prohibitive pricing, and a lack of confidence in artificial intelligence (AI) and machine learning (ML) have stood in the way. When we dig into the challenges faced by pager-carrying, on-call engineers, they consistently mention three things that stand in the way of keeping services up and running:
- It’s hard to discover emerging problems and unknowns when you depend on alerts or static dashboards to know what’s changing.
- It’s not easy to triage incidents and know how to respond when a cascading failure happens with alerts firing across multiple tools.
- It’s difficult and time-consuming to diagnose the root causes of problems when you have to manually sift through dashboards to understand why the problem occurred and what's impacted.
In short, engineers can no longer afford expensive war rooms and guesswork to troubleshoot incidents, or worse, find out about them from customers. That all changes today with the launch of our next generation of New Relic Applied Intelligence, which makes it easier than ever to:
- Detect unusual changes instantly: You’ll be able to automatically spot anomalies across your applications, services, and logs to prevent potential problems before they impact customers.
- Cut down on alert noise: You can reduce the flood of noisy alerts and prioritize issues more easily by grouping alerts and events from any source into a single correlated, actionable issue.
- Get to root cause quickly: You’ll be able to eliminate guesswork and solve problems faster with automatic insights into the probable root cause of every issue.
- Respond to incidents faster: You can integrate New Relic Applied Intelligence with ITSM tools and eliminate the toil of managing incidents across tools by keeping everything in sync.
Let’s dig into what’s new and now available in Applied Intelligence.
Detect unusual changes instantly
Continuous, automatic anomaly detection now at no additional cost
Applied Intelligence automatically spots anomalies based on golden signals like throughput, errors, and latency across all applications and services—and it’s now automatically enabled for all your instrumented apps and services without configuration and at no additional cost. When it detects anomalies, it will immediately notify you via Slack and other collaboration tools, and give you a real-time feed of every anomaly and in-depth analytics to troubleshoot faster and prevent potential problems from impacting customers.
Applied Intelligence now includes a new capability that uses machine learning to detect patterns and surface outliers in your log data, helping you reduce troubleshooting time. You can explore millions of log messages with a single click and reduce manual querying because Log patterns automatically clusters your log data to help you quickly find anomalous patterns and problematic needles in the haystack. Log patterns is currently in public beta—if you’d like to turn it on for your New Relic account, reach out to your customer success manager.
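To make the idea of clustering log data into patterns concrete, here is a minimal sketch in Python. It is not how Log patterns is implemented: it assumes a simple approach in which variable tokens (IP addresses, long hex IDs, numbers) are masked so that messages with the same shape collapse into one pattern, and rarely seen patterns are flagged as candidates for a closer look. The names `to_pattern`, `cluster_logs`, and `rare_patterns`, the masking rules, and the 5% threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules, applied in order. A production engine would
# use far richer heuristics (or a learned model) than three regexes.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b[0-9a-f]{8,}\b", re.IGNORECASE), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def to_pattern(message: str) -> str:
    """Replace variable tokens so messages with the same shape match."""
    for regex, token in _MASKS:
        message = regex.sub(token, message)
    return message


def cluster_logs(messages):
    """Count how many messages fall into each pattern."""
    return Counter(to_pattern(m) for m in messages)


def rare_patterns(messages, max_share=0.05):
    """Return patterns that make up at most max_share of all messages."""
    counts = cluster_logs(messages)
    total = sum(counts.values())
    return [p for p, n in counts.items() if n / total <= max_share]


if __name__ == "__main__":
    logs = [f"User {i} logged in from 10.0.0.{i}" for i in range(1, 31)]
    logs.append("Disk failure on volume 3")
    for pattern in rare_patterns(logs):
        print(pattern)
```

In this toy data set, the thirty login messages collapse into a single pattern, which leaves the lone disk failure message as the only rare pattern.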
A new integrated landing page highlights insights and analytics about how your alert configurations are performing.
Alerts recurring muting rules
Define recurring schedules for muting rules in New Relic Alerts to get more control over alert suppression during scheduled maintenance windows and periods of planned downtime. Set recurring daily, weekly, or monthly schedules for muting rules via the New Relic UI or API.
Cut down on alert noise
You can now correlate related alerts and events based on external relationship data from CMDBs and New Relic entity relationships. In addition to correlating alerts using time-based clustering and context from alert messages, you can now ingest topology data from your relationship datastores (CMDBs) to enable more accurate correlation of alerts that are firing from connected services. This gives you better context into incidents that occur and how they impact your broader environment so that you can prioritize problems more accurately and efficiently.
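As a rough illustration of combining time-based clustering with topology data, the sketch below groups alerts that fire within a short window of each other on the same entity or on directly connected entities. It is an assumption-laden toy, not the Applied Intelligence correlation engine: the `Alert` class, the `correlate` function, the adjacency-dictionary topology format, and the 300-second window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    entity: str       # name of the service or host that fired the alert
    timestamp: float  # seconds since some common reference point


def correlate(alerts, topology, window_seconds=300.0):
    """Group alerts that are close in time AND on related entities.

    topology is a hypothetical adjacency mapping, for example
    {"checkout": {"payments"}}, such as might be derived from a CMDB export.
    """
    parent = list(range(len(alerts)))  # union-find forest over alert indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def related(a, b):
        return a == b or b in topology.get(a, set()) or a in topology.get(b, set())

    for i in range(len(alerts)):
        for j in range(i + 1, len(alerts)):
            close = abs(alerts[i].timestamp - alerts[j].timestamp) <= window_seconds
            if close and related(alerts[i].entity, alerts[j].entity):
                parent[find(i)] = find(j)

    groups = {}
    for i, alert in enumerate(alerts):
        groups.setdefault(find(i), []).append(alert)
    return list(groups.values())


if __name__ == "__main__":
    topology = {"checkout": {"payments"}, "payments": {"db"}}
    alerts = [
        Alert("checkout", 0),
        Alert("payments", 60),
        Alert("db", 120),
        Alert("search", 90),
    ]
    for group in correlate(alerts, topology):
        print([a.entity for a in group])
```

Because grouping is transitive, the `checkout` and `db` alerts end up in the same group through `payments`, even though they are not directly connected. The `search` alert fires in the same time window but stays separate because the topology shows no relationship to the others.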
Anomalies in correlated issues
You can correlate proactively detected anomalies with alerts and events from any source to paint a complete picture of the issue at hand, reducing the time it takes to understand and act.
Create correlation decisions faster with correlation assistant
Did something not correlate that should have? Do you have an idea of how to correlate but aren’t sure where to start? With a new correlation assistant feature, you can simply begin selecting incidents that should be correlated and let New Relic analyze them to show you what they have in common. This gives you more control over how you reduce alert noise.
In addition, Applied Intelligence can simulate your configuration and show you in real time how correlating incidents would reduce alert noise and increase context in the future.
Get to the root cause quickly
See the probable root cause(s) of every issue
Applied Intelligence gives you automatic insights into the probable root cause of every issue. You can quickly see why each open issue occurred, which deployments contributed, and which error logs and attributes are relevant, to help you investigate the problem faster than ever. Applied Intelligence scans the distribution of every attribute within the event data ingested, and surfaces possible causes by finding significant changes in that distribution. For example, across the transaction events generated, it can detect whether a single user starts to take up an unusual share of the requests sent to your app.
In addition, root cause analysis automatically classifies issues based on golden signals like errors, traffic, latency, and saturation so you can quickly orient yourself to why the problem occurred.
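The attribute-distribution scan described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the actual root cause analysis: it compares each value's share of events in a baseline window against its share in the incident window and reports values whose share grew by more than a threshold. The function names, the `min_increase` threshold of 0.2, and the sample user IDs are hypothetical.

```python
from collections import Counter


def share(values):
    """Return each distinct value's fraction of the total event count."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}


def shifted_values(baseline, incident, min_increase=0.2):
    """Find attribute values whose share of events grew noticeably.

    baseline and incident are lists of one attribute's values (for example,
    the user ID on each transaction event) from two time windows.
    """
    before = share(baseline)
    after = share(incident)
    shifts = {v: after[v] - before.get(v, 0.0) for v in after}
    return sorted(
        ((v, delta) for v, delta in shifts.items() if delta >= min_increase),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    baseline = ["u1", "u2", "u3", "u4"] * 25          # each user: 25% of requests
    incident = ["u1"] * 70 + ["u2", "u3", "u4"] * 10  # u1 jumps to 70%
    for value, delta in shifted_values(baseline, incident):
        print(f"{value}: share increased by {delta:.0%}")
```

A real system would repeat this comparison for every attribute and would likely use a statistical test rather than a fixed threshold, but the core idea is the same: a value that suddenly accounts for a much larger share of events is a candidate cause.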
Understand the impact and scope of every issue
You can see which entities (hosts, containers, applications) are affected, so you can quickly and accurately assess the scope of an issue and determine what needs to be remediated. You can isolate the source of the problem with automatic insight into how services and components of your environment are impacted by every issue.
Respond to incidents faster
Two-way integration with ServiceNow for correlated issues
Adding to our existing two-way integration with PagerDuty, you can now eliminate the toil of managing incidents across tools by syncing the state of correlated issues in Applied Intelligence with ServiceNow incidents bi-directionally. As the state of each correlated issue changes in either platform, it is now automatically updated in both tools. Applied Intelligence also supports a webhook for integrating with VictorOps, OpsGenie, and other tools of your choice.
Suggested responders for New Relic alert violations
Get automatic recommendations for individuals on your team who are best equipped to respond to an issue, either because they are experts in the component failing or have resolved similar issues before. This enhancement builds on our existing support for suggesting responders based on PagerDuty incident data, by suggesting responders for issues that originate from New Relic alert violations. Best of all, this feature is completely automatic and requires no configuration or model training—it just works out of the box as New Relic learns from the behaviors of responders on your team.
How to get started with AIOps
All New Relic Applied Intelligence customers have access to today’s new capabilities at no additional cost.
If you’re interested in adding AIOps capabilities to your New Relic implementation, you can get started now by clicking the “Alerts & AI” link in your New Relic account.
And if you’re new to New Relic but interested in digging in, experience the simplicity of New Relic One yourself by signing up for a forever free account, and check out New Relic Applied Intelligence.