As software systems become more complex and the demand for quality and reliability increases, DevOps, SRE, and network operations center (NOC) teams can find themselves overwhelmed by a constant flood of information. Between noisy alerts, signals distributed among multiple tools, and thousands of “unknown unknowns,” it’s difficult to quickly determine and address the root cause of incidents, let alone detect and respond to issues proactively. Troubleshooting and incident response are further complicated by an influx of alerts from multiple tools, which can create distractions and response fatigue for your team.
We’ve seen these problems and know the struggle of maintaining complex, large-scale systems. That’s why we’re excited to announce the general availability of New Relic Applied Intelligence (AI), an AIOps solution that helps on-call teams detect, diagnose, and respond to incidents faster. New Relic AI is built to empower your team to get out of reactive “fire-fighting” mode and back into the creative, challenging, exciting work of building great software.
Fast to connect, faster to value: meeting you in the tools you already use
If your DevOps, SRE, or on-call team is tasked with maintaining complex infrastructure, you may rely on a multitude of tools to detect and respond to incidents. There are great tools to observe systems across your full technology stack; tools to notify you when incidents occur; tools to track the status of in-progress and follow-up actions; and tools to communicate with other team members. For on-call teams that are under pressure to reduce mean time to resolution (MTTR), this ever-growing list of tools can pose problems: Incident, event, and operational data is fragmented, siloed, or redundant, making it harder to find the information needed to diagnose and resolve incidents.
AIOps platforms promise to solve these problems with a centralized, intelligent feed of incident information that displays everything you need to troubleshoot and respond to problems behind a single pane of glass. Unlocking this value, though, can require a significant time commitment and workflow shift, potentially costing your team hundreds of hours in integration, configuration, training, and onboarding tasks.
The New Relic AI approach is radically different: It combines the value of an intelligent system with minimal configuration requirements. New Relic AI is source- and data-agnostic and integrates with PagerDuty, New Relic Alerts, Splunk, Prometheus, Grafana, and Amazon CloudWatch, as well as with other data sources through our REST API. New Relic AI works out of the box, without weeks of onboarding while it studies your data, and it learns over time, automatically aggregating, correlating, and prioritizing your incident data to help your teams reduce alert fatigue. This streamlined, enriched information is available right in your team’s existing incident management tools, like PagerDuty, ServiceNow, OpsGenie, and VictorOps, so you don’t need to reinvent the way you respond to incidents.
Going a step further, we integrate with the notification and collaboration tools you already use and deliver critical insights, such as automatic anomaly detection, to your Slack channels or other notification channels of your choice. Crucial information about your production system is now at your fingertips, with no need to change your on-call workflow.
More intelligence throughout the entire DevOps cycle
Rather than narrowing our approach to one specific aspect of the incident response process, we strengthen the relationships between each stage of the process to create a more powerful solution. Focusing only on faster detection, faster understanding, faster response, or faster follow-up is not enough; you need a tool that thinks like your best SREs—from a systems perspective.
Proactively detect anomalies
The first step of the incident response process is detecting potential problems. New Relic AI provides automatic anomaly detection that you can configure in minutes with only a few clicks.
You can easily tell the system which applications and services to monitor for anomalies and where to send real-time failure warnings, such as Slack. For the many on-call teams that already collaborate on problems in Slack, this surfaces critical context about potential problems in the tools where they already get work done. You can also set up webhooks to send failure warnings from New Relic AI’s Proactive Detection to custom notification channels of your choice.
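If you point a webhook at a custom channel, something on your side has to receive it. Below is a minimal sketch of such a receiver using only the Python standard library. It assumes the warning arrives as a JSON object in a POST body; the field names `application`, `signal`, and `link`, as well as the helper names, are hypothetical placeholders rather than New Relic’s documented payload format, so map them to whatever your webhook actually sends.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_warning(payload: dict) -> str:
    """Turn a failure-warning payload into a one-line message.

    The keys "application", "signal", and "link" are hypothetical; match them
    to the payload your webhook actually receives.
    """
    app = payload.get("application", "unknown application")
    signal = payload.get("signal", "unknown signal")
    link = payload.get("link", "")
    message = f"[anomaly] {app}: unusual {signal}"
    return f"{message} ({link})" if link else message


def forward_to_channel(message: str) -> None:
    """Placeholder for your custom channel (chat bot, SMS gateway, and so on)."""
    print(message)


class WarningHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON warnings and forwards a formatted message."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and bad encodings
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # body was not a JSON object
            self.end_headers()
            return
        forward_to_channel(format_warning(payload))
        self.send_response(204)  # accepted, nothing to return
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver; register http://<host>:<port>/ as the webhook URL."""
    HTTPServer(("0.0.0.0", port), WarningHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the formatting logic in `format_warning` makes it easy to test without opening a socket.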
“New Relic AI’s proactive detection capability was very easy to set up and use. There were zero agent configuration changes or deployments needed,” said Jeffrey Hines, Senior Site Reliability Engineer, Signify Health. “Specifically, it helped my team achieve speed, agility and provided operational visibility which ultimately helps us reduce incidents, integrate machine learning and analytics into operations and improve overall customer experience.”
Reduce alert noise & fatigue
On-call teams are familiar with noisy alerts triggered by low-priority, irrelevant, or flapping issues. These can lead to alert fatigue, cause distractions, and increase the probability that a critical signal will go unnoticed. New Relic AI’s Incident Intelligence starts from a baseline of industry-standard knowledge, then learns from your data and your team’s feedback to intelligently suppress alerts you don’t care about and correlate related incidents, without excessive configuration, training, or onboarding. Customers already using New Relic AI have reported automatic noise reductions of more than 80%, along with more streamlined and useful alerts.
“Today, the biggest problem IT Ops teams struggle with the most is making sense of vast volumes of event alert noise, impacting a team’s ability to focus on building flawless software. With New Relic AI, our teams will have a clear understanding how specific issues affect business services, allowing them to quickly identify and prioritize the most business-critical issues. With this launch, we look forward to harnessing the power of targeted intelligence and ultimately optimizing cost,” said Peter Hammond, Global Head of Technology Operations, Morningstar, Inc.
Transparency, trust, and control
You and your team need to trust that correlations aren’t missing key signals, and that trust comes from transparency. We believe AIOps tools shouldn’t be a black box, which is why New Relic AI clearly shows you exactly why and how issues are correlated so you can trust that no signals are being missed. Using AI and machine learning (ML), New Relic AI can suggest relevant correlations based on your historical data.
You can also define your own correlation decisions by telling New Relic AI which data to compare and what to correlate. To fine-tune the correlation engine, set frequency and duration thresholds and choose from out-of-the-box similarity algorithms.
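To make the interplay between a duration threshold and a similarity algorithm more concrete, here is a toy sketch. It is not New Relic AI’s correlation engine: the choice of Jaccard similarity over title words, the greedy grouping, and the default values (a 300-second window, a 0.5 similarity cutoff) are all our own illustrative assumptions, and it does not model a frequency threshold.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    id: str
    timestamp: float  # seconds since epoch
    title: str


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lower-cased word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def correlate(incidents, window_s: float = 300.0, min_similarity: float = 0.5):
    """Greedily group incidents.

    An incident joins the first existing group whose opening incident fired at
    most window_s seconds earlier and whose title is at least min_similarity
    alike; otherwise it starts a new group.
    """
    groups = []
    for inc in sorted(incidents, key=lambda i: i.timestamp):
        for group in groups:
            first = group[0]
            if (inc.timestamp - first.timestamp <= window_s
                    and jaccard(inc.title, first.title) >= min_similarity):
                group.append(inc)
                break
        else:
            groups.append([inc])
    return groups


incidents = [
    Incident("a", 0, "High latency checkout service"),
    Incident("b", 60, "High latency checkout API"),
    Incident("c", 90, "Disk full on db-01"),
    Incident("d", 1000, "High latency checkout service"),
]
print([[i.id for i in g] for g in correlate(incidents)])  # [['a', 'b'], ['c'], ['d']]
```

Incident "d" has the same title as "a" but fires outside the window, so it starts its own group. Raising `min_similarity` or shrinking `window_s` yields more, smaller groups; loosening them merges more aggressively at the risk of hiding distinct problems.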
Diagnose and respond faster
Once an issue is identified and your team is paged, the investigation and troubleshooting process begins. Getting closer to the root cause and determining the steps to resolution usually account for most of the time between an issue occurring and its remediation. New Relic AI accelerates this process by giving you useful context about your existing issues, including their classification according to the “Four Golden Signals” (latency, traffic, errors, and saturation) and information on any related components, so you can reach the probable root cause faster and isolate the source of the problem. New Relic AI also suggests responders based on your data and lets you easily decide where and how issues are sent to your teams. For example, with the pathways feature in New Relic AI, you can route all incidents with a particular application name only to that team’s dedicated PagerDuty service.
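The two ideas in this paragraph, tagging an issue with one of the Four Golden Signals and routing it by application name, can be sketched as plain lookups. This is only an illustration under stated assumptions: New Relic AI learns its classification from data and pathways are configured in the product, whereas this sketch uses naive keyword matching, and the keyword lists, the `PATHWAYS` table, and destination strings such as `pagerduty:checkout-team` are all hypothetical.

```python
# Hypothetical keyword lists; checked in this order, first match wins.
GOLDEN_SIGNALS = {
    "latency": ("latency", "slow", "response time", "duration"),
    "traffic": ("throughput", "traffic", "requests per"),
    "errors": ("error", "exception", "5xx", "failure"),
    "saturation": ("cpu", "memory", "disk", "saturation", "queue"),
}

# Hypothetical pathway table: application name -> notification destination.
PATHWAYS = {
    "checkout": "pagerduty:checkout-team",
    "search": "pagerduty:search-team",
}


def classify(title: str) -> str:
    """Return the first golden signal whose keywords appear in the title."""
    t = title.lower()
    for signal, keywords in GOLDEN_SIGNALS.items():
        if any(k in t for k in keywords):
            return signal
    return "unclassified"


def route(incident: dict, default: str = "pagerduty:ops-on-call") -> str:
    """Send an incident to its application's destination, else the default."""
    return PATHWAYS.get(incident.get("application"), default)


print(classify("Slow response time on /cart"))  # latency
print(route({"application": "search"}))         # pagerduty:search-team
```

A catch-all default destination ensures that incidents from applications without a dedicated pathway still reach someone on call.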
Just like a new team member, New Relic AI gets smarter and builds system-specific knowledge about your team’s infrastructure as it studies your data. Your team can provide feedback about the quality of issue correlations, automatically surfaced information, and suggested responders, helping the system adjust and deliver even more focused, relevant insights over time.
No change to your existing incident management workflows
New Relic AI meets you where you are, with correlated, enriched incidents and context delivered within your existing incident management workflows and tools, so you don’t need to change the way you respond to incidents. Simply connect your existing tools as data sources and destinations via New Relic’s guided configuration interface or our REST API and webhooks, and the solution takes care of the rest, ingesting incident data from your toolchain, enriching it with context, providing smart suggestions and guidance, and delivering relevant insights about incidents to your tools of choice.
Smarter tools for more perfect software
New Relic’s mission is to instrument, measure, and improve the internet to help our customers create more perfect software, experiences, and businesses. In order to do this, we believe it’s critical to embrace solutions that are easy to connect and configure, work with the tools teams already use, create value throughout the entire observability process, and learn from data patterns and user feedback to get smarter over time. New Relic AI is one more step in this journey. It’s already making a difference for busy DevOps, SRE, and NOC teams, and we’re excited to see the value it can bring to your teams, too.
Resources for getting started
- Accelerate Incident Response with AIOps: An introduction to AIOps best practices with New Relic AI (eBook)
- Accelerate Incident Response with AIOps (Webinar)
- New Relic AI documentation
To see how New Relic AI can help you and your team, request a demo.