Alerting has always been one of the most important and most fragile parts of operating modern systems.

When alerts work well, teams respond quickly, confidently, and with minimal disruption. When they do not, engineers lose trust, incidents take longer to resolve, and real problems either hide in the noise or surface too late.

Most teams today are familiar with alert fatigue. What is discussed less often is the other side of the problem: missed incidents caused by inadequate coverage.

Modern alerting must solve both.

New Relic Smart Alerts, now in preview, are designed to help teams protect focus without sacrificing visibility, improving detection quality at the very front of incident response.

The reality of operating always-on systems

Modern production environments do not behave predictably.

Traffic patterns shift throughout the day. Infrastructure scales automatically. Workloads behave differently across regions, environments, and release cycles. A metric that looks abnormal at one moment may be completely expected an hour later.

Yet many alerting strategies still rely on static thresholds and manual tuning. Those approaches assume that “normal” behavior can be defined once and enforced consistently.

In practice, that assumption breaks down quickly.

Teams are forced into a constant cycle of adjustment: tightening thresholds to catch issues earlier, loosening them to reduce noise, and revisiting them again as systems evolve. Even then, blind spots remain.
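To make the tradeoff concrete, consider a hypothetical service whose traffic roughly doubles during business hours. The sketch below uses entirely made-up numbers, but it shows why one fixed limit cannot be right at both ends of the day:

```python
# Illustrative only: synthetic request rates for a hypothetical service.
# Traffic peaks midday, so a single static threshold either fires on
# normal peaks or misses real trouble during quiet hours.

def hourly_baseline(hour):
    """Expected requests/sec for each hour (made-up daily pattern)."""
    return 100 + 80 * (1 if 9 <= hour <= 17 else 0)

STATIC_THRESHOLD = 150  # one fixed limit applied around the clock

def static_alert(hour, observed):
    return observed > STATIC_THRESHOLD

# A normal midday peak (180 rps) trips the alert: a false positive.
assert static_alert(12, hourly_baseline(12)) is True

# A 40% overnight spike (140 rps at 3 a.m. against a 100 rps baseline)
# stays under the threshold and is missed entirely: a blind spot.
assert static_alert(3, 140) is False
```

Loosening the threshold fixes the midday false positive but widens the overnight blind spot; tightening it does the reverse. That is the cycle described above.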

The result is not just noisy alerts. It is uncertainty.

Noise is only half the problem

Alert fatigue is real and costly. Too many alerts interrupt engineers unnecessarily, erode confidence, and slow response when something truly matters.

But the opposite failure mode is just as damaging.

To avoid noise, teams often loosen thresholds or disable alerts entirely. Over time, coverage erodes. Incidents are detected later, sometimes only after customers are impacted.

This creates the observability paradox many teams experience daily:

  • You want complete coverage
  • You want alerts engineers trust
  • You are often forced to choose one

Smart Alerts are built to reduce that tradeoff.

Why traditional alerting struggles at scale

Static thresholds work best in stable, predictable systems. Modern systems are neither.

As environments grow more dynamic, static thresholds require constant manual tuning. That tuning does not scale across hundreds of services, each with different traffic patterns and usage profiles.

Manual tuning also introduces inconsistency. Different teams make different choices. Alerts behave unpredictably across environments. On-call engineers are left guessing whether an alert is actionable or ignorable.

At scale, alerting becomes brittle.

Introducing Smart Alerts

Smart Alerts focus on improving the quality of detection rather than simply reducing alert volume.

Instead of relying solely on fixed thresholds, Smart Alerts use historical behavior and patterns to help identify when system behavior is genuinely abnormal.
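The post does not describe the algorithm Smart Alerts use internally. As one generic illustration of how historical behavior can replace a fixed threshold, a rolling-window z-score flags a value only when it deviates sharply from recent history:

```python
# Illustrative sketch only -- NOT New Relic's actual detection algorithm.
# Flags a new observation as anomalous when it deviates from the recent
# rolling mean by more than k standard deviations.
from statistics import mean, stdev

def is_anomalous(history, value, k=3.0):
    """history: recent samples of the metric; value: newest observation."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any change is abnormal
    return abs(value - mu) / sigma > k

steady = [100, 102, 98, 101, 99, 100, 103, 97]
assert not is_anomalous(steady, 104)  # within normal variation
assert is_anomalous(steady, 160)      # genuine deviation
```

Because the baseline is computed from the metric's own history, the same rule adapts to each service's traffic pattern without per-service hand tuning, which is the property the bullets above describe.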

At a high level, Smart Alerts are designed to help teams:

  • Maintain broad alert coverage without constant manual tuning
  • Reduce false positives caused by expected variability
  • Detect meaningful deviations earlier
  • Standardize alert behavior across services and teams

The goal is straightforward:

When an alert fires, engineers should trust that it deserves attention.

What Smart Alerts are responsible for

Smart Alerts answer a specific and critical question:

Is something happening right now that requires attention?

They are intentionally focused on detection. They determine when to interrupt engineers, not how to resolve the issue.

By improving signal quality at the front of the incident lifecycle, Smart Alerts make everything downstream work better.

What Smart Alerts are not

Clarity around scope is essential.

Smart Alerts do not:

  • Perform root cause analysis
  • Explain why an issue occurred
  • Provide incident command center context
  • Correlate dependencies or recommend fixes

Those capabilities exist elsewhere in the New Relic platform.

Smart Alerts sit at the front door of incident response. Their responsibility is to ensure that the door opens at the right time and stays closed when it should.

This separation of concerns is deliberate and necessary.

How teams use Smart Alerts in practice

Teams typically adopt Smart Alerts in environments where manual tuning has reached its limits.

Common starting points include:

  • High-traffic services with frequent false positives
  • Business-critical paths where missed incidents are unacceptable
  • Large, multi-team environments where alert consistency matters

By improving detection accuracy, Smart Alerts help downstream investigation and response workflows function more effectively.

When alerts are credible, engineers spend less time questioning signals and more time addressing real issues.

Designed for real operational workflows

Smart Alerts are built to integrate with existing operational practices.

They respect current tagging, routing, and ownership models. They do not require teams to redesign incident workflows or adopt new mental models overnight.

Instead, they provide a more reliable detection layer that other tools and processes can depend on.

This allows teams to scale alerting practices without increasing cognitive load on on-call engineers.

Why this matters now

As systems become more complex and delivery velocity increases, both noise and blind spots grow more expensive.

Alerting is no longer just a configuration task. It is a core reliability capability.

Smart Alerts help teams strike a sustainable balance between focus and coverage, which is essential for operating modern, always-on systems.

Getting started

Getting started with Smart Alerts is straightforward:

Step 1: Access Alerts Overview

Log in to one.newrelic.com, then navigate to Alerts in the left-hand navigation menu and select Alerts Overview. This central hub lets you visualize coverage gaps across your entire organization.

Step 2: Analyze Coverage Gaps

Review the Entity monitoring coverage section to identify which services (APM), browser applications, or synthetic monitors are Uncovered or have only Partial coverage. Click Manage coverage to see specific recommendations based on industry-standard alert conditions.

Step 3: Apply and Customize

Choose your preferred method to close gaps:

  • Automated: Click Cover all gaps to apply recommended thresholds across all entities instantly.
  • Targeted: Use the Cover (N) button on specific entity cards (e.g., APM Services) to apply defaults by type.
  • Custom: Click Configure to manually adjust thresholds, add filters (like environment = 'production'), and set issue creation preferences.