AI is now generating a substantial share of the code your organization ships to production, code that, on the surface, looks great. According to the recently released 2026 State of AI Coding report, 93.5% of technology leaders rate AI-generated code as higher quality during the review stage. But there is a catch. Once that code actually ships, 78% of those same leaders report an increase in production incidents.
This disconnect exists because AI coding tools have perfect visibility into the source code but cannot anticipate how newly generated code or agent workflows will actually behave under real-world conditions. The result is “agent debt.” Similar to technical debt in software development, agent debt refers to the accumulation of risk that occurs when engineers rapidly build and deploy AI-generated code and agent workflows without fully validating, refining, or cleaning them up. As with technical debt, paying it down later requires additional time, resources, and expertise.
To realize the true ROI of AI, platform teams need a trust layer between AI-generated code and production reality, transforming blind AI-driven development into governed acceleration. These are the challenges and opportunities we’re unpacking at New Relic NOW 2026. Register to watch the recording to see why intelligent observability is the foundation your AI-first strategy depends on.
AI-driven development is no longer emerging; it is the default
In the Superhuman Era, generating code was never going to be the hard part. AI-augmented engineers can now move at unprecedented speed to build systems far more complex than any one person could manage alone. But that velocity creates new risk: 62% of organizations already ship AI-generated code to production without manual verification. The result is greater potential for failures and unexpected behavior, which can undermine trust in AI-generated code, agentic workflows, and AI-first systems.
To navigate this transition successfully, AI observability needs to move from an optional safety net to the load-bearing trust layer of AI-driven development. It is the mechanism that closes the AI code gap between how code looks in review and how it behaves in production. By feeding deep production telemetry back into the development loop, platform teams can catch agent debt before deployment and ensure AI-generated code performs as reliably in production as it appears in the pull request.
The right tools to close the loop
Closing the AI code gap takes more than bolting AI features onto existing monitoring tools. It takes production awareness built directly into your AI-driven development workflows.
The New Relic Intelligent Observability Platform does exactly that. By integrating AI Observability natively into capabilities like Service Architecture Intelligence (SAI/IDP), it transforms observability from a passive set of dashboards into an essential building block for high-quality AI-driven development. The payoff is real: fewer outages, protected revenue, faster delivery, and engineering talent focused on growth instead of cleanup.
95% of leaders rate observability as very or extremely important for AI-generated code.
To close the gap between AI-assisted development and production stability, New Relic equips teams with the capabilities they need to meet AI challenges head-on and move into production with confidence:
- AI Observability: Empowers enterprises to safely scale in the agentic era by transforming unpredictable AI systems into transparent, governed assets through real-time cost management, deep reasoning visibility, rigorous quality guardrails, and autonomous incident resolution.
- Preflight: Initially released as AI Code Observability, New Relic Preflight is a new open-source solution that extends production-grade monitoring directly into your IDE, transforming fragmented, unmonitored AI usage into an auditable enterprise advantage.
- ChatGPT App Monitoring: Comprehensive, real-time visibility into ChatGPT applications to optimize workloads and secure revenue channels.
Introducing New Relic Autopilot
The hard part of an incident isn't seeing that something broke. It's knowing why and what to do next, often at 3 a.m. Today, SRE teams burn hours troubleshooting, stitching together signals across dashboards, logs, traces, and runbooks before they can even begin to fix the problem. This manual investigation is slow, toil-heavy, and fragile, with engineers spending about a third of their time fighting fires. With outage costs doubling year over year to roughly $2 million per hour, enterprises need a better way to triage incidents, find root cause, and recommend remediations, without the late-night escalation chain and the guesswork that comes with it.
New Relic Autopilot is the autonomous SRE that investigates, explains, and helps fix incidents using real-time data from your environment. It compresses investigations from hours into minutes, accelerating MTTR, protecting revenue during critical outages, and freeing engineers from the toil that consumes a third of their week. By grounding answers in your telemetry, runbooks, retros, and past incidents, Autopilot democratizes tribal knowledge across the team, enabling faster recovery, lower operational costs, more resilient systems, and the confidence to resolve incidents beyond human scale, without waking a senior SRE in the middle of the night.
Autopilot helps teams resolve incidents beyond human scale by providing:
- Ever-growing team of expert agents: Domain specialists in Kubernetes, Kafka, and root-cause analysis (with more coming soon), each grounded in expert insight and concrete data. Every result is a well-structured, factual account of the problem, its cause, and the recommendation. The agents also handle quick answers, onboarding help, snapshots, and ad-hoc analysis.
- Improved agentic automation: Works where you do with Slack integration (mentions, incident channels, and threads, with Teams coming soon), and workflow actions triggered on a SEV1, after a deploy, or on a schedule, running synchronously for the moment after a page and asynchronously for deep investigations.
- Full in-context answers: Leverages New Relic Knowledge to ground answers in your runbooks and retros and recalls similar past incidents. New Relic Agentic Ecosystem connections to Jira and GitHub pull in the specific code details that pin down a problem. Long-term memory, scoped to an individual, a set of accounts, or the whole org, captures tribal knowledge and disperses it across the team, helping a first-week responder make decisions like a seasoned pro.
- Trusted self-improvement: Autopilot leads with its recommendation and shows the reasoning and the data behind every insight, and features a rating system where low scores are automatically routed for improvement.
- Built-in usage controls: Complexity-based, value-following pricing with caps to prevent surprise spikes, including customer controls to guide agent behavior and manage cost.
New Relic Autopilot will be available in late July 2026.
New Relic Ground Truth
Autopilot is only half the story. As AI-driven development accelerates, teams are investing in specialized AI agents for operations and production, but the results are often limited by the quality of the data those agents can access. Public APIs and basic query tools give agents raw telemetry, then leave them to do the heavy lifting. To answer a single mission-critical question, an agent may need to fire the same basic query tool ten times, burning tokens and round trips just to piece together an answer it still may not fully trust. Teams need dependable, structured, agent-optimized access to their richest insights through the agents they already run.
New Relic Ground Truth solves this by providing AI-optimized tools that give the agents you already run, whether GitHub Copilot, Claude Code, AWS DevOps, or an orchestrator you’ve built, direct access to the deepest insights in your New Relic data. These are insights you simply can’t get through public APIs or basic tools; they’re surfaced specifically for AI agents and built on the Intelligent Observability Platform customers already trust. That makes the agents you’ve already invested in measurably smarter, sharper, and more efficient, without forcing you to rip out your stack or maintain your own tooling.
Ground Truth also reduces tool calls and token spend while accelerating MTTR on the questions that matter most by replacing many basic queries with a single curated insight. In fact, one large enterprise measured a 1.1% error rate from New Relic Ground Truth tooling, well below the competing ITOps vendors it evaluated, across more than 1,300 users.
- Exclusive, agent-optimized insights: Access New Relic's richest insights, built for on-platform experiences and impossible to get through public APIs or basic query tools. Each tool returns a curated answer in a single call, purpose-built for how AI agents reason.
- Better agent token efficiency: One premium tool call replaces many basic queries, so your agents reach answers with far fewer round trips and far fewer tokens. Less triangulation and token spend, more resolution.
- BYOA (Bring Your Own Agent) in context: Works with the agents and orchestrators you already run, including GitHub Copilot, Claude Code, AWS DevOps, or your own homegrown agent. New Relic provides the grounding; you bring the agent.
- Proven, reliable access: Battle-tested tooling with dependable, structured access to your New Relic data, governed by existing auth and role-based access controls (RBAC) to control agents' access.
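To make the token-efficiency argument concrete, here is a minimal back-of-the-envelope sketch. It compares ten basic queries (the count used in the scenario above) with a single curated call. The `CallPlan` class, the `savings` helper, and every token figure are hypothetical and chosen purely for illustration; they are not measurements of Ground Truth or of any New Relic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallPlan:
    """A hypothetical plan for answering one question via tool calls."""
    calls: int            # number of tool round trips
    tokens_per_call: int  # assumed tokens per call (request plus response)

    @property
    def total_tokens(self) -> int:
        return self.calls * self.tokens_per_call


def savings(basic: CallPlan, curated: CallPlan) -> dict:
    """Return the round trips and tokens saved by the curated plan."""
    return {
        "round_trips_saved": basic.calls - curated.calls,
        "tokens_saved": basic.total_tokens - curated.total_tokens,
        "token_ratio": curated.total_tokens / basic.total_tokens,
    }


# Illustrative numbers only: ten small basic queries versus one larger
# curated call. The token counts are assumptions, not benchmarks.
basic = CallPlan(calls=10, tokens_per_call=1_500)
curated = CallPlan(calls=1, tokens_per_call=3_000)
result = savings(basic, curated)
print(result)
```

Even when the single curated response is assumed to be larger than any one basic response, the total can still come out lower because the per-call overhead is paid once rather than ten times. Substitute your own agent's measured figures to see whether the same holds in your environment.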
Ground Truth will be available alongside Autopilot in late July 2026.
New commitments: FedRAMP High and DoD IL 4
In an AI-first world, governance is non-negotiable. We are excited to announce our commitment to elevate our FedRAMP authorization from Moderate to High and achieve DoD Impact Level 4. This means aligning with 400+ rigorous federal controls to give public sector and regulated enterprises (finance, healthcare, retail) proof that New Relic has the security, audit trails, and data boundaries required to operate in high-impact data environments.
New Relic for Startups, from first customer to superhuman scale
Startups today are building on vibe-coded codebases from sprint one, making critical architectural decisions that will dictate how they scale. Without a dedicated platform to help them manage agent debt effectively, these nimble organizations risk never becoming the next billion-dollar company.
Today we are proud to announce our revamped Startup Program, which provides early-stage companies with the professional-grade observability tools required to establish system reliability, user trust, and rapid scale from day one. This program is our direct commitment to ensuring the early-stage builders defining the next generation of software have the foundational observability they need to grow from their first customer to enterprise scale, with no re-platforming along the way.
Learning from teams already running AI agents in production at scale
To ground these challenges in reality, New Relic NOW 2026 is turning the mic directly to the practitioners. In our live customer panel, you will hear from AI-native founders and engineering leaders, including experts from OpenAI and LaunchDarkly, who are already running large language models in production. This is a conversation about real builders making real decisions, focusing on the concrete architectural choices that determine whether an AI-first product thrives or collapses under its own operational weight.
The consensus from these industry pioneers is clear: monitoring AI in production is no longer a "nice-to-have" luxury, but load-bearing infrastructure. By choosing New Relic, these teams are actively catching conflicting prompts, memory pollution, and logic errors that would have otherwise slipped into production unseen. They are proving that when you equip developers with deep telemetry natively in their workflow, you don't just manage agent debt, you prevent it from accumulating in the first place.
Relentless innovation to solve our customers’ new challenges
At New Relic Advance in February, we highlighted our commitment to deliver the specific tools practitioners need to navigate this shift. Those commitments have reached general availability and are now market-leading innovations. Each one reflects our focus on turning chaotic firefights into targeted resolution workflows:
- eBPF Network Metrics: Deep, infrastructure-level network visibility that identifies hidden bottlenecks with zero instrumentation overhead.
- Mobile Session Replay: True production visibility into actual user behavior on the front end, enabling rapid, targeted issue resolution.
- Notebooks: A collaborative analysis layer that empowers teams to use variables to turn one-off investigative queries into dynamic, repeatable SRE runbooks.
- New Relic Knowledge: Institutional intelligence that fuses real-time telemetry with historical incident data and system changes, surfacing deep operational context precisely when you need it.
Looking ahead
As we look toward the next AI innovations and beyond, the compounding nature of agent debt presents a formidable challenge. What happens when organizations have two full years of accumulated AI-generated code running in production, layered with conflicting agent workflows, overlapping tool calls, and absolutely no provenance regarding who prompted what? This is not a hypothetical future scenario. For the fastest-moving engineering teams, it is already happening today. Addressing this mounting complexity requires foresight, and New Relic is the most capable observability platform thinking about AI governance at this architectural depth.
To explore where this goes next, we invite you to tune in to the live conversations happening today at New Relic NOW 2026. Our Developer Relations leaders are taking the stage to discuss the emerging problems they are seeing in the field—the challenges that don't have easy solutions yet—and how the industry must evolve to meet them. View the on-demand recording, download the full 2026 State of AI Coding report, and be part of the community that is defining what it means to build reliable software in the Superhuman Era.