Extended Berkeley Packet Filter (eBPF) is a kernel technology that first landed in the Linux 3.x series and matured through the 4.x releases, enabling verified bytecode to run directly within the Linux kernel. Functioning as a lightweight, sandboxed virtual machine embedded in the kernel, eBPF provides controlled access to specific kernel resources in a secure and efficient manner.
New Relic is advancing a unified approach to observability with the release of a new eBPF-based agent. This single, lightweight agent is designed to consolidate multiple planes of observability (network performance, APM telemetry, infrastructure metrics, and logging data) into one cohesive stream. By operating within the kernel with minimal overhead, the eBPF agent avoids traditional, resource-intensive collection methods. Customers can deploy one solution that reduces complexity, streamlines management, and provides visibility across their dynamic environments without the burden of managing disparate agents for each data type.
Modern distributed applications don’t just fail in code. They often fail at the network layer, where blind spots make it difficult to prove what’s really happening. When latency spikes or connections intermittently fail, teams bounce between disparate tools, trying to manually correlate symptoms into a root cause.
Today we’re introducing eBPF Network Metrics, which provides deep visibility into TCP and DNS behavior directly from the Linux kernel. Built on our new eBPF agent, it offers high-fidelity, real-time data without requiring changes to application code or traditional packet capture methods.
The Hidden Cost of Network Blind Spots
The network layer is often the hardest to exonerate during an incident. The time this takes is sometimes called "Mean Time to Innocence" (MTTI): when network visibility is limited, teams may spend hours on investigation and cross-team troubleshooting just to rule the network out.
Network performance degradation remains one of the most common causes of application slowdowns in distributed environments. Industry research highlights the scale of this challenge:
- Undetected degradation: A recent survey found that more than 60% of network brownouts are first discovered by users or customers rather than by monitoring systems.
- Operational impact: Network performance degradation can cost organizations hundreds of thousands of dollars annually through lost productivity and mitigation efforts.
- Downtime costs: Research shows that unplanned IT downtime averages more than $14,000 per minute and can approach $24,000 per minute for large enterprises.
Kernel-level telemetry and unified observability workflows can significantly reduce the time required to identify the source of network-related performance issues.
Stop Guessing: Kernel-Level TCP and DNS Signals
Traditional monitoring can indicate that requests are slow, but it often fails to reveal the underlying cause. With eBPF Network Metrics, you gain access to granular signals collected directly from the kernel to explain network behavior:
- TCP Handshake Latency: Spot connection setup delays before a request reaches your service.
- Retransmissions: Identify packet loss and unstable network paths that degrade performance.
- Abnormal Closures: Catch unexpected connection termination patterns that trigger intermittent errors and hard-to-reproduce timeouts.
- DNS Resolution Failures: Detect name-resolution issues without guessing whether the application or the provider is at fault.
- Socket-Level Errors: Use specific error codes to differentiate between network path issues and server configuration problems.
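To make these signals concrete, here is a minimal user-space sketch in Python. It is not how the eBPF agent collects data (the agent observes the kernel directly, with no probing traffic). It only approximates three of the signals above for a single destination: a DNS resolution failure, TCP connect time as a rough stand-in for handshake latency, and the socket error code. The `probe` function and `ProbeResult` type are hypothetical names introduced for illustration.

```python
import errno
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    stage: str                   # "dns", "connect", or "ok"
    connect_ms: Optional[float]  # time spent in connect(), only when it succeeded
    error: Optional[str]         # e.g. "ECONNREFUSED", "timeout", or None


def probe(host: str, port: int, timeout: float = 2.0) -> ProbeResult:
    """Resolve host, then time one TCP connect from user space (illustrative only)."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Name resolution failed before any packet reached the service.
        return ProbeResult("dns", None, f"gaierror({exc.errno})")

    family, socktype, proto, _canonname, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        start = time.perf_counter()
        sock.connect(addr)  # returns once the three-way handshake completes
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return ProbeResult("ok", elapsed_ms, None)
    except socket.timeout:
        # No answer within the timeout: often a dropped or filtered path.
        return ProbeResult("connect", None, "timeout")
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. ECONNREFUSED.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return ProbeResult("connect", None, name)
    finally:
        sock.close()
```

Read this way, a result at stage `dns` points at name resolution, `ECONNREFUSED` typically means the host was reachable but nothing accepted the connection, and a timeout is more suggestive of a path or firewall problem. The kernel-level metrics make the same distinctions for real application traffic, without synthetic probes.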
Working Smarter with Network Metrics Alongside APM
Traditional network monitoring tools force you to switch context from application monitoring to pure network monitoring. Here, the data is surfaced in a new Network Metrics tab integrated directly with your APM entities. By correlating network behavior with application and infrastructure performance, your team can move from building hypotheses to validating actual causes in a single workflow.
Whether your environment relies on traditional language-specific agents or modern eBPF-driven APM for Kubernetes, Network Metrics integrates seamlessly into your existing workflow:
- Automatic Conflict Resolution: If an APM agent is already present, the eBPF agent is designed to "back off" from collecting APM-specific data, so customers do not experience duplicate data ingestion or redundant overhead.
- Net New Visibility: While it intelligently avoids duplication, Network Metrics still delivers its unique kernel-level TCP and DNS signals as a net new addition to your telemetry.
Real-World Application and Network Troubleshooting
Imagine a critical checkout service that suddenly experiences intermittent slowness. Traditional APM indicates a rise in response times, but your code hasn't changed, and standard infrastructure metrics show healthy CPU and memory levels. This is where eBPF network metrics provide the "smoking gun" by allowing you to follow a logical diagnostic path:
- Detecting the Error Pattern: From the APM Summary or the new Network View, you notice a surge in "Abnormal Connection Closures" and TCP errors.
- Isolating the Latency Source: By drilling into the Latency tab, you see a dramatic spike in TCP Handshake Latency to an external service. This indicates the bottleneck occurs during connection establishment, before the request even reaches your application code. It can happen when an external API the application depends on is under increased load or suffering an outage.
- Confirming the Cause: To pinpoint the cause, you examine the TCP flags. You discover a high volume of delayed SYN-ACK responses and RST (reset) packets originating from a specific downstream dependency.
The Result: Instead of hours of guesswork or manual packet captures with tools like Wireshark, you have documented evidence pointing to a misconfigured firewall or network path as the likely root cause. You can now hand off the issue to the network team with specific process-attributed data, which can reduce your Mean Time to Resolution (MTTR) from hours to minutes.
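For a rough, host-wide cross-check of the retransmission and reset signals used in this walkthrough, Linux exposes cumulative TCP counters in /proc/net/snmp. The sketch below is an assumption-laden illustration, not the agent's method: it parses the `Tcp:` header and value lines from that file and computes a retransmission ratio between two snapshots. The function names are hypothetical, and unlike eBPF Network Metrics these counters are not attributed to a process or destination.

```python
from typing import Dict


def parse_tcp_counters(snmp_text: str) -> Dict[str, int]:
    """Parse the 'Tcp:' header/value line pair from /proc/net/snmp text."""
    tcp_lines = [ln.split() for ln in snmp_text.splitlines() if ln.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")
    # First Tcp: line holds counter names, the second holds their values.
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def retransmit_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Ratio of retransmitted segments to segments sent between two snapshots."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


if __name__ == "__main__":
    import time

    def snapshot() -> Dict[str, int]:
        with open("/proc/net/snmp") as f:
            return parse_tcp_counters(f.read())

    first = snapshot()
    time.sleep(5)  # sampling interval chosen arbitrarily for the example
    second = snapshot()
    print(f"retransmit ratio: {retransmit_ratio(first, second):.4f}")
    print(f"established resets: {second['EstabResets'] - first['EstabResets']}")
```

A rising ratio or reset count on the affected host is consistent with the pattern described above, but only the per-connection view tells you which dependency is responsible.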
Getting Started
eBPF Network Metrics is now Generally Available and offers a single-agent, language-agnostic approach to collecting kernel-level network telemetry across modern environments. It helps teams close visibility gaps without the need for a patchwork of language-specific agents.
You can begin using eBPF Network Metrics with these three simple steps:
- Deploy the Agent: Install the lightweight eBPF agent on your Linux hosts or Kubernetes clusters, with no need to swap app libraries or restart your applications.
- Identify Your Service: Navigate to the APM UI and select the specific service entity you want to investigate. The eBPF agent automatically maps network data to your existing services.
- Validate and Troubleshoot: Open the Network Metrics tab within your service entity. Here, you can immediately begin correlating application latency spikes with real-time kernel signals like handshake delays, DNS failures, or TCP retransmissions.
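The views expressed in this blog are those of the author and do not necessarily reflect the official views of New Relic K.K. This blog may also contain links to external sites. New Relic makes no guarantees regarding the content of those linked sites.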