Introduction
In the world of big data, Databricks is a mission-critical platform. But how do you ensure your workloads are running efficiently, cost-effectively, and reliably? The Databricks Integration from New Relic delivers total visibility for your entire Databricks estate, allowing you to troubleshoot, optimize, and connect performance directly to cost—all from a single, unified observability platform. This integration is designed to give you immediate, actionable insight into your Databricks performance, health, and consumption.
Key Capabilities
The Databricks Integration is an open-source community project that provides a comprehensive suite of telemetry collection capabilities across your Databricks estate. These capabilities ensure you have the full, in-context data you need for deep analysis and optimization.
| Databricks Component | Key Telemetry Collected |
|---|---|
| Spark Applications | Executor memory, CPU, and storage metrics; Job, stage, and task durations; Task I/O metrics |
| Lakeflow Jobs | Job and task run durations, start/end times, and termination codes |
| Lakeflow Spark Declarative Pipelines | Update and flow durations, start/end times, and completion statuses; Pipeline event logs |
| SQL Warehouses/Serverless Compute | Query execution and compilation durations; Query I/O metrics (bytes/files read, rows read, etc.) |
| Classic Compute (Clusters) | Driver and worker node CPU and memory metrics; Driver and executor logs; Spark event log |
| Consumption & Cost | Billable usage system records; List pricing data; List price per job and job run |
Key Benefits
The integration translates deep telemetry into clear business value, helping you maximize your Databricks investment.
Accelerate Troubleshooting and Improve Reliability
Stop wasting time correlating data across disparate tools. The New Relic Databricks Integration offers a "single pane of glass" for all your Databricks telemetry, drastically speeding up issue resolution.
- Unified View: See Spark applications, Lakeflow jobs, and infrastructure telemetry in one place, allowing you to quickly spot bottlenecks.
- Contextual Visibility: Understand how your Databricks performance impacts, and is impacted by, your broader application and infrastructure ecosystem.
- Pinpoint Issues: Use detailed metrics like stage duration, task I/O, and job termination codes to pinpoint the exact root cause of slow or failing jobs.
Enhance Performance and Resource Utilization
Deep performance data allows you to tune your resources and code for maximum efficiency.
- Spark Optimization: Identify and optimize long-running Spark jobs or those with heavy shuffle operations. Tune classic compute resources by reviewing executor memory and RDD storage metrics.
- Query Optimization: Identify and optimize long-running SQL queries or queries with data spilling by viewing execution duration and I/O metrics (see the query sketches after this list).
- Resource Tuning: Prevent over- or under-utilization of classic compute clusters by monitoring driver and worker node CPU and memory metrics.
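To make this concrete, here are a couple of NRQL query sketches (wrapped as Python string constants) for spotting slow or spilling SQL warehouse queries once the integration's data is flowing. The event and attribute names used here (DatabricksQuery, duration, spilledBytes, query_id) are illustrative assumptions rather than the integration's documented schema; consult the integration documentation for the real names before running them.

```python
# Illustrative NRQL sketches only. The event and attribute names below
# (DatabricksQuery, duration, spilledBytes, query_id) are assumptions;
# substitute the names documented by the integration.

# Hypothetical: the ten slowest SQL warehouse queries over the last day.
SLOW_QUERIES_NRQL = """
FROM DatabricksQuery
SELECT max(duration) AS 'Slowest run'
FACET query_id
SINCE 1 day ago
LIMIT 10
"""

# Hypothetical: queries that spilled data to disk, a common tuning target.
SPILLING_QUERIES_NRQL = """
FROM DatabricksQuery
SELECT sum(spilledBytes) AS 'Bytes spilled'
WHERE spilledBytes > 0
FACET query_id
SINCE 1 day ago
"""
```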
Optimize Investment and Control Costs
By connecting workload performance metrics directly to DBU consumption and estimated costs, you gain the ability to optimize your Databricks spend.
- Cost Efficiency: Identify which features or workloads are driving consumption using billable usage data broken down by SKU.
- Job-Level Cost Analysis: Pinpoint the most expensive jobs and job runs to focus your optimization efforts, as sketched after this list.
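Similarly, a single NRQL sketch can rank job runs by their estimated list cost. As above, the event and attribute names (DatabricksJobCost, listCost, jobName) are assumptions for illustration; check the integration's consumption and cost documentation for the actual schema.

```python
# Illustrative NRQL sketch only. DatabricksJobCost, listCost, and jobName
# are assumed names; use the attributes the integration actually reports.
MOST_EXPENSIVE_JOBS_NRQL = """
FROM DatabricksJobCost
SELECT sum(listCost) AS 'Estimated list cost'
FACET jobName
SINCE 1 week ago
LIMIT 10
"""
```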
Effortless Setup, Immediate Insights
Getting the Databricks Integration up and running is designed to be effortless, providing immediate insight without complex setup.
- Seamless Installation: Thanks to the provided cluster init script, the integration deploys easily onto your cluster nodes, requiring no special commands or extra infrastructure (see the configuration sketch after this list).
- Automatic Instrumentation: With a single configuration setting, the integration can optionally and automatically instrument your cluster nodes with New Relic Infrastructure and Logs.
- Immediate Business Value: Once the integration is installed, import the provided sample dashboards to start gaining value right away.
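To illustrate what "deploys easily onto your cluster nodes" looks like in practice, here is a minimal sketch of attaching a cluster-scoped init script to an existing cluster through the Databricks REST API. The workspace URL, token, cluster details, and script path are placeholders, and the clusters/edit endpoint expects the cluster's full specification, so treat this as an outline of where the init script setting lives rather than a ready-to-run command.

```python
import requests

# Placeholders: substitute your workspace URL, a personal access token,
# the target cluster's details, and the workspace path where you uploaded
# the integration's init script.
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<databricks-personal-access-token>"

cluster_spec = {
    "cluster_id": "<cluster-id>",
    # clusters/edit requires the full cluster definition; only the fields
    # relevant to this sketch are shown.
    "spark_version": "<databricks-runtime-version>",
    "node_type_id": "<node-type>",
    "num_workers": 2,
    # Cluster-scoped init script that bootstraps the New Relic integration
    # on every node when the cluster starts.
    "init_scripts": [
        {"workspace": {"destination": "/Shared/<path-to-init-script>.sh"}}
    ],
}

response = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/edit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
    timeout=30,
)
response.raise_for_status()
```

Depending on the cluster's state, a restart may be needed for the new init script to take effect on all driver and worker nodes.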
Get Started
Get started today in three simple steps.
- Install: Install the integration onto your cluster nodes using the provided init script to easily connect your Databricks platform to New Relic.
- Verify: Verify data is flowing correctly with one simple query (see the sketch after this list).
- Visualize: Install the example dashboards using the guided installation to gain immediate visibility with pre-configured views.
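For the verification step, one quick check is to run an NRQL query, either in the query builder in the New Relic UI or programmatically through the NerdGraph GraphQL API, as in the sketch below. The event name DatabricksJobRun is a hypothetical placeholder; substitute whichever event or metric types the integration's documentation lists, along with your own account ID and user API key.

```python
import requests

NERDGRAPH_URL = "https://api.newrelic.com/graphql"
API_KEY = "<new-relic-user-api-key>"  # placeholder
ACCOUNT_ID = 1234567                  # placeholder

# Hypothetical event name -- replace with an event or metric type the
# integration actually reports.
NRQL = "FROM DatabricksJobRun SELECT count(*) SINCE 1 hour ago"

# NerdGraph query that runs the NRQL against your account and returns results.
graphql_query = f"""
{{
  actor {{
    account(id: {ACCOUNT_ID}) {{
      nrql(query: "{NRQL}") {{
        results
      }}
    }}
  }}
}}
"""

response = requests.post(
    NERDGRAPH_URL,
    headers={"API-Key": API_KEY, "Content-Type": "application/json"},
    json={"query": graphql_query},
    timeout=30,
)
response.raise_for_status()

# A non-empty results list indicates telemetry is arriving from the integration.
print(response.json()["data"]["actor"]["account"]["nrql"]["results"])
```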
Learn More
To learn more about the integration, visit the official New Relic Databricks Integration repository, or jump straight to the Getting Started section for immediate setup instructions.
Conclusion
The New Relic Databricks Integration moves beyond simple monitoring—it delivers total visibility and the actionable insights required to troubleshoot faster, optimize consumption, and ensure the health and reliability of your data pipelines. Take control of your Databricks platform today and transform your data operations from complex to fully observable.
The views expressed in this blog are those of the author and do not necessarily reflect the official views of New Relic K.K. This blog may also contain links to external sites; New Relic makes no guarantees regarding the content of those linked sites.