“You’re not a real engineer until you’ve accidentally sponsored AWS’s quarterly earnings.”
It’s obviously meant as a joke, but I've heard it often across the industry because it rings true for many of us. The cloud promised a cheaper, more scalable infrastructure, yet it’s created serious sticker shock for many organizations. Cloud platforms make it easy to spin up resources but rarely provide controls to rein things back in. Teams scale workloads without tracking where dollars are going, and by the time a cost anomaly alert triggers, engineers scramble to manually identify which application caused the spike. This reactive firefighting is slow, stressful, and significantly inflates Mean Time to Resolution (MTTR). By the time the root cause is found, the problem has already morphed into an “unknown expense,” causing forecast-to-actual deviations to balloon and costs to outpace budgets.
At New Relic, we realized we needed to change how we managed costs as we moved to the cloud. As our workloads expanded and our platform scaled, our cloud costs began rising. It became clear that to scale responsibly, we had to get smarter about how we used and managed our resources. We needed visibility into what was driving spend and the ability to make cost decisions in real time, without sacrificing performance or reliability. So, we turned to what we know best: observability. By applying our own platform’s capabilities to understand cost patterns and inefficiencies across services, we built a FinOps practice that redefined how we manage cloud operations. By treating cost as an observable metric alongside performance and reliability, we reduced our cloud production cost per GB by 60% and have sustained these savings for several quarters.
In this blog, we’ll share how we achieved that reduction in cloud costs, what we learned along the way, and how you can apply those same principles and tools to make your own cloud spending more efficient and predictable.
Cutting cloud costs starts with visibility
The first fundamental step in reining in cloud spending is to seek cost insights and establish cost controls: you can't control what you can't measure. However, the modern cloud billing model is inherently reactive. The financial data arrives post-factum, often weeks after the resources were consumed, forcing teams to play catch-up with quick, short-term fixes to avoid budget overruns. In the end, the core challenge for any organization is the inability to trace dollars back to the specific source code and infrastructure that consumed them. Without that direct link between cloud utilization and the actual cloud cost per unit, cost remains an "unknown expense" that's impossible to predict or manage effectively.
At New Relic, we recognized that while our observability platform collected detailed telemetry across compute, storage, and network resources from our deployed infrastructure agents, this data was isolated from our billing data. This represented a critical gap in context: engineers saw system performance, but they lacked the dollar cost tied to every resource. At that point, we knew we had to engineer a fundamental connection layer, a "data bridge," that could automatically correlate every technical metric with its unit cost. With this correlation, we could transform our platform into a powerful financial lens, showing us service spend drivers, utilization waste, and cost estimates in real time. So we began correlating resource metrics such as CPU hours, memory usage, I/O rates, and network egress with their respective cost units from our cloud providers. That’s when visibility truly clicked. Our dashboards no longer showed just latency, throughput, or error rates; they showed the cost footprint of every service alongside those metrics. Below is an example dashboard.
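The core of that "data bridge" idea can be sketched in a few lines: join per-service resource usage from telemetry with unit prices from billing data to estimate each service's cost footprint. This is a minimal illustration; the service names, metrics, and rates below are made up for the example, not actual New Relic or cloud-provider values.

```python
# Hourly resource usage reported by telemetry, keyed by service
# (illustrative numbers, not real data).
usage = {
    "checkout-api": {"cpu_hours": 120.0, "gb_egress": 35.0, "gb_stored": 800.0},
    "search-index": {"cpu_hours": 300.0, "gb_egress": 5.0,  "gb_stored": 4200.0},
}

# Unit prices pulled from the cloud provider's billing data (illustrative).
unit_cost = {"cpu_hours": 0.04, "gb_egress": 0.09, "gb_stored": 0.023}

def cost_footprint(service_usage: dict) -> float:
    """Estimated dollar cost of one service's resource usage."""
    return sum(qty * unit_cost[metric] for metric, qty in service_usage.items())

for service, metrics in usage.items():
    print(f"{service}: ${cost_footprint(metrics):.2f}")
```

Once every metric carries a dollar value like this, the cost footprint can sit on a dashboard next to latency and throughput for the same service.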
This granular visibility immediately forged a shared language across the business. Engineers could evaluate their work by financial footprint, and finance teams could link budget variance to technical events and resources immediately. While real-time monitoring was a massive victory, the true leverage lay in shifting cost consciousness upstream.
Shifting cost left for optimization
Once we had our financial lens, we naturally moved our focus from analyzing past spending to preventing future waste. The entire FinOps movement is about shifting cost left, meaning we embed cost-efficiency into the development lifecycle itself. This shift was easy to adopt because our engineers inherently understand that the best product must also be a financially optimized product. Our optimizations focused on two main practices:
Cost impact forecasting
We conduct cost impact forecasting before key deployments to understand the financial ripple effects of each change, new feature, and product launch. This is now a standard step for all major feature development. Teams use correlated cost telemetry to see what a change will actually cost in terms of resources and dollars. This visibility creates a quick feedback loop, pushing teams to find the financially optimized configuration that still meets performance requirements. The finance team can also perform gross margin analysis for new product launches.
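In its simplest form, a pre-deployment cost forecast multiplies a change's projected per-hour resource deltas by unit prices and projects the total over a month. The deltas and rates below are assumptions for the sketch, not actual billing figures.

```python
HOURS_PER_MONTH = 730

# Projected per-hour resource deltas for the change, e.g. from load tests
# (illustrative numbers).
delta = {"cpu_hours": 2.5, "gb_egress": 0.8}

# Illustrative unit prices from provider billing data.
unit_cost = {"cpu_hours": 0.04, "gb_egress": 0.09}

# Estimated monthly dollar impact of shipping the change.
monthly_impact = HOURS_PER_MONTH * sum(
    qty * unit_cost[metric] for metric, qty in delta.items()
)
print(f"Forecast monthly cost impact: ${monthly_impact:,.2f}")
```

A forecast like this gives the feedback loop a concrete number to argue about before the change ships, rather than after the bill arrives.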
Iterating for efficiency
DevOps teams experiment with different implementations to identify which one delivers the best performance at the lowest cost. This goes beyond simple right-sizing; it's about iterative analysis. We use traces to pinpoint expensive patterns, like high-latency database queries or excessive logging, that drive up costs. Teams quickly compare the cost-to-performance ratio of different instance types, services, or storage tiers to find the optimal resource choice for their workloads.
When applied methodically to every project, these simple optimizations can lead to significant cost savings.
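A comparison like the one above can be reduced to a single ranking metric, such as dollars per million requests served. The instance names, hourly prices, and throughput figures below are hypothetical benchmark results used only to illustrate the calculation.

```python
# Hypothetical benchmark results for three candidate instance types.
candidates = [
    {"name": "general-4x", "dollars_per_hour": 0.38, "requests_per_sec": 2100},
    {"name": "compute-4x", "dollars_per_hour": 0.46, "requests_per_sec": 3400},
    {"name": "memory-4x",  "dollars_per_hour": 0.58, "requests_per_sec": 2300},
]

def cost_per_million_requests(c: dict) -> float:
    """Dollars to serve one million requests at the benchmarked rate."""
    requests_per_hour = c["requests_per_sec"] * 3600
    return c["dollars_per_hour"] / requests_per_hour * 1_000_000

best = min(candidates, key=cost_per_million_requests)
print(f"Best cost/performance: {best['name']} "
      f"(${cost_per_million_requests(best):.3f} per 1M requests)")
```

Note that the cheapest instance per hour is not necessarily the cheapest per unit of work, which is exactly why the ratio, not the sticker price, should drive the choice.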
Making cost a real-time KPI
The final stage of our FinOps journey was about sustained execution, making sure the efficiency gains from the design and development work persisted under production load. Cost had to be managed as a live operational metric, just like latency or error rates. This phase relied heavily on a few core principles:
- We established clear utilization goals tied to financial accountability. For instance, for our engineering teams, utilization KPIs were set directly by project, team, and budget owners, instantly cementing financial ownership for every running workload.
- To maintain our high standards, our teams conduct weekly reviews, specifically targeting low-utilization projects for right-sizing, consolidation, or decommissioning. This disciplined, regular action was the engine that drove up our overall cloud utilization and ensured our cost efficiency was sustained over time.
- To protect these hard-won gains, we needed an autonomous safety net against unexpected spikes. Our teams integrated anomaly detection services directly with the correlated cost telemetry we had built. The system proactively flagged potential cost anomalies, such as an out-of-control autoscaling event or a sudden misconfiguration, and alerted the service owners. This capability allowed our teams to connect a financial spike to its technical root cause in minutes rather than days, intervening instantly to cap spending.
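The anomaly-detection step in the list above can be illustrated with a deliberately simple stand-in: flag any hour whose cost deviates from the trailing window's mean by more than a few standard deviations. Production systems use far more sophisticated models; this sketch, with made-up cost values, only shows the shape of the check.

```python
from statistics import mean, stdev

def cost_anomalies(series: list[float], window: int = 6,
                   threshold: float = 3.0) -> list[int]:
    """Return indices where cost deviates more than `threshold` standard
    deviations from the trailing window's mean."""
    flagged = []
    for i in range(window, len(series)):
        trailing = series[i - window:i]
        mu, sigma = mean(trailing), stdev(trailing)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Illustrative hourly cost series with one runaway-autoscaling-style spike.
hourly_cost = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 48.7, 12.1]
print(cost_anomalies(hourly_cost))  # flags index 7, the 48.7 spike
```

Because the alert fires on the correlated cost telemetry, the flagged index maps straight back to a service and a time window, which is what shrinks root-cause hunting from days to minutes.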
The results speak for themselves. By following this systematic process, we reduced our cloud production cost per GB by 60%, and we've sustained these savings for several quarters.
From internal success to Cloud Cost Intelligence (CCI)
As we refined our approach and saw sustained results, we realized something important: the manual work we'd done to connect observability data with cost information was exactly what other organizations needed too. Every company we spoke to had the same blind spot: comprehensive performance telemetry and detailed billing reports, but no way to bridge the gap between them.
So we built those capabilities into our platform as Cloud Cost Intelligence (CCI). It automates the work we had to do manually, pulling cost data from cloud providers and correlating it with telemetry from applications, infrastructure, and Kubernetes clusters.
CCI gives teams the same visibility we created for ourselves. Instead of waiting for monthly bills, you get hourly cost updates showing trends, including both actual costs and estimated costs for the last 48 hours (estimated from telemetry usage). You can break down spending by application, service, region, or team to see exactly where resources are being consumed. For teams planning deployments, it enables cost forecasting based on historical data and current usage patterns, so you can estimate the financial impact before shipping a change. CCI's anomaly alerts notify engineering teams of cost spikes in their services, and teams have been able to respond quickly to these alerts, addressing the spikes and optimizing how their services scale.
The screenshot below shows CCI’s Summary dashboard, where total cloud spend, cost trends, and regional breakdowns are displayed alongside hourly estimates and top cost variances.
Here, cost becomes just another observable signal. One that can be queried, visualized, and acted on like any other performance metric.
Note: This feature is currently provided as part of a preview program pursuant to our pre-release policies.
Key takeaways
FinOps is shifting from financial reporting to intelligent engineering. It’s no longer about tracking spend after the fact; it’s about designing systems that optimize cost, performance, and reliability from the start. When teams can assess the financial impact of a change as easily as they check performance, cost optimization stops being a special project and becomes standard engineering practice. This fundamentally shifts the conversation from "Why did we spend that much?" to "How much value is this specific resource providing?"
Applying this FinOps methodology enabled us to engineer and sustain our 60% cloud cost reduction. With Cloud Cost Intelligence (CCI), we’re aiming to bring the same practices to our customers. CCI unites financial insight with observability, giving teams the context to make technical and business decisions in the same view. It helps engineers see the cost impact of their work in real time, while empowering organizations to run leaner and scale with confidence.
As we move forward, our goal is simple: make cost intelligence an invisible part of good engineering. By embedding FinOps principles directly into the tools developers already use, we’re building a foundation for sustainable, data-driven innovation where performance and cost move together by design.
The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at Explorers Hub ( discus.newrelic.com ) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve, or endorse the information, views, or products available on those sites.