10 Common Struggles with Cloud-Native Infrastructure

Cloud-native architecture adoption is still growing, with more than 6.8 million cloud-native developers, according to the Cloud Native Computing Foundation (CNCF). This includes engineers in organizations that use container orchestration tools, serverless platforms, or a combination of both. Whether your team is moving to Kubernetes, deploying in multiple clouds, or using a hybrid cloud design, cloud-native infrastructure can bring great benefits, but also multiple pain points.

Are you looking to transition to a cloud-based infrastructure? In this blog post, we’ll cover the most common challenges with cloud-native architecture to look out for.

1. Inefficiencies are a big deal when you’re paying in a utility model.

Utility-style pricing for cloud-native infrastructure differs from what teams might be used to with purchased physical or virtual servers. Cost is variable and proportional to compute consumption, so you’ll need to manage usage actively to avoid additional costs.

With both on-premises deployments and more traditional migrated architectures (for example, Amazon EC2 instances), you have a series of stepped sunk costs, such as physical or virtual machines. One modernization option is to adopt serverless functions instead. But individual serverless functions can cost more in cloud-native architectures if they are not properly defined. High memory allocation, high concurrency, and long run times multiply one another, so costs can climb quickly when a function is designed to scale out. Identifying inefficient cloud resources is critical with cloud-native infrastructure to avoid additional costs.
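To make the effect of memory, duration, and invocation count concrete, here is a minimal sketch of a pay-per-use cost estimate. The function name and both default rates are illustrative assumptions, not any vendor's actual pricing; substitute the rates from your provider's price list.

```python
def estimate_monthly_cost(memory_gb, avg_duration_s, invocations,
                          price_per_gb_second=0.0000167,
                          price_per_million_requests=0.20):
    """Estimate the monthly cost of one serverless function.

    The default rates are placeholder assumptions for illustration only.
    """
    # Compute is billed on memory allocated multiplied by time spent running.
    gb_seconds = memory_gb * avg_duration_s * invocations
    compute_cost = gb_seconds * price_per_gb_second
    # Requests are billed separately, per million invocations.
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


# Same traffic, but the second function has 4x the memory and runs 4x longer.
baseline = estimate_monthly_cost(memory_gb=0.5, avg_duration_s=0.2,
                                 invocations=10_000_000)
oversized = estimate_monthly_cost(memory_gb=2.0, avg_duration_s=0.8,
                                  invocations=10_000_000)

print(f"baseline:  ${baseline:.2f}")
print(f"oversized: ${oversized:.2f}")
```

Because memory and duration multiply, raising both by a factor of four raises the compute portion of the bill by a factor of sixteen, which is why rightsizing individual functions matters.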

2. You’re working with short-lived components and variable microservices.

With on-premises or hosted instances, you have a known, fixed pool of compute resources. Cloud-native microservices, on the other hand, are typically on-demand, elastic in both quantity and duration, often stateless, and are ephemeral in nature, only existing while needed. If a microservice no longer exists or has changed state, it’s hard to determine what went wrong at a point in the past.

With New Relic, you can see telemetry over time and correlate that with user complaints, event notifications, or performance anomalies. You can record and monitor deployments, tracking them with deployment markers so you can correlate them with your application's performance.

3. You have little insight into underlying infrastructure.

When you move to cloud-native infrastructure, a big challenge is simply the lack of insight. You don't know what’s going on with the underlying infrastructure.

You need an observability solution that can help you determine which services are the source of a problem, and whether the problem lies in cloud-based infrastructure outside of your administrative control.

Cloud vendors have monitoring services, and New Relic can give you visibility across those services. This helps you determine what has gone wrong, looking across the whole of your system. With New Relic, you can establish cloud integrations for Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.

4. Security becomes even more important, yet more difficult.

That lack of visibility described in the previous challenge becomes even more problematic when it comes to security. It’s hard to see everything, so you might miss some important security risks. Investigation becomes costly when there is so much security data to analyze. In cloud-native infrastructure, you need to determine which data is relevant.

With many tools and multi-cloud and hybrid-cloud environments, it's hard to enforce consistent policies. A cloud infrastructure that isn't configured properly is at risk for attacks. All this, and you need to be able to respond quickly.

With New Relic, you can integrate with your cloud provider’s security services. Check out integrations including AWS CloudTrail. If you’re tired of silos and additional configuration, using New Relic Security RX can help you get your observability and security all in one place.

5. Increase in development and release cycles requires coordination.

With cloud-native infrastructure, you’re dealing with more frequent deployment cycles and continuous integration and continuous delivery (CI/CD) pipelines. Organizations have built DevOps practices so that cross-functional engineering teams can collaborate, and we’ve been noticing a “shift left” trend: developers need to catch problems early, before they become painful and expensive to solve.

To deploy more often, you need to ensure stability. You’ll want to track deployments so you can correlate them to your application’s performance. Deployment markers help you understand cause and effect by showing when issues occurred relative to a deployment.
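The idea of correlating a deployment with performance can be sketched in a few lines. This is a hypothetical illustration, not New Relic's implementation or API: the function name, the sample data, and the window size are all assumptions.

```python
from statistics import mean


def error_rate_change(samples, deployed_at, window):
    """Compare the mean error rate before and after a deployment.

    samples: list of (timestamp_seconds, error_rate) tuples.
    Returns (before_mean, after_mean), or None if either window is empty.
    """
    before = [rate for ts, rate in samples
              if deployed_at - window <= ts < deployed_at]
    after = [rate for ts, rate in samples
             if deployed_at <= ts < deployed_at + window]
    if not before or not after:
        return None
    return mean(before), mean(after)


# Hypothetical one-minute error-rate samples; the deployment happens at t=180.
samples = [(0, 0.01), (60, 0.01), (120, 0.02),
           (180, 0.05), (240, 0.06), (300, 0.07)]
result = error_rate_change(samples, deployed_at=180, window=180)

if result is not None:
    before_mean, after_mean = result
    print(f"before: {before_mean:.3f}  after: {after_mean:.3f}")
```

A jump in the "after" window does not prove the deployment caused the regression, but it tells you where to start looking.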

With New Relic, you’ll view these deployment markers in your application performance monitoring (APM) charts. New Relic can also help with CI/CD pipelines through integrations with the CI/CD tools you’re already using, and through better collaboration between teams, with issues visible right in your IDE with CodeStream.

6. You need to manage and monitor multiple systems and multiple clouds.

In a static environment, or even one built on virtual machines, adding a new system involves more processes and approvals. In a cloud-native environment, new resources can be added with just a line of code. Managing and monitoring all the changes that come from this quick proliferation of resources and services can be overwhelming.

Despite this disorderly environment, engineering teams still need to maintain the availability, performance, and security of their applications, even when they span multiple cloud environments, physical machines, virtual machines, containers, stateful containerized workloads, orchestration engines, and serverless platforms.

You’ll need visibility across all these systems and clouds, without being locked into using one particular cloud vendor. An observability platform that lets you see everything in one place, but still integrate with other tools you’re using, helps you move through the chaos.

7. It’s difficult to manage system configuration settings.

Ideally there will be a single source of truth for infrastructure-as-code, but in practice some settings can’t be centrally managed or implemented through code. Also, systems configurations can drift over time as application owners and systems operators make localized changes outside the deployment pipeline.

Once that drift happens, you no longer have a predictable environment that everyone can understand. Configuration drift increases with complexity, and cloud-native infrastructure brings plenty of it.
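At its simplest, detecting drift means comparing the settings declared in code with the settings actually observed on a running system. The sketch below assumes both are available as flat dictionaries; the function name, the setting names, and the values are hypothetical.

```python
MISSING = "<missing>"  # placeholder for a setting absent on one side


def find_drift(declared, observed):
    """Return {setting: (declared_value, observed_value)} for every mismatch."""
    drift = {}
    for key in sorted(declared.keys() | observed.keys()):
        want = declared.get(key, MISSING)
        have = observed.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


# What the deployment pipeline says should be running (hypothetical values).
declared = {"instance_type": "m5.large", "min_replicas": 3, "log_level": "info"}
# What is actually running after some localized, out-of-pipeline changes.
observed = {"instance_type": "m5.large", "min_replicas": 2,
            "log_level": "debug", "debug_port": 9229}

drift = find_drift(declared, observed)
for setting, (want, have) in drift.items():
    print(f"{setting}: declared={want!r} observed={have!r}")
```

Real configurations are usually nested and spread across several tools, so a production check would need to normalize them first; the comparison step itself stays this simple.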

With New Relic, you can identify specific components, services, or configurations that are operating in anomalous or unpredictable ways. You can quickly use data to determine the actual behavior and how it differs from intended operation.

8. You need to deal with an abundance of flexibility and overcome analysis paralysis.

When you have so many choices, figuring out the right choice is critical. Monitoring all the pieces of your cloud-native environment means you can collect a lot of data. But how do you make sense of it without being stuck?

You need an observability platform where you can weigh your choices and optimize for the best action. Consider the cost-performance ratio between different cloud services, and how to rightsize your environments. With the right data, you can experiment and fail fast, then move on quickly. With New Relic you can get deep visibility into all your infrastructure in one place.

For example, instead of manually sifting through dashboards to understand why problems occur and what they affect, you can get to the root cause of every incident.

9. Reliability issues are entwined in your cloud-native architecture.

Even within a cloud, you must architect the right way to use the cloud capabilities to avoid reliability issues. Some groups might use multiple regions, and achieving reliability can be challenging and costly.

Within a cloud, you want to have observability into your reliability. Then you’ll be able to see whether or not your approaches are successful by observing uptime and overall performance metrics such as page load time. Architecting for reliability is not enough: you’ll need to be able to observe reliability all along the way, from the frontend user experience to the infrastructure in the backend.

Reliability can take many different forms, and you often might not know where your risks are. You can’t make decisions about highly available architecture without observing your system over time. That’s also why it’s important to understand cloud costs: reliability measures such as running in multiple regions add expense.

You’ll need an observability tool to see where your reliability is paying off. Learn how to successfully influence reliability decisions with New Relic.

10. You need both organizational shift and teams staffed with the right skills.

As organizations transition to working with cloud-native infrastructure, they find that their teams’ DevOps process and culture need to shift to CI/CD pipelines, as discussed in Challenge 5. But teams also need new skills as they shift their culture.

When there are outages in a cloud-native environment, it’s all hands on deck. Teams will need to build cloud architect skills. Specialized expertise includes building and architecting microservices-based applications, container- and Kubernetes-based applications, and applications that leverage the services from public cloud providers.

The shift in culture needed with cloud-native architecture naturally aligns with a culture of observability. When micro-failures are expected and rapidly mitigated, everyone needs observability, both operations and development teams. Both dev and ops engineers care about metrics like Apdex, average response time, error rates, and throughput.
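Of the metrics above, Apdex is the least self-explanatory. As a minimal sketch: given a target response time T, requests at or below T count as satisfied, requests between T and 4T count as tolerating (weighted by half), and anything slower counts as frustrated. The function name, sample response times, and threshold here are hypothetical.

```python
def apdex(response_times_s, threshold_s):
    """Compute an Apdex score in [0, 1] for a list of response times in seconds."""
    if not response_times_s:
        raise ValueError("at least one response time is required")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(1 for t in response_times_s
                     if threshold_s < t <= 4 * threshold_s)
    # Frustrated requests (slower than 4T) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(response_times_s)


# Hypothetical response times, in seconds, with a 0.5-second target.
times = [0.2, 0.4, 0.5, 1.2, 1.9, 2.5]
score = apdex(times, threshold_s=0.5)
print(f"Apdex: {score:.2f}")
```

A single score like this gives dev and ops teams a shared number to track, while the underlying response times remain available when they need to dig deeper.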

With observability, you can deal with these failures quickly. You get the freedom to become more nimble.

We hope reviewing these cloud infrastructure struggles has helped. If you’re curious to know how else New Relic can help you, see how we worked with ZenHub.

How New Relic can help

One temptation with all these challenges is to expect monitoring services that are included with your cloud providers to take care of all your concerns. Cloud vendors are amazing at providing basic building blocks. But when it comes to observability, you need deep capabilities with customization, and you’re not going to get there with the built-in cloud monitoring tools.

You need full-stack observability to get the insights in a dedicated platform. With New Relic, you have one place for unified visibility into all your telemetry, across multiple tools and across your entire system. You can see change in the context of your entire system, visualize relationships and dependencies across your cloud-native infrastructure, and quickly find root causes.

If you’re not already using New Relic, sign up for a free account. Your account includes 100 GB/month of free data ingest, one free full-access user, and unlimited free basic users.

Dayne Miller

Dayne Miller is a Sr. Partner Solution Consultant at New Relic. He has an extensive background in enterprise migrations, compute and network infrastructure, network engineering and security, and the AWS cloud.

The views expressed in this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
