Welcome to part one of a new series focused on how to implement observability in Azure using New Relic. As we start this journey, we'll focus on what data should be collected in Azure and why it matters, before moving on to more advanced instrumentation in future posts. In this introductory post, we cover foundational elements, including:
- Azure concepts: Understanding the terminology and structure.
- Data capture essentials: Identifying what telemetry data can be collected.
- Integrating with New Relic: How to ingest this information into New Relic.
But before jumping into the details, let’s address a few questions:
Why is it important to implement observability for your Azure environment?
Azure is an incredible platform, offering over 200 services that can be used to build solutions for businesses. With so many moving parts, outages can directly impact revenue and put strain on internal teams, so it’s crucial to detect and resolve them promptly.
Who should read this blog?
This post is for anyone seeking guidance, direction, or tips on how to successfully set up New Relic in an Azure account, written from the perspective of a former Microsoft solutions architect.
What benefits does New Relic offer on top of what Azure provides?
New Relic builds on the information available in Azure and enhances it by:
- Automated relationship discovery and mapping: New Relic automatically maps infrastructure metrics, logs, database queries, and more to your applications, reducing the engineering effort and cognitive load needed to understand the data and know where to look.
- Curated views: Focus on a specific entity through an opinionated view of what we’ve found matters most to customers as an observability leader.
- End-to-end visibility: Gain comprehensive insights into your entire digital estate, from a user clicking a button in the UI all the way through to a query executing in a database.
- Mapping technology to business processes: Break down silos and visualise technology in the context of the business process it supports.
Azure concepts
Azure’s governance structure centers on a Tenant Root, which serves as the foundation to which all Subscriptions are linked. All cloud resources, such as virtual machines (VMs), are deployed to a Subscription.
For governance, Microsoft recommends setting up Management groups, which are essentially folders where you can group subscriptions and apply role-based access control (RBAC) and policies (guardrails for cloud usage).
To help visualize this, I’ve included a diagram that showcases Azure’s governance structure alongside AWS. This side-by-side comparison will hopefully help any reader with existing AWS experience.
data:image/s3,"s3://crabby-images/47dae/47daed37062e49eeb97fa53e2ed7579c09bf57ff" alt="Azure governance structure"
Here’s a quick comparison of the terminology used in AWS and Azure:

| AWS | Azure | Purpose |
| --- | --- | --- |
| Accounts | Subscriptions | Where cloud resources are deployed (contained) |
| Organizations | Management groups | Manage accounts at scale (with controls and groupings); these can also be nested |
| CloudWatch metrics | Azure Monitor metrics | Built-in solution to collect, store, and query resource metrics |
| CloudWatch Logs | Log Analytics | Built-in solution to collect, store, and query resource logs |
| X-Ray | Application Insights | Application performance monitoring (APM) |
| CloudTrail | Activity logs | Capture and store audit events in the cloud |
| CloudWatch Logs | Diagnostic settings | Enable resource logging |

Now that we have a handy reference table, you might wonder: What data is actually available in Azure?
The answer depends on the scope (the level in the hierarchy) and the Azure resource deployed, because different resources offer different telemetry options.
Tips:
- Unlike the AWS Management Console, which splits cloud resources by region, the Azure UI has a global view. The UI displays a column for region, which can be filtered.
Scopes
Not to be lazy, but the official definition from Microsoft docs is:
“In Azure, you can specify a scope at four levels: management group, subscription, resource group, and resource. Scopes are structured in a parent-child relationship. Each level of hierarchy makes the scope more specific”.
data:image/s3,"s3://crabby-images/369b7/369b7eac705cb02d03e65ad17197b709f097b299" alt="management group"
For the purpose of observability, I’m going to focus on:
- Subscription: Where the resources are deployed to.
- Resource: The cloud service used (VM, serverless function, database, etc.).
- Application: The application (code) running on the Azure resource (2). This isn’t an official Azure scope, but it’s vital from an observability perspective.
Building on the first diagram, which visualizes the account structure, let’s dive into these three scopes: subscription, resource, and application.
data:image/s3,"s3://crabby-images/20b3b/20b3bf6705bf89f31d51e18bddaba0af3a32682a" alt="azure-scope"
1 - Subscription
An Azure subscription’s Activity Log is a platform log that provides insights into subscription-level events. These logs include information such as when a resource is created, modified, or deleted, as well as any Azure service outages. To view the subscription’s export settings, browse to the subscription in the Azure portal, select the Activity log tab on the left, then select Export Activity Logs:
data:image/s3,"s3://crabby-images/a4b64/a4b64bb95c2b4b48dcdd7f596bbd15a5b8ec90c9" alt="diagnostics"
This will then bring up the Diagnostic Settings for the subscription:
data:image/s3,"s3://crabby-images/2d24e/2d24ec8ca45a14ea1849277c6532a63370fed668" alt="diagnostic settings"
Subscriptions support exporting eight different Log Categories:
- Administrative: Captures administrative operations performed on resources, including create, update, and delete actions.
- Security: Logs related to security events and changes, such as permissions and access attempts.
- ServiceHealth: Provides information about the health of Azure services and any noteworthy events affecting service availability.
- Alert: Contains information about alert rules and the trigger events that notify you about operations and conditions.
- Recommendation: Includes suggestions from Azure Advisor for improving the cost, security, performance, and reliability of your Azure resources.
- Policy: Tracks compliance state changes and activities related to Azure Policy assignments.
- Autoscale: Logs autoscaling events and actions that modify the number of instances to meet performance criteria.
- ResourceHealth: Reports the health status of Azure resources and identifies any ongoing issues affecting their operation.
To learn more about diagnostic settings, see the Azure Monitor documentation.
Tips:
- Do you need to publish all log categories? No.
- What should you publish? Start by collecting the Administrative, ServiceHealth, Alert, Autoscale, and ResourceHealth logs. Then optionally include other logs based on your individual requirements.
- If you’re enabling this at scale (across multiple subscriptions):
  - If the environment follows the Cloud Adoption Framework (CAF) pattern, you can modify the policies to point to an EventHub (used by New Relic’s integration) rather than a Log Analytics workspace.
  - Otherwise, use infrastructure as code (IaC; for example, Terraform or Bicep) or your subscription vending process.
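For the at-scale case, the setting is identical for every subscription, so it can be generated once and applied per subscription. The sketch below assumes the Azure Monitor REST representation of a subscription-level diagnostic setting (eventHubAuthorizationRuleId, eventHubName, and a logs array of categories) and enables the baseline categories from the tips above. The Event Hub IDs, setting name, and api-version are placeholders or assumptions to check against the Azure Monitor REST reference. The script only prints the requests; in practice you’d express the same thing in Terraform, Bicep, or Azure Policy.

```python
"""Sketch: one subscription-level diagnostic setting that streams a baseline set of
Activity Log categories to an Event Hub. IDs, names, and api-version are assumptions."""
import json

ALL_CATEGORIES = [
    "Administrative", "Security", "ServiceHealth", "Alert",
    "Recommendation", "Policy", "Autoscale", "ResourceHealth",
]
# Baseline from the tips; add other categories based on your requirements.
BASELINE = {"Administrative", "ServiceHealth", "Alert", "Autoscale", "ResourceHealth"}
API_VERSION = "2021-05-01-preview"  # assumption: confirm in the Azure Monitor REST reference


def activity_log_setting(event_hub_rule_id: str, event_hub_name: str,
                         extra: tuple[str, ...] = ()) -> dict:
    unknown = set(extra) - set(ALL_CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown Activity Log categories: {sorted(unknown)}")
    enabled = BASELINE | set(extra)
    return {
        "properties": {
            "eventHubAuthorizationRuleId": event_hub_rule_id,
            "eventHubName": event_hub_name,
            # List every category explicitly so disabled ones are visible in reviews.
            "logs": [{"category": c, "enabled": c in enabled} for c in ALL_CATEGORIES],
        }
    }


def setting_url(subscription_id: str, setting_name: str) -> str:
    return (f"https://management.azure.com/subscriptions/{subscription_id}"
            f"/providers/Microsoft.Insights/diagnosticSettings/{setting_name}"
            f"?api-version={API_VERSION}")


if __name__ == "__main__":
    rule_id = ("/subscriptions/<hub-subscription>/resourceGroups/<rg>/providers/"
               "Microsoft.EventHub/namespaces/<namespace>/authorizationRules/<rule>")
    body = activity_log_setting(rule_id, "<event-hub-name>")
    for sub in ["<subscription-id-1>", "<subscription-id-2>"]:
        # A real rollout would PUT `body` here with an ARM token, or do it via IaC/Policy.
        print("PUT", setting_url(sub, "newrelic-activity-log"))
    print(json.dumps(body, indent=2))
```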
2 - Resource
Capturing the Activity logs at the Subscription level is just the beginning. With this foundational step in place, we can now turn our attention to the Azure resources that are deployed. There are two things we need to capture at this scope:
Logs (Azure - Diagnostic Settings):
In the diagram above, you’ll see a box for Azure Kubernetes Service (AKS) as the Azure resource. I chose AKS because it has Diagnostic Settings that can be configured, but not all Azure services support Diagnostic Settings, so when provisioning an Azure service, check which logs are available.
Q: If I have Azure metrics and subscription-level diagnostic settings, why do I need resource-level diagnostics?
A: Resource-level diagnostics are resource-specific. For example, Azure Container Apps has metrics for replicaCount and restarts, which are great. But why did my container restart? Was it a failed health check? An out-of-memory (OOM) restart? To get this level of granularity in managed services, you need to enable diagnostics.
To check whether an Azure resource’s diagnostics are configured, open the Monitoring section of that resource:
data:image/s3,"s3://crabby-images/400db/400dbcfa88e1486d658ca2c60df8f8c64dbb30f8" alt="diagnostic settings"
Alternatively, open Azure Monitor and expand the Settings section.
data:image/s3,"s3://crabby-images/8d8b4/8d8b43a566c2a7a5cea99c2ee56d4755af3e1189" alt="Azure monitoring settings"
Metrics
Azure stores platform metrics for its cloud resources for 93 days. The metrics available depend on the Azure service being used (such as CPU usage for a VM versus messages in a queue service). These metrics can be ingested into New Relic using either the Azure integration, which is set up in the New Relic portal, or the New Relic Azure Marketplace integration. We’ll cover the differences in detail below.
But essentially:
- If you’ve got an existing New Relic account, it’s easier to just use the Azure integration.
- If you’re setting up New Relic for the first time and want to be billed through Azure (which counts toward your Microsoft Azure Consumption Commitment, or MACC), it’s best to evaluate the Azure Marketplace integration.
Tips:
- Don’t enable EVERYTHING; understand what each log category is for. In a few places in Azure, the logs are very verbose or duplicate each other (such as the AKS Kubernetes Audit and Kubernetes Audit Admin logs).
- A handful of Azure Services have a metric category in the diagnostic settings. If you’ve already got metric polling enabled with New Relic, you could be doubling up on data.
- New Relic keeps dimensional metrics for 13 months!
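The first two tips can be encoded as a small selection step before a resource-level diagnostic setting is created. This is a hypothetical helper: the AKS category names (kube-audit, kube-audit-admin) are real examples, but each resource type exposes its own list, so query the supported categories rather than hard-coding them. When both audit streams are requested, it keeps kube-audit, because kube-audit-admin is the same stream minus get and list events; it also turns off the AllMetrics export when New Relic is already polling metrics.

```python
"""Sketch: choosing resource-level diagnostic categories (AKS names as examples)."""

# If the first category is enabled, the second adds no new data:
# kube-audit-admin is kube-audit without get/list events.
OVERLAPS = {"kube-audit": "kube-audit-admin"}


def resource_setting_properties(wanted: list[str], supported: set[str],
                                metric_polling_enabled: bool) -> dict:
    missing = [c for c in wanted if c not in supported]
    if missing:
        raise ValueError(f"Not supported by this resource: {missing}")
    chosen = list(dict.fromkeys(wanted))  # de-duplicate while keeping order
    for superset, subset in OVERLAPS.items():
        if superset in chosen and subset in chosen:
            chosen.remove(subset)  # the superset already contains these events
    return {
        "logs": [{"category": c, "enabled": True} for c in chosen],
        # Exporting AllMetrics while New Relic already polls Azure Monitor metrics
        # would ingest the same data twice.
        "metrics": [{"category": "AllMetrics", "enabled": not metric_polling_enabled}],
    }


if __name__ == "__main__":
    aks_supported = {"kube-apiserver", "kube-audit", "kube-audit-admin", "guard"}
    print(resource_setting_properties(["kube-apiserver", "kube-audit-admin"],
                                      aks_supported, metric_polling_enabled=True))
```

If cost matters more than read events, request kube-audit-admin on its own; the helper then passes it through unchanged.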
3 - Application
The final area where we need to capture data is the application itself running in Azure. There are numerous ways to instrument an application running in the cloud, but to summarize them:
- Use an extension/plugin: Azure offers extensions for services such as App Service (similar to Elastic Beanstalk), Function Apps (similar to Lambda functions), Logic Apps, and Virtual Machine Scale Sets (similar to EC2 Auto Scaling groups). To configure this, enable the extension on the cloud resource, then set a few variables (such as the New Relic license key and other metadata).
- Pre-install it: For example, include the New Relic agent in a Dockerfile and deploy the resulting container image (with New Relic pre-installed), or pre-install the New Relic infrastructure agent into a VM image. This is useful for Azure services like Container Instances or Linux Functions, where you don’t have control of the infrastructure layer.
- Auto-instrumentation: New Relic provides an auto-instrumentation capability for Kubernetes. When a pod is created, it will automatically inject our APM agent and start capturing telemetry.
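Whichever approach you take, the agent needs at least a license key and an application name. Most New Relic APM agents can read these from the NEW_RELIC_LICENSE_KEY and NEW_RELIC_APP_NAME environment variables, which map naturally to App Service application settings or container environment variables. The preflight check below is a hypothetical helper (not part of any agent) that you might run at container start-up to catch missing settings early.

```python
"""Sketch: a start-up preflight check that New Relic agent settings are present."""
import os
from typing import Mapping

REQUIRED = ("NEW_RELIC_LICENSE_KEY", "NEW_RELIC_APP_NAME")


def missing_agent_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_agent_settings()
    if missing:
        print("New Relic agent not configured; missing:", ", ".join(missing))
    else:
        print("New Relic agent settings present for", os.environ["NEW_RELIC_APP_NAME"])
```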
Although there are options to pre-install or auto-instrument, to have full control and get the most out of the application, you’ll need to instrument it at the application level. By doing this, you can:
- Publish custom metrics, which can provide business insights or simplify performance analysis and troubleshooting.
- Use logs in context, which links application logs to the related APM data on various pages in the New Relic UI.
- Capture traces, which provide detailed transaction breakdowns, real-time graphical dependency maps, and visualisations of identified errors.
Even if you already collect Azure metrics and Diagnostic Settings logs, you still need this data, because it’s critical to building a complete view of the application.
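As an illustration of custom metrics, the sketch below uses the New Relic Python agent’s record_custom_metric() call. It assumes the agent is installed and initialized (for example, by launching the app with newrelic-admin run-program) and that the function runs inside a transaction the agent is monitoring, such as an instrumented web request; the metric names are hypothetical. If the agent isn’t installed, the import falls back to a no-op so the code still runs locally.

```python
"""Sketch: recording business-level custom metrics with the New Relic Python agent."""
try:
    import newrelic.agent as nr_agent
except ImportError:  # agent not installed (e.g. local development): metrics are only returned
    nr_agent = None


def record_checkout(order_total: float, item_count: int) -> dict[str, float]:
    """Record hypothetical checkout metrics; call inside a monitored transaction."""
    metrics = {
        "Custom/Checkout/OrderTotal": float(order_total),
        "Custom/Checkout/ItemCount": float(item_count),
    }
    if nr_agent is not None:
        for name, value in metrics.items():
            # Recorded against the current transaction's application by the agent.
            nr_agent.record_custom_metric(name, value)
    return metrics
```

The Custom/ prefix is New Relic’s naming convention for custom metrics, which keeps them separate from the metrics the agent generates itself.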
Diagnostic Settings
We’ve been talking about Diagnostic Settings, but we haven’t looked at the configuration options. Diagnostic Settings offer four destination options, and you can publish to one or more of them.
data:image/s3,"s3://crabby-images/6f7a7/6f7a70e3ec7182f44a7d5d4b1f2d3ed1abaa3abf" alt="diagnostic settings"
- Log analytics workspace: Microsoft’s logging solution, where the logs can be queried, alerted on, and added to dashboards.
- Storage account: Great for long-term archives or keeping logs for compliance. This is Azure’s Blob Storage service (Azure’s equivalent of an S3 bucket).
- EventHub: A cloud-based service designed for high-throughput data streaming, capable of processing millions of events per second with minimal delay. The concept is the same as publishing data to Amazon Kinesis Data Firehose or a Kafka topic. New Relic uses this option as part of our log forwarder solution (more details below).
- Partner solution: An example of this is the New Relic Azure marketplace offering.
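In the diagnostic setting itself, each destination is a separate property, and any combination can be set. The sketch below assumes the Azure Monitor REST property names (workspaceId, storageAccountId, eventHubAuthorizationRuleId with an optional eventHubName, and marketplacePartnerId for partner solutions); the helper is hypothetical, and the IDs you pass in are placeholders.

```python
"""Sketch: assembling the destination properties of a diagnostic setting."""
from typing import Optional


def destinations(workspace_id: Optional[str] = None,
                 storage_account_id: Optional[str] = None,
                 event_hub_rule_id: Optional[str] = None,
                 event_hub_name: Optional[str] = None,
                 partner_id: Optional[str] = None) -> dict:
    props = {}
    if workspace_id:
        props["workspaceId"] = workspace_id  # Log Analytics workspace
    if storage_account_id:
        props["storageAccountId"] = storage_account_id  # long-term archive
    if event_hub_rule_id:
        # Streaming destination, e.g. for the New Relic log forwarder.
        props["eventHubAuthorizationRuleId"] = event_hub_rule_id
        if event_hub_name:
            # Optional: without it, Azure creates an event hub per log category.
            props["eventHubName"] = event_hub_name
    elif event_hub_name:
        raise ValueError("eventHubName requires an Event Hub authorization rule ID")
    if partner_id:
        props["marketplacePartnerId"] = partner_id  # partner solution (Marketplace resource)
    if not props:
        raise ValueError("A diagnostic setting needs at least one destination")
    return props


if __name__ == "__main__":
    print(destinations(storage_account_id="<storage-account-resource-id>",
                       event_hub_rule_id="<event-hub-authorization-rule-id>"))
```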
Tips:
- Microsoft is moving away from diagnostic settings to data collection rules over the next few years.
- When deploying the Azure log forwarding solution, if there are Azure policies that deny public network access, you’ll need to use the private networking parameters available in the IaC template used to deploy the Azure log forwarder.
How can I get this data into New Relic?
Below is a list of ways to publish metrics and Diagnostic Settings (logs) into New Relic. It doesn’t include guidance for collecting application-level telemetry data, which is briefly touched on above and will be the topic of our next blog post.
Option 1: New Relic Azure integration (metrics)
New Relic has an Azure integration that pulls metrics from Azure. When setting it up, be aware that legacy options are also displayed. To reduce complexity, we introduced a new option called Azure Monitor metrics, which lets customers select which metrics are collected directly from the Azure Monitor API.
data:image/s3,"s3://crabby-images/24870/2487026b7a614847c0c7f80707aac5dedf6c99ce" alt="Azure integration settings"
The Azure Monitor option also offers:
- Improved polling frequency, which means a faster time to glass
- Additional metadata mapping (such as cloud tags)
- The latest metrics available in Azure
- Support for all Azure resources with metrics
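To make the Azure Monitor metrics option less abstract, the sketch below builds the kind of request a metrics poller sends to the Azure Monitor REST API for one resource. It is not New Relic’s implementation: the endpoint shape follows the public Microsoft.Insights/metrics API, while the api-version, five-minute window, and one-minute interval are assumptions, and the resource ID and metric name in the usage are placeholders.

```python
"""Sketch: the shape of an Azure Monitor REST metrics query for one resource.
Not New Relic's implementation; api-version and defaults are assumptions."""
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

ARM_ENDPOINT = "https://management.azure.com"
TIME_FMT = "%Y-%m-%dT%H:%M:%SZ"


def metrics_query_url(resource_id: str, metric_names: list[str], end: datetime,
                      window: timedelta = timedelta(minutes=5),
                      interval: str = "PT1M", aggregation: str = "Average") -> str:
    end_utc = end.astimezone(timezone.utc)  # Azure expects UTC timespans
    start_utc = end_utc - window
    params = {
        "api-version": "2018-01-01",
        "metricnames": ",".join(metric_names),
        "timespan": f"{start_utc.strftime(TIME_FMT)}/{end_utc.strftime(TIME_FMT)}",
        "interval": interval,        # ISO 8601 duration: the time grain per data point
        "aggregation": aggregation,  # e.g. Average, Maximum, Total
    }
    return f"{ARM_ENDPOINT}{resource_id}/providers/Microsoft.Insights/metrics?{urlencode(params)}"


if __name__ == "__main__":
    vm = "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm>"
    print(metrics_query_url(vm, ["Percentage CPU"], datetime.now(timezone.utc)))
```

A poller would issue this on every cycle with an Azure Resource Manager bearer token, so polling more often means more API calls against the limits discussed in the tips below.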
Tips:
- Use the Azure Monitor Metrics integration.
- Don’t use a single Azure service principal when setting up the integration for multiple subscriptions. Depending on the size of your subscriptions, we recommend one per subscription. Why? Azure applies API polling limits to each service principal.
- Always set a reminder for when the service principal’s client secret will expire.
- Be careful about enabling both the Azure Monitor integration and the legacy options, as this could capture duplicate metrics.
- If you need to limit what’s ingested, consider using Azure tags to control which resources’ metrics are sent to New Relic. For example, tag resources with Newrelic: true and collect metrics only for cloud resources with this tag.
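Two of these tips are easy to automate. The hypothetical helpers below select resources carrying the Newrelic: true tag and flag a service principal secret that is close to expiry. The resource dictionaries mimic the name and tags fields of Azure Resource Manager resource JSON, and the 30-day warning window is an arbitrary example. Tag names are compared case-insensitively, as Azure does, while tag values are compared exactly.

```python
"""Sketch: helpers for two integration tips. Resource dicts mimic the name/tags fields
of Azure Resource Manager JSON; the tag and warning window are example choices."""
from datetime import date


def tagged_for_new_relic(resources: list[dict], key: str = "Newrelic",
                         value: str = "true") -> list[str]:
    """Names of resources whose tags include key=value (key matched case-insensitively)."""
    selected = []
    for res in resources:
        # Azure tag names are case-insensitive; tag values are compared exactly.
        tags = {k.lower(): v for k, v in (res.get("tags") or {}).items()}
        if tags.get(key.lower()) == value:
            selected.append(res["name"])
    return selected


def secret_needs_rotation(expires_on: date, today: date, warn_days: int = 30) -> bool:
    """True once the client secret is within `warn_days` of expiry, or already expired."""
    return (expires_on - today).days <= warn_days


if __name__ == "__main__":
    print(tagged_for_new_relic([{"name": "vm1", "tags": {"newrelic": "true"}}]))  # ['vm1']
    print(secret_needs_rotation(date(2025, 1, 20), date(2025, 1, 1)))  # True
```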
To learn more about the New Relic Azure monitor integration, check out the New Relic documentation.
Option 2: Azure Log Forwarder (logs)
Similar to the Amazon CloudWatch Metric Streams solution, which uses Amazon Kinesis Data Firehose to publish data to a New Relic endpoint, New Relic offers an infrastructure as code template that provisions Azure resources to forward the logs sent to them into New Relic.
Each Azure diagnostic setting can then point to the provisioned Event Hub instance. When a log is published, it triggers an Azure Function, which sends the log to the New Relic API.
data:image/s3,"s3://crabby-images/e48f7/e48f7b468ba7310e08e534185538f1874a9b8cd9" alt="Azure diagnostic setting"
The code is available on GitHub, and the setup steps are covered in the official New Relic documentation.
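Conceptually, the forwarding function turns each Event Hub message into a call to New Relic’s Log API. The sketch below is an illustration rather than the forwarder’s actual code: it assumes the Diagnostic Settings message shape {"records": [...]} with UTC time fields ending in Z, maps each record into the Log API’s detailed JSON format, and builds (but does not send) the POST request. The attribute names are hypothetical, and EU-region accounts would use the EU Log API endpoint.

```python
"""Sketch: converting one Event Hub message from Azure Diagnostic Settings
({"records": [...]}) into New Relic Log API "detailed" JSON, then preparing the
POST request without sending it. Not the code of New Relic's forwarder."""
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# US endpoint; EU-region accounts use https://log-api.eu.newrelic.com/log/v1
LOG_API_URL = "https://log-api.newrelic.com/log/v1"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms(iso_utc: str) -> int:
    """Parse a UTC timestamp like 2024-01-01T00:00:01.1234567Z into epoch milliseconds."""
    value = iso_utc.rstrip("Z")
    if "." in value:
        # Azure can emit 7 fractional digits; fromisoformat() wants at most 6.
        base, frac = value.split(".", 1)
        value = f"{base}.{frac[:6].ljust(6, '0')}"
    parsed = datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def to_new_relic_payload(event_body: str) -> list[dict]:
    logs = []
    for rec in json.loads(event_body).get("records", []):
        entry = {
            "message": json.dumps(rec),  # keep the full record as the log message
            "attributes": {  # hypothetical attribute names
                "azure.category": rec.get("category"),
                "azure.resourceId": rec.get("resourceId"),
            },
        }
        if "time" in rec:
            entry["timestamp"] = epoch_ms(rec["time"])
        logs.append(entry)
    return [{"common": {"attributes": {"logtype": "azure-diagnostics"}}, "logs": logs}]


def build_request(payload: list[dict], license_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        LOG_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Api-Key": license_key},
        method="POST",
    )
```

Batching every record from one Event Hub message into a single Log API call keeps the number of HTTP requests proportional to messages rather than individual log lines.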
Tips:
- New Relic supports drop rules to remove logs before ingestion and obfuscation rules to mask sensitive data during ingestion.
- In enterprise organisations, Azure policies are used to control how resources are deployed. As a result, the default parameters in our infrastructure as code templates may not work. You can use the private networking option, or set up a policy exemption for this integration (for example, Scope=ResourceGroup).
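Drop rules can be managed through NerdGraph, New Relic’s GraphQL API. The sketch below builds an nrqlDropRulesCreate request that drops logs matching an NRQL filter; verify the mutation and its fields in the NerdGraph API explorer before relying on it. The account ID, NRQL filter (including the category attribute), and description are hypothetical, and the request is built but not sent.

```python
"""Sketch: building a NerdGraph request that creates an NRQL drop rule.
Account ID, NRQL filter, and description are hypothetical; nothing is sent."""
import json
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"  # US region endpoint

MUTATION = """
mutation($accountId: Int!, $nrql: String!, $description: String) {
  nrqlDropRulesCreate(accountId: $accountId, rules: [
    {action: DROP_DATA, nrql: $nrql, description: $description}
  ]) {
    successes { id }
    failures { error { reason description } }
  }
}
"""


def drop_rule_request(account_id: int, nrql: str, description: str,
                      user_api_key: str) -> urllib.request.Request:
    body = {
        "query": MUTATION,
        "variables": {"accountId": account_id, "nrql": nrql, "description": description},
    }
    return urllib.request.Request(
        NERDGRAPH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "API-Key": user_api_key},  # user key
        method="POST",
    )


if __name__ == "__main__":
    req = drop_rule_request(
        1234567,  # hypothetical account ID
        "SELECT * FROM Log WHERE category = 'kube-audit'",
        "Drop verbose AKS audit logs",
        "<user-api-key>",
    )
    print(req.full_url, req.data.decode("utf-8")[:80])
```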
Option 3: Azure Native integration - Marketplace (logs and metrics)
The Azure Native New Relic service can be provisioned through the Azure Marketplace. It allows customers to automatically collect Azure metrics, subscription logs, and resource diagnostic logs, and it provides a simple experience for installing and uninstalling the New Relic agent on VMs.
Once the integration has been installed from Marketplace, it will generate a list of monitored resources:
data:image/s3,"s3://crabby-images/8fca5/8fca570dbd51b4b4e55a41ae22a318ac5a2fb866" alt="marketplace integration"
The metrics column shows the collection of metrics from Azure Monitor. These are the same metrics that can be ingested using the Azure Monitor integration from the New Relic portal.
The logs column, however, requires a deeper dive. The Logs to New Relic column indicates whether Azure diagnostic settings have been configured to export data to the New Relic partner solution. This option isn’t available for all Azure services.
data:image/s3,"s3://crabby-images/2fbc8/2fbc8dd50abb2279eabbef5be6261e865a79fca7" alt="logs to New Relic"
Summary
If you have an existing New Relic account and want to keep the existing billing constructs:
- Get the logs: Deploy the New Relic Azure log forwarder and point your services’ diagnostic settings to it.
- Get the metrics: Use the Azure integration.
Alternatively, new customers or those wanting to transact through their Azure spend can use the New Relic Azure Marketplace integration for logging and monitoring.
data:image/s3,"s3://crabby-images/fc438/fc438fdf5d39212062a8fce6d28ea9862a355685" alt="Azure marketplace integration"
Once the platform setup is complete, customers can focus on instrumenting the applications or code running in Azure using New Relic APM, infrastructure, or browser monitoring. For more information, consult the instrumentation guide.
Next in the series
Foundations lay the groundwork for robust and scalable cloud management, and they’re critical for efficient monitoring and analysis.
Coming up next, we’ll build upon the basics and dive deeper into how to implement New Relic for a variety of different Azure services. Stay tuned!
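The views expressed in this blog are those of the author and do not necessarily represent the official views of New Relic K.K. This blog may also contain links to external sites; New Relic makes no warranties regarding the content of those linked sites.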