In our previous blog on how to monitor Microsoft 365 (M365), we delved into service overviews and the critical importance of synthetic user login monitoring. In this blog, we set our sights on a core component that forms the backbone of secure identity and access management: Active Directory Federation Services (AD FS).

As organizations increasingly migrate their operations to the cloud, ensuring the robustness of identity and authentication mechanisms becomes paramount. AD FS plays a pivotal role in this landscape, acting as the linchpin for seamless and secure single sign-on (SSO) experiences within the M365 ecosystem.

In this installment, we aim to equip you with the knowledge and tools needed to ensure the reliability and security of your AD FS implementation: navigate the terrain of certificate validations and metadata exchange documents, and unravel key elements that warrant vigilant oversight in your M365 environment. Let's dive into the world of AD FS and uncover the essentials of effective monitoring.

Validate AD FS certificates

Validating AD FS certificates is a crucial part of maintaining a secure and reliable authentication infrastructure within M365. Certificates carry the cryptographic keys that secure communication between the components of the AD FS environment and sign the tokens that AD FS issues. Regular validation ensures that these certificates are not only genuine but also up to date, reducing the risk of unauthorized access or security breaches. An expired or compromised certificate can lead to service disruptions and block authentication requests. By enforcing rigorous certificate validation practices, organizations can fortify their AD FS implementation, enhance overall security, and provide users with a consistent and trustworthy SSO experience.

Follow the steps below to get started with monitoring your AD FS certificates.

AD FS monitoring is implemented as an on-host integration for the New Relic infrastructure agent. All of the configuration and necessary scripts are provided in a dedicated GitHub repository.

This integration will typically run on the same server that hosts the AD FS role. To deploy the integration, follow these steps:

  1. Install the New Relic infrastructure agent (if you need assistance, follow the guided install).

  2. Copy the configuration file adfs-cert.yml and the PowerShell script GetExpiringCertificates.ps1 into the agent’s integration folder. These are the default locations:

  • Linux: /etc/newrelic-infra/integrations.d/
  • Windows: C:\Program Files\New Relic\newrelic-infra\integrations.d
  3. Restart the infrastructure agent service.
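
On a Windows host, steps 2 and 3 can be scripted. The following is a minimal sketch, not part of the integration: the helper name Copy-IntegrationFiles and the source folder in the example call are hypothetical, and the service name newrelic-infra is assumed to match a default agent installation.

```powershell
# Hypothetical helper: copies the listed integration files into the agent's
# integrations.d folder and lists the files now present in that folder.
function Copy-IntegrationFiles {
    param(
        [string]$Source,
        [string]$Target,
        [string[]]$Files
    )
    foreach ($file in $Files) {
        # Stop on the first failure so a missing file is not silently skipped
        Copy-Item -Path (Join-Path $Source $file) -Destination $Target -ErrorAction Stop
    }
    Get-ChildItem -Path $Target -File | Select-Object -ExpandProperty Name
}

# Example call on the AD FS server (run from an elevated prompt).
# The source folder is a placeholder for wherever the repository was downloaded.
# Copy-IntegrationFiles -Source 'C:\Temp\adfs-monitoring' `
#     -Target 'C:\Program Files\New Relic\newrelic-infra\integrations.d' `
#     -Files 'adfs-cert.yml', 'GetExpiringCertificates.ps1'
# Restart-Service -Name 'newrelic-infra'
```

The example call is left commented out so that nothing is copied or restarted until the paths have been checked against your own installation.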

The configuration file is straightforward and looks like this:

integrations:
  - name: nri-flex
    interval: 24h
    config:
      name: M365AdfsCertificate
      apis:
        - event_type: M365AdfsCertificate
          shell: powershell
          timeout: 299000
          commands:
            - run: '& "C:/Program Files/New Relic/newrelic-infra/integrations.d/GetExpiringCertificates.ps1"'

Line 7 in the above configuration defines the name of the event where the data from this integration is stored in New Relic. We instruct the infrastructure agent to use the Flex integration (line 2), which starts a PowerShell shell (line 8) to call the script defined in the run command (line 11).

The actual PowerShell script looks like this:

# Collect the certificates from the local certificate stores that expire within 365 days
$expiring_certs = Get-ChildItem -Path cert: -Recurse -ExpiringInDays 365 | Select-Object Issuer, NotBefore, NotAfter, Subject, FriendlyName, SerialNumber, Thumbprint

# Build an empty array to add our results to
$results = @()

$StartDate = (Get-Date)

foreach ($item in $expiring_certs) {

  # Time remaining until the certificate expires
  $ts = New-TimeSpan -Start $StartDate -End $item.NotAfter
  $tsDaysReverse = $ts.Days * -1

  # Build a custom object to pass into the results
  $cert = [PSCustomObject]@{
    certSubject               = $item.Subject
    certIssuer                = $item.Issuer
    certSerialNumber          = $item.SerialNumber
    certNotBefore             = ([DateTimeOffset]$item.NotBefore).ToUnixTimeSeconds()
    certNotAfter              = ([DateTimeOffset]$item.NotAfter).ToUnixTimeSeconds()
    certThumbprint            = $item.Thumbprint
    certExpiringIn            = $ts
    certExpiringInReverseDays = $tsDaysReverse
    certExpirationDate        = $item.NotAfter | Get-Date -UFormat %s
    certFriendlyName          = $item.FriendlyName
  }

  $results += $cert
}

# Return the results as JSON so that Flex can ingest them
$results | ConvertTo-Json

The script first uses the Get-ChildItem cmdlet to get all certificates that expire within the next 365 days. Next, we construct an empty array that will be returned at the end. In a loop, we create a new object for each certificate found and add it, with all its details, to the results array. Finally, the array is converted to JSON and returned as the output of the script.
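
To make the expiry arithmetic concrete, here is a minimal sketch that repeats the New-TimeSpan calculation for a single date. Get-DaysUntilExpiry is a hypothetical helper, not part of the integration, and the dates are made up for illustration.

```powershell
# Hypothetical helper, not part of the integration: mirrors the New-TimeSpan
# calculation from GetExpiringCertificates.ps1 for a single expiry date.
function Get-DaysUntilExpiry {
    param(
        [datetime]$NotAfter,
        [datetime]$Now = (Get-Date)
    )
    # .Days is the whole-day part of the time span; it is negative once expired
    (New-TimeSpan -Start $Now -End $NotAfter).Days
}

# Made-up dates for illustration
$now = [datetime]'2024-01-01'
Get-DaysUntilExpiry -NotAfter ([datetime]'2024-01-31') -Now $now   # 30
Get-DaysUntilExpiry -NotAfter ([datetime]'2023-12-22') -Now $now   # -10
```

The whole-day value is the same Days property that ends up in New Relic as certExpiringIn.Days, so a negative number there indicates a certificate that has already expired.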

In the New Relic UI, we use the entity explorer to look at all the raw data that’s being collected.

We can also build a custom dashboard to visualize the data in a meaningful way.

Although we can refer to the data and dashboard, this isn’t something that I want to check manually from time to time. Ideally, I want to get an alert notification if, for example, there’s a certificate about to expire in the next 30 days. This would probably give me enough time to renew a certificate or create a new one. With New Relic, this can easily be done by setting up an alert condition using a New Relic Query Language (NRQL) query.

SELECT min(certExpiringIn.Days) as 'Cert expiring' from M365AdfsCertificate facet certSubject

In the threshold configuration, I can specify that an incident is triggered whenever that query returns a value below 30.
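
The alert itself is evaluated in New Relic, but the same rule can be sanity-checked locally against the script's output. The sketch below is an assumption-laden illustration: Get-ExpiryAlerts is a hypothetical helper that mimics the minimum per certSubject and the 30-day threshold, and the sample objects are made up and contain only the fields used here.

```powershell
# Hypothetical helper: mimics "min(certExpiringIn.Days) ... facet certSubject"
# and returns the subjects whose minimum falls below the threshold.
function Get-ExpiryAlerts {
    param(
        [object[]]$Certs,
        [int]$ThresholdDays = 30
    )
    $Certs | Group-Object -Property certSubject | ForEach-Object {
        $name = $_.Name
        # Smallest number of remaining days among the certificates of this subject
        $minDays = ($_.Group | ForEach-Object { $_.certExpiringIn.Days } |
            Measure-Object -Minimum).Minimum
        if ($minDays -lt $ThresholdDays) {
            [PSCustomObject]@{ certSubject = $name; minDays = [int]$minDays }
        }
    }
}

# Made-up sample in the shape the script emits (only the fields used here)
$sample = @(
    [PSCustomObject]@{ certSubject = 'CN=adfs.example.com'; certExpiringIn = [PSCustomObject]@{ Days = 12 } }
    [PSCustomObject]@{ certSubject = 'CN=adfs.example.com'; certExpiringIn = [PSCustomObject]@{ Days = 200 } }
    [PSCustomObject]@{ certSubject = 'CN=ADFS Signing'; certExpiringIn = [PSCustomObject]@{ Days = 90 } }
)
Get-ExpiryAlerts -Certs $sample
```

With this sample, only CN=adfs.example.com is returned, because its lowest value (12) is below 30. On the AD FS server, the input could instead come from piping the output of GetExpiringCertificates.ps1 through ConvertFrom-Json.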

Availability of the metadata exchange document

Ensuring the availability of the metadata exchange document is paramount for maintaining a resilient AD FS infrastructure within the M365 environment. The metadata exchange document contains critical information about AD FS endpoints, certificates, and other key metadata necessary for secure communication and authentication. Regularly checking its availability is essential to guarantee that this vital information is readily accessible to federation partners and other components in the ecosystem. An unavailable metadata exchange document can disrupt the federation process, leading to authentication failures and potential service outages. Proactively monitoring its availability allows organizations to identify and address issues promptly, ensuring the uninterrupted flow of authentication data and contributing to a robust and reliable M365 experience for users.

Follow the steps below to get started with monitoring your metadata exchange document.

AD FS monitoring is implemented as an on-host integration for the New Relic infrastructure agent. This integration will typically run on the same server that hosts the AD FS role. To deploy this integration, follow these steps:

  1. Install the New Relic infrastructure agent (if you need assistance, follow the guided install). Note: If you already followed the steps described above on validating certificates, you can skip this step and start with step 2.

  2. Copy the configuration file adfs-metadata-xml.yml and the PowerShell script GetMetadataXML.ps1 into the agent’s integration folder. These are the default locations:

  • Linux: /etc/newrelic-infra/integrations.d/
  • Windows: C:\Program Files\New Relic\newrelic-infra\integrations.d
  3. Restart the infrastructure agent service.

Again, the configuration file is straightforward and looks like this:

integrations:
  - name: nri-flex
    interval: 60m
    config:
      name: M365AdfsMetadata
      apis:
        - event_type: M365AdfsMetadata
          shell: powershell
          timeout: 299000
          commands:
            - run: '& "C:/Program Files/New Relic/newrelic-infra/integrations.d/GetMetadataXML.ps1"'

Line 7 in the above configuration defines the name of the event where the data from this integration is stored in New Relic. We instruct the infrastructure agent to use the Flex integration (line 2), which starts a PowerShell shell (line 8) to call the script defined in the run command (line 11).

The PowerShell script looks like this:

add-type @"
   using System.Net;
   using System.Security.Cryptography.X509Certificates;
   public class TrustAllCertsPolicy : ICertificatePolicy {
       public bool CheckValidationResult(
           ServicePoint srvPoint, X509Certificate certificate,
           WebRequest request, int certificateProblem) {
           return true;
       }
   }
"@
[System.Net.ServicePointManager]::CertificatePolicy = New-Object TrustAllCertsPolicy

$metadataUrl = "https://localhost/FederationMetadata/2007-06/FederationMetadata.xml"
$result = Invoke-WebRequest $metadataUrl -UseBasicParsing

# Build a custom object to pass into the results
$jsonResult = [PSCustomObject]@{
   metadataXMLURL    = $metadataUrl
   statusCode        = $result.StatusCode
   statusDescription = $result.StatusDescription
   rawContentLength  = $result.RawContentLength
}

$jsonResult | ConvertTo-Json

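The first lines of the script register a certificate policy that accepts any certificate, so the request to https://localhost succeeds even though the server certificate is not issued for that name. This turns off certificate validation for the script's process, so it should stay limited to this local check. In line 14, we define the URL of the metadata exchange document, which we then pass into the Invoke-WebRequest cmdlet (line 15). Next, we create an object with some details about the metadata exchange document, including the results from the web request; that is, whether or not the request was successful.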
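
One caveat worth checking in your environment: Invoke-WebRequest raises an error for non-success HTTP responses instead of returning them, so after a 4xx or 5xx response the status fields in the result would be empty rather than holding the error code. The following is a minimal sketch of one way to capture the code anyway. Get-MetadataStatus is a hypothetical helper, not part of the integration, and reporting 0 for a failed connection is a convention chosen here for illustration.

```powershell
# Hypothetical helper, not part of the integration: returns the HTTP status
# for a URL even when the request fails, instead of leaving it empty.
function Get-MetadataStatus {
    param([string]$Url)
    try {
        $r = Invoke-WebRequest -Uri $Url -UseBasicParsing -ErrorAction Stop
        $code = [int]$r.StatusCode
        $desc = $r.StatusDescription
    }
    catch {
        # On HTTP errors the exception carries the response; on connection
        # failures there is none, so the code falls back to 0.
        $resp = $_.Exception.Response
        $code = if ($resp) { [int]$resp.StatusCode } else { 0 }
        $desc = $_.Exception.Message
    }
    [PSCustomObject]@{
        metadataXMLURL    = $Url
        statusCode        = $code
        statusDescription = $desc
    }
}

# Example call (the URL is the one used by GetMetadataXML.ps1):
# Get-MetadataStatus -Url 'https://localhost/FederationMetadata/2007-06/FederationMetadata.xml' | ConvertTo-Json
```

With this approach, a connection failure is reported as status code 0, so an alert condition that only looks for values of 400 or above would also need to cover 0.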

In the New Relic UI, we use the entity explorer to look at all the raw data that’s being collected.

We can also build a custom dashboard to visualize the data in a meaningful way.

As we’ve seen in the previous example with expiring certificates, I want to take the proactive route and have New Relic alert me whenever a metadata exchange document is no longer available. With New Relic, this can easily be done by setting up an alert condition using an NRQL query.

SELECT latest(statusCode) FROM M365AdfsMetadata where metadataXMLURL is NOT NULL facet metadataXMLURL

The above query returns the latest status code for each of the metadata exchange documents that I’m monitoring. If for any of these paths a status code of 400 or above occurs, I want to get an incident triggered.

Conclusion

Now that we've unraveled the intricacies of AD FS monitoring, it's time to empower your organization with a robust solution. By following the steps outlined in this blog, you can enhance your monitoring capabilities and fortify your Microsoft 365 environment.