Join your telemetry data with subquery JOINs

Subquery JOINs, lookups, and GeoIP allow you to resolve business-critical incidents with your data in one simple query. In this blog, we'll take a deep dive on subquery JOINs, including joining related datasets, making calculations across datasets, and tips on writing great subquery JOIN statements. But first, a brief explanation of why this powerful query capability is so important to helping you better understand your data.

Your telemetry data is vast, so knowing how to query your data effectively is crucial to answering the questions you have about the performance of your systems. Often, combining various data can illuminate correlations and trends, giving you a better understanding of the health and performance of your apps and systems. One way to combine data is using subqueries, which allow you to use the result of one query as the basis of another. Subqueries are limited, however, because subqueries can only return a single result or list of results.

Which brings us to today’s topic: Subquery JOINs, a more powerful solution to solve these needs. With subquery JOINs, you can join two result tables together, allowing for analysis and enrichment across datasets. In this blog you’ll learn just how easy it is to do. Check out the video or continue reading below to learn more.

Joining related datasets

Whether you want to combine application performance monitoring (APM) event data and metrics, your logs and infrastructure, your synthetics and NrAuditEvent, or your custom events, subquery JOINs can help. Consider this next example of joining infrastructure data with transaction data:

FROM ProcessSample SELECT average(cpuPercent) FACET containerId
FROM Transaction SELECT average(duration) FACET containerId

In this example there are two datasets: average CPU percentage of infrastructure by container (ProcessSample) and average duration of the transaction on the app by container. Often, data from different sources is correlated. In this case you can determine if a container's higher CPU usage is causing slower transaction by using the following subquery JOIN:

FROM Transaction
  JOIN (FROM ProcessSample SELECT average(cpuPercent) AS cpu 
  FACET containerId LIMIT MAX) ON containerId
SELECT average(duration)/latest(cpu) FACET containerId, containerName

This is amazing! With this subquery JOIN, you can see the containers that have a higher transaction duration (on average) compared with their average CPU percentage (on average). With this data you can then research the topmost containers to ascertain if there's a bug to fix or optimizations to be made.

Next let’s take a deeper look at the syntax.

How to write a subquery JOIN

Below, you’ll learn how to get started with the syntax. To begin with, understand that a subquery JOIN requires three components:

Dataset 1
Dataset 2
Primary key that links the two datasets together

FROM Event [INNER|LEFT] JOIN (subquery) ON [key =] key SELECT ...

There are a few simple rules to the structure:

The JOIN clause must always follow immediately after the FROM clause.
- The JOIN can be prefixed with the join type. INNER or LEFT is optional, and defaults to INNER when omitted.
Parentheses containing a subquery must immediately follow JOIN.
The ON clause must immediately follow the subquery and has two forms (more details below).

Read the subquery JOINs documentation to learn the nuances of the syntax.

Enriching your data with subquery JOINs

Let's look at an example where subquery JOINs help enrich your data. In the next example, there's APM data in two places: PageView provides information on a page visited by an end user, while PageAction provides the actions taken on the page. These two events both have session IDs that identify the session of an end user, but some data is contained in one event and not the other.

In this example, city is an attribute in PageView but not PageAction, whereas currentUrl is an attribute in PageAction but not PageView.

FROM PageView SELECT count(*) FACET session, city
FROM PageAction SELECT count(*) FACET session, currentUrl

With subquery JOINs, you can enrich the PageAction dataset with the missing data! By combining the data using the session ID, you can see not only the URL with the most clicks, but also where these actions came from:

FROM PageAction 
  JOIN (FROM PageView SELECT count(*) FACET session, city
  LIMIT MAX) ON session
SELECT count(*) FACET city, currentUrl

Tip: Make sure to include LIMIT MAX in the inner subquery to maximize the number of results joined to the outer query. The current max result limit is 5,000 rows.

Calculations across datasets

In this last example, you’ll learn how to parse two log datasets to match one another, then run calculations to find the error percentage of the logs.

Logs can often have data buried within the log message. Using enhanced string parsing functions like aparse(), you can extract key values. In this case, the primary key application ID is within the log message of both Log and Log_Error:

WITH numeric(aparse(message, '%appId: * %')) AS app_Id
FROM Log SELECT count(*) FACET app_name, app_Id SINCE 1 day ago

FROM Log_Error SELECT count(*) AS errCnt 
FACET numeric(aparse(message,'%applicationId: * %')) AS application_Id 
SINCE 1 day ago

To find the error percentage by application, a LEFT JOIN is required. This is because not all of the applications have errors, and an INNER JOIN would exclude these apps.

WITH numeric(aparse(message, '%applicationId: * %')) AS application_Id
FROM Log 
  LEFT JOIN (FROM Log_Error SELECT count(*) AS errCnt 
     FACET numeric(aparse(message, '%appId: * %')) AS app_Id 
     SINCE 1 day ago LIMIT MAX) ON application_Id=app_Id
SELECT (latest(errorCount) OR 0)/count(*) AS errorPercentage 
FACET app_name, application_Id SINCE 1 day ago

Even though there was data stored in two different log partitions and the application ID was hidden in the log messages, you were able to parse out the data you needed and calculate the error percentage. Using latest() was also helpful to ensure the aggregations of the inner query were calculated correctly with the outer query.

JOINs: Limitations

Subquery JOINs have a limit of 5,000 results.
Alerting is not supported. JOIN queries have the same limitations as standard subqueries, due to architectural limitations of streaming in our system today.
SELECT * is not supported in the inner subquery of a subquery JOIN.
Queries that generate very high-cardinality relationships are currently restricted to a maximum of 1:100, which means a single primary key cannot map to more than 100 rows of the joined query.

Tips for using subquery JOINs

Try not to alias a column within a subquery using an existing column in the outer query, as this can cause errors.
Make sure the primary keys match their attribute type (for example, string, numeric, etc.). Mismatches in attribute types cause the query to return no results. For example, ON account=account_number will not work if account is a string and account_number is a number.
Adding LIMIT MAX to the inner query will maximize the number of results, and therefore maximize the number of JOINs.

When joining against an aggregated dataset, be sure to use latest(), which will preserve the aggregated value in the outer query.

Conclusion

With subquery JOINs, combining two datasets is easy. Best of all, your team can analyze the combined data to gain insights about the health and performance of your application that otherwise may not be obvious or easy to correlate. There are too many possibilities to count how data can be joined, but what’s certain is that you can do it with subquery JOINs.