4 Ways to Manage PII in Your Log Pipeline

Thanks to the European Union’s General Data Protection Regulation (GDPR) and more recent privacy laws in other jurisdictions, many of us are familiar with personally identifiable information (PII): any data that can be used to identify a specific individual, such as Social Security numbers, mailing or email addresses, and phone numbers. The proliferation of SaaS platforms and cloud computing has expanded the scope of PII considerably. From online banking to social media to online shopping, with many of these services relying on location-based middleware, every digital business needs to account for managing PII in some way.

Many New Relic Log Management customers want to know what they can do to keep PII out of the log stream. There are often legal and contractual reasons to do so, and those reasons affect how and when you must remove the data. This post outlines four approaches that every log management user should be aware of.

Understanding the log pipeline

If you think of a logging system as a pipeline, it helps to know every point at which you can make a decision and take action to prevent PII from entering it. The pipeline starts in the customer’s data center (or domain of control) and moves through one or more forwarder/processor layers, which can also be in the customer’s domain of control. After that, the data moves into New Relic’s platform.

PII management cycle along the log pipeline diagram

The following sections cover specific domains within the log pipeline where you can take practical and precise action to control the transmission, persistence, and access of data containing PII.

Application-level governance

Internal development practices can ensure that certain information is never emitted in logs or, if it is, never enters the centralized logging pipeline. One widely used reference is the logging guidance published by the Open Web Application Security Project (OWASP), which explicitly calls out things that shouldn’t be written directly to application logs:

Session identification values (consider replacing with a hashed value if needed to track session-specific events)
Access tokens
Application source code
Sensitive personal data and some forms of PII, such as health information, government identifiers, and data about vulnerable people
Authentication passwords
Database connection strings
Encryption keys and other master secrets
Bank account or payment cardholder data
Data of a higher security classification than the logging system is allowed to store
Commercially sensitive information
Information that is illegal to collect in the relevant jurisdictions
Information that a user has opted out of having collected, or has not consented to, either because consent was never given or because it has expired

Adopting standards like this, even if you need to make some exceptions, is the first line of defense in managing PII in your system architecture.
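As a minimal sketch of the first item in the list above, the snippet below logs a SHA-256 digest of a session ID instead of the raw value, so session-specific events can still be correlated. The class name SessionIdHasher and the method name hashForLog are hypothetical, and the choice of an unkeyed SHA-256 digest is an assumption; a keyed hash may be a better fit for identifiers that are short or guessable.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns a session ID into a stable, non-reversible token for logs.
class SessionIdHasher {

    // Returns the SHA-256 digest of the session ID as 64 lowercase hex characters.
    static String hashForLog(String sessionId) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(sessionId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sessionId = "example-session-id"; // placeholder value, not real data
        // Log the digest, never the raw session ID.
        System.out.println("login succeeded session=" + hashForLog(sessionId));
    }
}
```

Because the same input always produces the same digest, log lines from one session can still be grouped without the raw identifier ever reaching the pipeline.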

You can also keep sensitive data, such as payment card numbers, out of verbose log messages by filtering it at the source with your logging framework. The following code example uses log4j (the 1.x API) and extends the PatternLayout class to mask credit-card-like numbers before they are written. More fully developed examples are available on the web, but this brief one will suffice for demonstration.

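import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.log4j.spi.LoggingEvent;

// log4j 1.x layout that masks credit-card-like digit runs before formatting.
public class CardNumberFilteringLayout extends PatternLayout {

    // Keep the first four digits and replace the rest with a fixed-width mask.
    private static final String MASK = "$1++++++++++++";

    // Four digits followed by 9 to 15 more (13 to 19 digits in total, no delimiters).
    private static final Pattern PATTERN = Pattern.compile("([0-9]{4})([0-9]{9,15})");

    @Override
    public String format(LoggingEvent event) {
        // Only plain String messages are inspected; other message objects pass through.
        if (event.getMessage() instanceof String) {
            String message = event.getRenderedMessage();
            Matcher matcher = PATTERN.matcher(message);

            if (matcher.find()) {
                String maskedMessage = matcher.replaceAll(MASK);

                // Preserve any attached exception on the rebuilt event.
                @SuppressWarnings({ "ThrowableResultOfMethodCallIgnored" })
                Throwable throwable = event.getThrowableInformation() != null ?
                        event.getThrowableInformation().getThrowable() : null;

                // Rebuild the event with the masked message and format that instead.
                LoggingEvent maskedEvent = new LoggingEvent(event.fqnOfCategoryClass,
                        Logger.getLogger(event.getLoggerName()), event.timeStamp,
                        event.getLevel(), maskedMessage, throwable);

                return super.format(maskedEvent);
            }
        }

        // Nothing to mask: format the original event unchanged.
        return super.format(event);
    }
}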
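To see what the pattern in the layout above does and does not catch, the following standalone sketch applies the same regular expression and replacement string to plain strings, without any log4j dependency. The class name CardMaskDemo and the sample messages are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: applies the layout's pattern and mask to plain strings.
class CardMaskDemo {

    // Same expression as the layout: 4 digits kept, then 9 to 15 digits replaced.
    private static final Pattern PATTERN = Pattern.compile("([0-9]{4})([0-9]{9,15})");
    private static final String MASK = "$1++++++++++++";

    static String mask(String message) {
        return PATTERN.matcher(message).replaceAll(MASK);
    }

    public static void main(String[] args) {
        // An unbroken digit run is masked after its first four digits.
        System.out.println(mask("charge card=4111111111111111 amount=10"));
        // A hyphen-delimited number contains no run of 13 or more digits,
        // so this pattern leaves it unchanged.
        System.out.println(mask("charge card=4111-1111-1111-1111 amount=10"));
    }
}
```

Two limits are worth keeping in mind: delimited card numbers slip through unless the expression is broadened, and any other unbroken run of 13 or more digits (a millisecond epoch timestamp, for instance) is masked as well.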

Forwarding layer governance

Forwarding layers are typically decoupled from both the application layer and the log storage backend (in our case, New Relic). Their primary job is to move logs out of the customer’s security domain and into the SaaS provider’s security domain. Many organizations implement a set of filters and transformations in that layer to enforce monitoring and security standards. For instance, if unwanted PII is present, you can apply drop or obfuscation rules when configuring forwarders (for example, in a Fluentd or Logstash config file).

For example, in Fluentd you can drop an entire log record whose message field matches a pattern by using the grep filter plugin (filter_grep):

<filter **>
  @type grep
  <exclude>
    key message
    pattern /credit_card/
  </exclude>
</filter>

A more useful approach may be to obfuscate a certain set of fields as they are being forwarded, using the record_transformer filter plugin.

<filter **>
  @type record_transformer
  <record>
    credit_card_number REDACTED
  </record>
</filter>

This example sets a field named credit_card_number to the text “REDACTED”. Be aware that record_transformer writes that field to every record matched by the filter, adding it where it is missing, so scope the filter’s tag pattern to the streams that actually carry the field. Organizations should manage these filters and transformations in a structured way, as code, in a configuration management system that is subject to code review and development standards similar to those described in the section on application-level governance.
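The difference between the two forwarder rules can be sketched in a few lines of Java that operate on records represented as maps. This is only an illustration of the semantics, not how Fluentd is implemented. The class and method names are hypothetical, and the redaction in this sketch is applied only when the field is already present.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of forwarder-style rules applied to log records.
class ForwarderRules {

    private static final Pattern DROP_PATTERN = Pattern.compile("credit_card");

    // Drop rule: discard every record whose "message" field matches the pattern.
    static List<Map<String, String>> dropMatching(List<Map<String, String>> records) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> record : records) {
            String message = record.get("message");
            if (message == null || !DROP_PATTERN.matcher(message).find()) {
                kept.add(record);
            }
        }
        return kept;
    }

    // Redaction rule: keep the record but overwrite the sensitive field if it exists.
    static Map<String, String> redact(Map<String, String> record) {
        Map<String, String> copy = new LinkedHashMap<>(record);
        if (copy.containsKey("credit_card_number")) {
            copy.put("credit_card_number", "REDACTED");
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("message", "payment accepted");
        record.put("credit_card_number", "4111111111111111");
        // The record survives, but the sensitive value does not.
        System.out.println(redact(record));
    }
}
```

Dropping loses the whole event, including any context that is useful for troubleshooting, while redaction keeps the event and removes only the sensitive value.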

Pre-persistence SaaS backend governance

The New Relic backend provides several facilities for removing or flagging PII in log data:

1. Drop rules: A drop filter rule lets you drop either an entire log record or just selected fields from it. In this example, we suspect that a field named credit_card_number is getting into some of our logs, so we set up a drop filter to remove it.

"Create a new drop filter" example screenshot

You can also combine a drop rule with a parsing rule. For example, when your message field contains more data that you would rather drop than keep, you can set up a Grok parsing pattern to extract what you care about into separate fields and, in parallel, a drop rule that removes the original message field.

2. Automatic obfuscation rules: You can also automatically obfuscate certain fields that match common patterns for things like credit card numbers and Social Security numbers.

The log management service automatically masks number patterns that appear to be credit card or Social Security numbers. It replaces all digits, along with any spaces or hyphens used as delimiters, with a string of Xs. Numbers that appear to be credit card numbers (13 to 16 digits) are obfuscated as XXXXXXXXXXXXXXXX. For example:

Numbers with hyphens, such as 4111-1111-1111-1111
Numbers with spaces, such as 4111 1111 1111 1111
Numbers with 13 (Visa), 14 (Diners Club), 15 (American Express, JCB), or 16 digits (Visa, Mastercard, Discover, JCB), such as 4111111111111111

Nine-digit numbers with hyphens that appear to be Social Security numbers, like 123-45-6789, are obfuscated as XXXXXXXXX. Nine-digit numbers with spaces, such as 123 45 6789, or with hyphens in a different pattern, such as 12-345-67-89, are not automatically obfuscated.

If you need to opt out of automatic obfuscation, see the documentation on obfuscation rules.
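The matching behavior described above can be approximated with two regular expressions. The sketch below is an assumption-based illustration, not New Relic’s actual implementation: the class name, the constants, and the exact expressions are hypothetical and were chosen only to reproduce the examples listed in this section.

```java
import java.util.regex.Pattern;

// Hypothetical approximation of the masking behavior described above.
class AutoObfuscationSketch {

    static final String CARD_MASK = "XXXXXXXXXXXXXXXX";
    static final String SSN_MASK = "XXXXXXXXX";

    // 13 to 16 digits, optionally separated by single spaces or hyphens.
    private static final Pattern CARD = Pattern.compile("\\b\\d(?:[ -]?\\d){12,15}\\b");
    // Nine digits in the hyphenated 3-2-4 Social Security number layout only.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    static String obfuscate(String message) {
        // Mask card-like numbers first, then hyphenated SSN-like numbers.
        String result = CARD.matcher(message).replaceAll(CARD_MASK);
        return SSN.matcher(result).replaceAll(SSN_MASK);
    }

    public static void main(String[] args) {
        System.out.println(obfuscate("card 4111-1111-1111-1111 ssn 123-45-6789"));
        // Space-delimited nine-digit numbers are left alone, as in the text above.
        System.out.println(obfuscate("ref 123 45 6789"));
    }
}
```

Patterns like these trade precision for coverage, so any long numeric identifier that happens to fit the card shape is masked too.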

3. Custom obfuscation rules: In April 2022, New Relic released an obfuscation UI that lets you hash or mask sensitive data in your logs. By creating a custom obfuscation rule, you can obfuscate sensitive information after logs have been shipped to New Relic but before they are stored in NRDB. Obfuscation rules rely on obfuscation expressions. For example, add a new expression with a descriptive name like Credit Card Number, as shown in this screenshot:

New Relic Create obfuscation expression screenshot

In addition to the UI, you can also create obfuscation expressions in NerdGraph using the logConfigurationsCreateObfuscationExpression mutation under logConfigurations.
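Conceptually, a custom obfuscation rule pairs a regular expression with an action. The sketch below shows the difference between masking and hashing the first capture group of each match. It is a plain-Java illustration under stated assumptions, not New Relic’s implementation: the class, enum, and method names are hypothetical, and both the use of SHA-256 for hashing and the use of a run of X characters for masking are assumed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "expression + action" obfuscation on a log message.
class ObfuscationRuleSketch {

    enum Action { MASK, HASH }

    // Replaces capture group 1 of every match; the expression must define that group.
    static String apply(Pattern expression, Action action, String message) {
        Matcher matcher = expression.matcher(message);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            if (matcher.start(1) < 0) {
                continue; // group 1 did not participate in this match
            }
            String secret = matcher.group(1);
            out.append(message, last, matcher.start(1));
            out.append(action == Action.MASK ? "X".repeat(secret.length()) : sha256Hex(secret));
            last = matcher.end(1);
        }
        out.append(message.substring(last));
        return out.toString();
    }

    private static String sha256Hex(String value) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Pattern cardField = Pattern.compile("card=(\\d{16})");
        String line = "charge card=4111111111111111 ok";
        System.out.println(apply(cardField, Action.MASK, line));
        System.out.println(apply(cardField, Action.HASH, line));
    }
}
```

Masking discards the value entirely, while hashing keeps a stable token that can still be searched or joined on. For low-entropy values such as card numbers, an unkeyed hash can be reversed by brute force, so masking is usually the safer choice.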

4. Alerting conditions: When you want to keep the log for audit purposes, you can create an alert condition that warns you whenever a log contains sensitive information. This approach assumes that PII leaks are relatively rare in your environment. Below is an example of creating a New Relic alert policy, which we’ll name Privacy Policy.

"create alert policy" example screenshot

Next, we’ll add a custom NRQL alert condition that looks for logs where a field named credit_card is present and non-NULL, and where its value appears unredacted (that is, it does not contain Xs).

"Add a NRQL condition" example screenshot

Finally, we will associate the new alert policy with a notification channel to let our on-call staff know about it.

Post-persistence SaaS backend governance

New Relic allows you to organize telemetry data in a master and sub-account structure. For example, you can give a highly privileged user access to all data in a parent account and its related sub-accounts while giving another user access to one or more sub-accounts based on their role and security clearance. New Relic’s account access rules will guard privileged data as configured by your account owner. The upstream consequence of this approach is that you need to route data into the relevant accounts in the forwarding layer of the log stream, before delivery to New Relic.

The New Relic master/sub-account hierarchy looks something like the illustration below and includes easy-to-use tools for setting up these relationships in your account. You can route content to the appropriate sub-accounts based on your internal company access policies. Users with access to one or more sub-accounts can view data in those accounts but cannot view content in accounts to which they have not been added. Typically, the most privileged users are given master account access and can view all data within the hierarchy.

master account hierarchy example
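The routing decision that has to happen in the forwarding layer can be sketched as a simple lookup. Everything in this example is hypothetical: the data_classification field, the sub-account names, and the class itself are assumptions made for illustration, and a real forwarder would express the same decision in its own configuration.

```java
import java.util.Map;

// Hypothetical router: picks a destination sub-account from a record's classification.
class AccountRouter {

    // Assumed mapping from a "data_classification" field to a sub-account name.
    private static final Map<String, String> ROUTES = Map.of(
            "restricted", "sub-account-restricted",
            "internal", "sub-account-general");

    // Unclassified or unknown records go to the most restricted account (fail closed).
    private static final String DEFAULT_DESTINATION = "sub-account-restricted";

    static String destinationFor(Map<String, String> record) {
        String classification = record.get("data_classification");
        if (classification == null) {
            return DEFAULT_DESTINATION;
        }
        return ROUTES.getOrDefault(classification, DEFAULT_DESTINATION);
    }

    public static void main(String[] args) {
        System.out.println(destinationFor(Map.of("data_classification", "internal")));
        System.out.println(destinationFor(Map.of("message", "no classification set")));
    }
}
```

Defaulting to the most restricted destination means a record that was never classified is visible to fewer people, not more.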

Next steps

So there you have it—several approaches for managing PII in your log data. Now that you have these techniques, sign up for a free New Relic account to put them into action.

By Jim Hagan

Jim Hagan is a Boston-based Enterprise Solution Architect with New Relic. He has 20 years of experience as a software engineer, with expertise in geospatial technology and time series analytics. Before joining New Relic, he worked on highly distributed logging and metrics platforms at Wayfair.

The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.
