How to Extend Observability and Bring Value to DevOps and Customers

Cognitran plays a unique role in the automotive industry. We develop software as a service (SaaS) that is used by leading automotive manufacturers to author, manage, and deliver diagnostic, repair, and maintenance information for technicians. In fact, manufacturers cannot hand over a vehicle without a platform like ours in place. They need to perform a pre-delivery inspection to check that everything on the vehicle from front to back is safe and ready to go to market. Our solution gives them that ability with everything from a simple checklist to the specific process and protocols for programming the vehicle so it is properly prepared to operate.

Cognitran has two main SaaS offerings: Blaise for authoring and content management, and ITIS, our online content delivery platform for technicians. Both are hosted entirely in AWS, where we make extensive use of microservices running in Docker containers, with the infrastructure managed with Terraform. We have customers around the world, including many well-recognized brands such as Ford, Aston Martin, Jaguar, Land Rover, Harley-Davidson, Triumph, Kawasaki, and JCB, and they rely on our software platform to be available 24/7/365. Our environment is highly complex, so end-to-end observability is always a challenge, but it is essential for delivering a great digital customer experience. That’s where New Relic comes in.

We’ve been using New Relic since 2013. It gives us the comprehensive observability we need: the ops team can detect and understand incidents, and developers can better understand performance impacts and how the systems are being used, so they can make more informed design changes. We have also taken observability a step further by enriching New Relic data with custom attributes that return valuable business intelligence, which enables us to understand customer usage across different markets. In both cases we have learned some valuable lessons, which I’ll explain.

Getting the most from New Relic observability

One of our main use cases for New Relic is observability. It allows us to pull in information from across all our systems to get a real-time, end-to-end picture of a customer’s experience, regardless of which systems they touch when using either Blaise or ITIS. For example, when someone authors information in Blaise, it typically gets reviewed and goes out for translation, then gets published and indexed. New Relic allows us to follow every step to see how long it takes to pass through these different systems and proactively identify if there are any issues along the way that we need to address. 

We also use information collected by New Relic to baseline common behavior and identify anomalies. The alerting feature is especially important because it monitors our systems 24/7 and lets us know immediately if a process exceeds set thresholds. Alerts are integrated into Slack communication channels so we can route them to the best team to address the issue. Our success with this approach is detailed in the FutureStack 2021 session “How to Turn an Out-of-Hours Alert Into a Customer-Facing Root Cause Analysis,” delivered by Cognitran Consultant Anthony Pounds-Cornish.
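
To make the idea of baselining concrete, here is a minimal sketch of flagging samples that sit well above a baseline. It is not how New Relic's alerting works internally; the function name, the three-sigma rule, and the sample response times are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, recent, sigmas=3.0):
    """Return (index, value) pairs from `recent` that exceed the baseline
    mean by more than `sigmas` sample standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    threshold = mean(baseline) + sigmas * stdev(baseline)
    return [(i, v) for i, v in enumerate(recent) if v > threshold]


# Hypothetical response times in milliseconds (invented for this example).
baseline_ms = [120, 130, 125, 118, 122, 127, 124, 121]
recent_ms = [123, 126, 410, 119]

anomalies = find_anomalies(baseline_ms, recent_ms)
print(anomalies)
```

In practice the threshold would live in an alert condition rather than in application code; the sketch only shows the comparison that a static or baseline threshold implies.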

From our experience, we recommend companies pull data into New Relic from as many different sources as possible. Traditionally, we used tools like Glowroot, Graylog, and Prometheus, but it was difficult and time-consuming to tie together information from multiple disparate systems to get a complete picture of system performance. The key with New Relic is that it draws information from any source into one place. That way, if you’re trying to find the root cause of a problem, every available signal is right there in front of you. It’s much easier to see, for example, which transaction was executing at the moment the system logged a spike in memory or CPU usage. As a result, you can determine the root cause of the problem much more quickly.

Another best practice we have adopted at Cognitran is using the New Relic API to record markers when we release code updates. With that point in time recorded in the system, we can observe operations before and after the release and trace any changes back to it. Instead of waiting for feedback from end users, whether positive or negative, the ops team can see right away how a deployment is affecting key performance or customer experience metrics and adjust resources or code as needed.
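
As an illustration of recording a release marker, the sketch below builds (but does not send) a request for the deployments endpoint of the New Relic REST API v2. The article does not say which API Cognitran uses, so the endpoint choice is an assumption, and the application ID, key, and revision are placeholders. Check the endpoint and header name against the current New Relic documentation before relying on them.

```python
import json
import urllib.request


def build_deployment_request(app_id, api_key, revision, description=""):
    """Build a POST request that would record a deployment marker.

    Assumes the REST API v2 deployments endpoint; nothing is sent here.
    """
    url = f"https://api.newrelic.com/v2/applications/{app_id}/deployments.json"
    payload = {"deployment": {"revision": revision, "description": description}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values, invented for this example.
req = build_deployment_request(1234567, "EXAMPLE_KEY", "v2.4.1", "weekly release")

# A release pipeline would send it as its final step, for example:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Building the request separately from sending it keeps the marker step easy to test in a pipeline without touching a live account.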

Adding value with business intelligence

We have also taken the observability capabilities within New Relic to another level by attaching additional context to the data the New Relic agent collects for each event. That context might be specific to one of our customers or relevant to a particular market segment. We've used a variety of dashboards to surface all sorts of useful insights. For example, we can determine the most popular day of the week for people to look at our subscription offers or how often that interest in subscriptions results in a purchase.
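
A minimal sketch of attaching business context to an event might look like the following. The attribute names (`customer`, `market`, `contentType`) and values are hypothetical, not Cognitran's actual schema. The recorder is passed in as a function so the example runs without an agent installed; with the New Relic Python agent, `newrelic.agent.add_custom_attribute` could be supplied instead.

```python
def business_attributes(customer, market, content_type):
    """Build a flat dictionary of business context for one request.

    The keys are illustrative assumptions, not a real schema.
    """
    return {
        "customer": customer,
        "market": market,
        "contentType": content_type,
    }


def attach(attributes, add_attribute):
    """Pass each non-empty attribute to `add_attribute(key, value)` and
    return the attributes that were actually attached."""
    attached = {}
    for key, value in attributes.items():
        if value is None or value == "":
            continue  # skip missing context rather than recording blanks
        add_attribute(key, value)
        attached[key] = value
    return attached


# Stand-in recorder: collect attributes in a dict instead of calling an agent.
recorded = {}
attach(business_attributes("example-oem", "UK", "service-manual"),
       recorded.__setitem__)
print(recorded)
```

Once attributes like these are on every event, dashboards can group or filter by them, which is what makes questions such as usage per market answerable.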

One example of how we use business intelligence from New Relic is the annual Harley-Davidson dealer show. The show usually drives a lot of people to our ITIS system, which leads to higher request counts and the need for considerably more resources. With New Relic, we’ve been able to compare that demand year over year so we can anticipate the number of production nodes needed to handle traffic from the next show.
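
The year-over-year comparison can be turned into a rough capacity estimate. The sketch below is one simple way to do that arithmetic, assuming average growth between past peaks carries forward and that each node handles a known request rate. Every number in it is invented for illustration, and the function name and headroom factor are assumptions.

```python
import math


def project_nodes(peaks_by_year, per_node_rps, headroom=0.2):
    """Project next year's peak from average year-over-year growth and
    return (projected_peak, nodes_needed) including safety headroom."""
    years = sorted(peaks_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years of peak data")
    # Growth ratio between each pair of consecutive years.
    growth = [peaks_by_year[b] / peaks_by_year[a]
              for a, b in zip(years, years[1:])]
    avg_growth = sum(growth) / len(growth)
    projected_peak = peaks_by_year[years[-1]] * avg_growth
    nodes = math.ceil(projected_peak * (1 + headroom) / per_node_rps)
    return projected_peak, nodes


# Hypothetical peak requests per second for three past shows (years 1 to 3).
peaks = {1: 400, 2: 500, 3: 600}
projected, nodes = project_nodes(peaks, per_node_rps=100)
print(round(projected), nodes)
```

Here growth was 1.25 and then 1.2, so the average of 1.225 projects a peak of about 735 requests per second; with 20 percent headroom and 100 requests per second per node, that rounds up to 9 nodes.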

Additionally, because we integrated New Relic with the Blaise authoring system, we can take the information on all the new models Harley-Davidson are introducing that year and project the number of additional service manuals or owner manuals they will need. From there, we can gauge how much additional infrastructure we will need to put in place.

We also use New Relic to collect information from a search dashboard we built. We use that intelligence to help customers find documents that are relevant to them. For example, if we observe a lot of customers using the same search term but not getting the results they want, we can feed that information back to the authoring team to change the terminology so it matches what our customers typically use. The goal is to help people get the right information they need more quickly.
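
A minimal sketch of that feedback loop is shown below. It approximates "not getting the results they want" as searches that return zero results, which is an assumption; the event shape, thresholds, and sample terms are also invented for illustration.

```python
from collections import Counter


def weak_search_terms(events, min_searches=3, max_hit_rate=0.25):
    """Return (term, searches, hit_rate) for frequently used terms whose
    share of searches with at least one result is at or below `max_hit_rate`."""
    searches = Counter()
    hits = Counter()
    for event in events:
        term = event["term"].strip().lower()  # treat case/spacing variants as one term
        searches[term] += 1
        if event["results"] > 0:
            hits[term] += 1
    weak = [
        (term, count, hits[term] / count)
        for term, count in searches.items()
        if count >= min_searches and hits[term] / count <= max_hit_rate
    ]
    # Most-searched terms first, then alphabetical for stable output.
    return sorted(weak, key=lambda row: (-row[1], row[0]))


# Hypothetical search events (term typed, number of results returned).
events = [
    {"term": "Brake Pad", "results": 0},
    {"term": "brake pad", "results": 0},
    {"term": "brake pad ", "results": 0},
    {"term": "oil filter", "results": 12},
    {"term": "oil filter", "results": 8},
    {"term": "oil filter", "results": 0},
    {"term": "torque spec", "results": 0},
]
print(weak_search_terms(events))
```

A list like this gives the authoring team concrete terms to consider adopting in the documentation.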

We’ve also used intelligence from the search dashboard to compare how long searches take to return results before and after changes to the infrastructure. To improve query response times, we focus on optimizing Elasticsearch indexes through configuration or cluster changes. For relational database indexes, we no longer need a SQL analysis tool like “top.” On the dashboard, we can watch the slow queries move from one side of the bar chart to the other, closer to zero, and see that we're actually making a positive change.
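
The before-and-after comparison can be sketched as a split of timed samples around the moment of a change. The sample data, the use of the median as the summary statistic, and the function name are assumptions for illustration only.

```python
from statistics import median


def compare_around(samples, change_ts):
    """Compare median duration before and after `change_ts`.

    `samples` is a list of (timestamp, duration_ms) pairs.
    """
    before = [ms for ts, ms in samples if ts < change_ts]
    after = [ms for ts, ms in samples if ts >= change_ts]
    if not before or not after:
        raise ValueError("need samples on both sides of the change")
    b, a = median(before), median(after)
    return {
        "before_ms": b,
        "after_ms": a,
        "change_pct": round((a - b) / b * 100, 1),
    }


# Hypothetical search durations; the change is applied at timestamp 10.
samples = [(1, 800), (2, 900), (3, 1000), (10, 400), (11, 450), (12, 500)]
result = compare_around(samples, change_ts=10)
print(result)
```

A negative `change_pct` means searches got faster after the change, which corresponds to the slow queries moving closer to zero on the chart.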

Leveraging intelligence to inform DevOps decisions

Observability means that at any point in time our team can know precisely what is going on across all systems and markets collectively. For development, that means being able to spot trends that indicate potential future issues that can be addressed by proactively making changes in the software and deploying an update, thus avoiding impact on applications and customer services. 

For example, we can see the growth of different types of published information. If a particular customer is starting to use more video content, that will affect bandwidth far more than adding text-based documentation would. This kind of insight then drives development to solve potential issues associated with that trend.

Customers also benefit because we give them direct access to the same intelligence we have internally, which enables them to adjust their processes to improve efficiency without needing support from Cognitran. One example is related to translation. Customers authoring a document will typically have it translated into as many as 42 languages. However, if they have the visibility to determine that their users only access a handful of translations, they can confidently deprioritize the others.

Customers also appreciate the transparency we provide. Because we share data with them, they can see for themselves in real time that a service is running slowly and why it's happening. That builds trust in the information Cognitran provides because it’s backed by hard data.

Taking observability to a whole new level

At Cognitran, we are taking the end-to-end observability enabled by New Relic to a whole new level. Using business intelligence based on system trends helps us respond more proactively not only to system issues that could impact the customer experience, but also to market trends that affect strategic business decisions. Customers are also using insights from New Relic to make informed decisions on how they approach their entire aftersales business as well as how they roll out information regarding vehicle fixes. For both Cognitran and our customers, New Relic has become a resource for continually improving our products and strengthening our market position in the automotive industry.