When Simply Business started, we looked at best practices in tech and software development to incorporate them into the way we operate. But best practices on paper don’t always translate into effective cultural and organization-wide processes for a scale-up.
As we shifted from an Elasticsearch, Logstash, and Kibana (ELK) stack to using New Relic log management across all software engineering teams, we learned three key lessons.
UK research shows that one in four people rate insurance as the worst industry for allowing customers to do things digitally. At Simply Business, we believe that insurance should be straightforward, easy to organize, and instant. We make it easy for customers, particularly small companies and sole traders in the UK, Europe, US, and Canada, to purchase insurance instantly and digitally, without having to jump on a call or bring in paperwork.
1. Create pre-filtered saved views
The ELK stack used to be seen as a best-in-breed tech stack for observability. But as we started working closely with other teams, like sales and customer service, we found that it didn’t meet all of our needs. An ELK stack is laden with features, but most go unused even by the bulk of software engineers. It is also incredibly complex for non-technical teams to manage.
In the ELK stack, the data schema takes precedence. You can’t have field name collisions. You can’t write a particular attribute because it has a reserved name. This quickly becomes a problem: someone logs a datetime as an ISO 8601 string, then someone else logs it as a UNIX timestamp. In an ELK stack, this means you can’t query those logs anymore. That’s just one challenge in enabling teams to use logging capabilities.
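To make the collision problem concrete, here is a minimal Python sketch that scans a batch of log records and reports any field that shows up with more than one value type. The function name, field names, and sample records are hypothetical and only illustrate the idea; this is not how an ELK stack detects conflicts internally.

```python
from collections import defaultdict


def find_type_collisions(records):
    """Return fields that appear with more than one value type across records."""
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    # Keep only the fields whose values were logged with conflicting types.
    return {field: sorted(types) for field, types in seen.items() if len(types) > 1}


# Hypothetical sample: the same "timestamp" field logged two different ways.
logs = [
    {"message": "quote created", "timestamp": "2023-05-01T12:00:00Z"},  # ISO 8601 string
    {"message": "quote failed", "timestamp": 1682942400},  # UNIX timestamp
]

print(find_type_collisions(logs))  # {'timestamp': ['int', 'str']}
```

A check like this could run in a test suite or a log-shipping step to catch a mismatch before it reaches a schema-first store.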
Most users need a glorified search box to filter logs to answer key questions around data ingestion or query latency. In New Relic log management, the query syntax is intuitive: it looks like Gmail label syntax. Those who aren’t used to SQL can still make use of it. We have our own custom attributes shared across all applications complete with tags and business-specific labels. From a maintenance perspective, we're not using data partitions or anything fancy. We can query the entire sum of all our logs and just use filters, and it's still fast enough for most of what we need.
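As a rough illustration of sharing custom attributes across applications, the sketch below uses only Python's standard `logging` and `json` modules to stamp every log line with the same set of attributes. It does not use any New Relic agent or API, and the attribute names and values (`team`, `environment`, `product`) are hypothetical placeholders rather than the ones we actually use.

```python
import json
import logging

# Hypothetical attributes that every application would attach to its logs.
SHARED_ATTRIBUTES = {"team": "quotes", "environment": "production"}


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object that includes the shared attributes."""

    def __init__(self, shared):
        super().__init__()
        self.shared = dict(shared)

    def format(self, record):
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
            **self.shared,
            # Per-call attributes passed via extra={"attributes": {...}}.
            **getattr(record, "attributes", {}),
        }
        return json.dumps(payload, sort_keys=True)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(SHARED_ATTRIBUTES))
logger = logging.getLogger("quotes")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("quote created", extra={"attributes": {"product": "example"}})
```

Because every line carries the same keys, a simple attribute filter in the search box is enough to narrow results by team, environment, or product.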
There’s just less maintenance with no need to maintain schemas and no need to manage instances—even on a hosted ELK solution.
2. Use tagging to make accessible dashboards
Our developers work across the company in collaboration with business and customer-facing teams. We send dashboards to a lot of people.
We create dashboards for our consultants that show latency and how long the average person has been waiting on a call. These dashboards can be adapted for business teams and sales managers. Then stakeholders can run with it. They don’t need to know a query language and can filter through the data without support. This is key. Our dashboards can be used across the organization to keep everyone on the same page about our customer experience goals.
Logs in context has been super useful for the teams that have adopted it, becoming an integral part of the workflow. We debug things by going from alerts to logs. An alert tells us something is wrong and comes with a time-bound query for our application logs. We generate that query ourselves, because we know what the application is called, what the branch is called, and when the error was thrown, and we pre-populate it so the logs are one click away.
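The pre-populated query can be sketched roughly as follows. This is an assumption-laden illustration, not our production code: the function name, the attribute names `application` and `branch`, the `key:"value"` filter form, and the five-minute window are all placeholders chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def build_log_query(application, branch, error_time, window=timedelta(minutes=5)):
    """Build a time-bound log filter around the moment an error was thrown."""
    # Hypothetical attribute names; the key:"value" form is illustrative only.
    filters = f'application:"{application}" branch:"{branch}"'
    return {
        "query": filters,
        "since": (error_time - window).isoformat(),
        "until": (error_time + window).isoformat(),
    }


# Example alert details (made up for illustration).
alert_time = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
print(build_log_query("quote-service", "main", alert_time))
```

Attaching the resulting filter and time range to the alert means whoever picks it up starts from the relevant logs instead of an empty search box.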
3. Avoid best-in-breed when it breeds complexity
We initially thought that offering best-in-breed observability tools in each category would benefit our software developers uniformly. We quickly realized that engineer productivity suffered and mental strain increased with every new tool we added. Gaining the competency to use one tool effectively is difficult enough. When three or four are added to a stack, it becomes almost impossible. Developers stick to the software they already know, even if there are better tools out there. It’s human nature.
Money was another big concern with so many tools in use. We were on track to spend a million dollars a year on observability tools alone. For the size of our company, using that much of our budget for observability isn’t sustainable, and we weren’t getting the return on investment we needed to justify it.
We’ve cut down our tech stack to primarily use New Relic as our single observability tool. This allows us to have all our telemetry data, including logs, traces, and metrics, in a single place for the first time, and enables us to monitor, debug, and improve our entire stack. Focusing on one tool has saved us money as well as mental energy. All of the data our developers need is accessible directly from New Relic. Developers don’t have to waste time hunting down two-factor authentication codes to log into multiple systems. They also don’t need to be observability experts, because the platform takes care of that. When we asked developers what they liked most about migrating to New Relic, they said it was no longer having to deal with five different interfaces and multiple query languages.