Culture-focused companies know that employee engagement and retention are intricately linked with company performance. Building a better workplace, however, requires actionable insight into the employee experience. 

That’s where Culture Amp excels. The company’s platform makes it easy for companies to collect, understand and act on employee feedback to drive performance and competitive advantage. The leading people and culture platform, Culture Amp is a certified B Corporation (one that balances purpose and profit as a force for good) and is relied on by more than 2,500 customers, including Aegon, Airbnb, GoCardless, KIND Snacks, McDonald’s, Mercy Health, Salesforce, and Slack.

A unicorn startup with a valuation of more than AU$1 billion, Culture Amp has set an aggressive growth goal: reach 100 million users or roughly 1% of the world’s population. For the engineering team, achieving this milestone means bringing new features to market faster than ever, without sacrificing security or reliability. 

‘We need to help our teams move quickly and fail fast, turning their ideas into a tangible product in the shortest amount of time to keep us moving towards that goal’, says Matthew Tapper, Lead Site Reliability Engineer at Culture Amp. 

Breaking apart the monolith

Culture Amp was born in the cloud and has relied on Amazon Web Services (AWS) since day one for its underlying infrastructure. The company’s predominant platform within AWS is a fully containerised, managed environment built on Amazon Elastic Container Service (Amazon ECS) and AWS Fargate.

Increasingly, Culture Amp is also taking advantage of AWS Lambda, for its scalability and cost effectiveness, to run the services and applications that are well suited to a serverless function environment.

While the company has always been a proponent of cloud computing and takes advantage of cloud native services and capabilities, it started out with a more traditional approach to its software platform. ‘Like many companies, we started with a monolithic application’, says Tapper. ‘We outgrew it though and have been working to break it apart into microservices.’ 

However, as the number of microservices increased, so did the complexity of the infrastructure and the code needed to deploy new features to Culture Amp’s restricted environments. The site reliability engineering (SRE) team knew it needed to make it easier, and therefore faster, for engineers developing new features across major product lines to get them into production without compromising on security or reliability. ‘We wanted our SRE team to be an enabler, not a bottleneck’, says Tapper.

Streamlining deployment and automating instrumentation

The answer was to develop an automated platform the SRE team named Silk, which provides everything needed to ship code securely, including CI/CD, roles and permissions, an AWS Cloud Development Kit (CDK) construct library, observability, and a CLI tool. As part of the Silk platform, Culture Amp uses the AWS CDK, a framework for defining cloud resources in code and provisioning them via AWS CloudFormation, to define its infrastructure and encapsulate best practices as a construct. 

Together, the applications that make up the Silk platform give the SRE team a governance gateway that reduces friction without sacrificing control and security. ‘The new Silk platform automates standing up new services in the AWS environment’, says Tapper. ‘This reduced the time to get new features into production from weeks to hours.’ 

A long-time customer of New Relic, Culture Amp decided to automate not only the build, test and deployment of new services, but also their instrumentation. ‘With our new CI/CD environment, everything that is deployed is instrumented with New Relic, which gives us a good view of how our distributed platform is performing’, says Tapper.

‘Using the New Relic API, we set up a standard set of alerts and dashboards for the golden signals of latency, saturation, error rates and throughput’, says Tapper. ‘This lowers the barriers to get started, and then teams can customise dashboards and alerts as needed.’
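The idea of shipping every service with the same starting set of golden-signal alerts, which teams can then adjust, can be illustrated with a minimal Python sketch. It does not call the New Relic API and does not reflect Culture Amp's actual Silk code; the `AlertRule` type, the `default_rules` and `customise` helpers, and every threshold value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AlertRule:
    """One alert definition for a service (hypothetical structure)."""
    service: str
    signal: str       # one of the four golden signals
    threshold: float  # value at which the alert would fire
    unit: str


# Hypothetical starting thresholds; real values depend on the service.
DEFAULT_THRESHOLDS = {
    "latency": (500.0, "ms"),
    "saturation": (80.0, "percent"),
    "error_rate": (1.0, "percent"),
    "throughput": (10.0, "requests_per_minute"),
}


def default_rules(service):
    """Return the standard set of golden-signal rules for a new service."""
    return [
        AlertRule(service, signal, threshold, unit)
        for signal, (threshold, unit) in DEFAULT_THRESHOLDS.items()
    ]


def customise(rules, signal, threshold):
    """Return a copy of the rules with one signal's threshold overridden."""
    return [
        replace(rule, threshold=threshold) if rule.signal == signal else rule
        for rule in rules
    ]


rules = default_rules("surveys")                    # every service starts here
tuned = customise(rules, "latency", 250.0)          # a team tightens latency
```

Keeping the defaults as plain data means a new service needs no alerting decisions on day one, while an override remains a single call.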

Expanding observability

With the bottlenecks to moving code to production removed, the SRE team is now focusing on improving observability. ‘Managing our growing microservices environment requires observability’, says Tapper. ‘New Relic gives us a single pane of glass where we can understand what's going on in the underlying infrastructure, our individual applications, and all our microservices. Not only can we make sure we're meeting our service level objectives, but when things go wrong, we can resolve the issue faster.’

He cites a recent example: an alert from New Relic helped the SRE team narrowly avoid a major incident when a main data store nearly ran out of memory. ‘New Relic is key in telling us when things go wrong’, says Tapper. ‘It lets us move fast without the wheels coming off.’

Closing the feedback loop

As the SRE team continues to evolve its role as an enabler for the development team, it’s putting best practices into place for observability of its distributed systems. The team uses New Relic to measure and track service level indicators (SLIs), service level objectives (SLOs), and service level agreements (SLAs) for its applications and services.

‘Tracking our golden signals with New Relic creates a great feedback loop to balance feature velocity with our reliability objectives’, says Tapper. 
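One common way to turn an SLO into a feedback loop of this kind is an error budget: the number of failures the objective tolerates over a period, against which actual failures are counted. The Python sketch below shows the arithmetic only. It is an assumption about how such a calculation might look, not a description of Culture Amp's tooling, and the function name and sample figures are hypothetical.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the fraction of the error budget still unspent.

    1.0 means no budget has been used, 0.0 means it is exhausted,
    and a negative value means the SLO has been missed.
    """
    if total_requests == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


# Hypothetical figures: a 99.9% availability objective over one million
# requests tolerates about 1,000 failures; 250 failures leaves roughly
# three quarters of the budget.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
```

A team with plenty of budget left can keep shipping features, while a team that has spent most of it has a signal to prioritise reliability work.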

In a continuous deployment environment, New Relic also gives Culture Amp the confidence to deploy frequently. ‘We rely on New Relic to monitor performance before and after deployment, understand the impact of a deployment on the customer experience, and make the decision to roll back changes before they negatively impact customers’, says Tapper. All of this adds up to a more stable platform without slowing time to market for new features.
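Comparing performance before and after a deployment to decide on a rollback can be reduced to a simple check, sketched here in Python. The `Snapshot` type, the `should_roll_back` function, and the tolerance values are all hypothetical; in practice the measurements would come from a monitoring platform, and the article does not say how Culture Amp makes this decision.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Service health over a fixed window (hypothetical structure)."""
    error_rate: float      # fraction of failed requests, e.g. 0.01 = 1%
    p95_latency_ms: float  # 95th percentile response time


def should_roll_back(before, after,
                     max_error_increase=0.01,
                     max_latency_ratio=1.5):
    """Flag a deployment whose metrics regressed beyond the tolerances."""
    error_regressed = after.error_rate - before.error_rate > max_error_increase
    latency_regressed = (
        after.p95_latency_ms > before.p95_latency_ms * max_latency_ratio
    )
    return error_regressed or latency_regressed


baseline = Snapshot(error_rate=0.002, p95_latency_ms=180.0)
healthy = Snapshot(error_rate=0.003, p95_latency_ms=190.0)
degraded = Snapshot(error_rate=0.05, p95_latency_ms=200.0)

keep_healthy = not should_roll_back(baseline, healthy)   # small drift is tolerated
revert_degraded = should_roll_back(baseline, degraded)   # error rate jumped
```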