When I joined the myToys Group as head of technology, the company had recently decided to move from its on-premises infrastructure to Amazon Web Services (AWS). The primary motivation for this cloud migration was scale and speed: for companies like ours that were born in the on-premises world, our own infrastructure is the weakest link for scalability.
In case you haven’t heard of myToys: it’s part of the myToys Group, an Otto Group company, and myToys.de is the top online store for toys and children’s products in Germany. Besides myToys, the group’s family of brands includes limango, mirapodo, and yomonda.
Looking back on our first pilot and the subsequent migration of 80% of our other systems to AWS, I’ve summarized what we learned along the way as “best practices” for companies in situations similar to ours.
Best Practice #1: Choose a meaningful pilot project
Conventional wisdom says to start small and simple when migrating applications to the cloud. With this approach, the team can learn and make mistakes in a low-risk way and gain confidence for migrating larger, more important applications.
However, we already had a team with some AWS experience, and we felt that we needed a highly visible success that demonstrated clear business value as early as possible in the effort. That’s why we chose to start with our highest-value/highest-risk application as our pilot project.
I would not recommend this approach to organizations that don’t have people with at least some AWS experience. In addition to experienced team members, we also have a culture in which people are intrinsically motivated to learn fast and experiment with new technology.
Best Practice #2: Plan a generous buffer for your timeline
I learned early in my career that whatever estimate is presented for the time (and effort) required to do something, it’s best to double it for a more realistic expectation. We thought that because the pilot project was planned as a minimum viable product (MVP), we didn’t need much of a buffer in our go-live target.
We were wrong. This became the biggest lesson learned from our effort: even for an MVP project focused on a system that you already know very well, do not underestimate how long it will take you to migrate it to a new environment. Add a generous buffer to your go-live goal.
Think of the project as building a house. Builders must coordinate many different people and teams who handle specific, time-dependent tasks throughout the project, and there will always be risks and unexpected delays. It’s the same with technology.
Best Practice #3: Make sure you can measure customer impact
We agreed with the business not to move traffic from our on-premises system to the migrated application on AWS unless we could show that the migrated application achieved at least the same conversion rate as the on-premises system over a 14-day test period. Although we didn’t say so to the rest of the business, our intention as a team was to deliver better performance that would drive more conversions.
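The go/no-go rule agreed with the business can be expressed as a small check. The following is a minimal sketch, not our actual tooling. It assumes per-variant session and order counts have already been aggregated for the test window. The names `ArmStats` and `ready_to_shift_traffic` and all numbers in the example are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Aggregated counts for one variant over the test window (hypothetical shape)."""
    sessions: int
    orders: int

    @property
    def conversion_rate(self) -> float:
        # Conversion rate = orders / sessions for this variant.
        if self.sessions == 0:
            raise ValueError("no sessions recorded for this variant")
        return self.orders / self.sessions


def ready_to_shift_traffic(on_prem: ArmStats, aws: ArmStats,
                           test_days: int, required_days: int = 14) -> bool:
    """Go/no-go: the full test window must have elapsed, and the AWS
    variant must convert at least as well as the on-premises baseline."""
    if test_days < required_days:
        return False
    return aws.conversion_rate >= on_prem.conversion_rate


if __name__ == "__main__":
    # Illustrative, made-up numbers: 2.5% baseline vs. 2.6% on AWS.
    baseline = ArmStats(sessions=500_000, orders=12_500)
    candidate = ArmStats(sessions=5_000, orders=130)
    print(ready_to_shift_traffic(baseline, candidate, test_days=14))  # True
```

When only a small share of traffic reaches the new system, its sample is small and the measured rate is noisy. A production version of this check would usually add a statistical significance test, which this sketch deliberately leaves out.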
First we had to solve the problem of disparate observability tools. While we had agreed on the conversion metric, we hadn’t investigated how we would collect and compare metrics between on-premises and AWS. When our pilot project began receiving 1% of the traffic, we realized that we were collecting different information across multiple tools with no way to accurately compare the data.
Because I have previous experience with New Relic, I knew that it would solve our measurement issue. Once the on-premises and cloud applications were instrumented, we could use New Relic to compare the baseline metrics with those of the pilot application.
We started using New Relic One to support A/B testing in preparation for the 14-day test period. New Relic helped us identify both functional and non-functional bugs in areas such as our on-premises recommendation engine, which caused product detail pages to be generated inefficiently and to load slowly.
Using New Relic, we iteratively found issues, fixed them, re-tested, and then further optimized our systems.
After this effort, we exceeded the target of matching the on-premises conversion rate and achieved a slight uplift in conversions on AWS. We did this by improving the customer experience: we reduced time to first byte by 20% and lowered our bounce rate.
Best Practice #4: Instrument everything early in the process
After we realized that we had disparate on-premises and AWS tools and couldn’t compare metrics, we instrumented our top on-premises services using New Relic. This delivered an unexpected benefit: it improved our knowledge and understanding of our on-premises services. Now we had an automatically generated map of all of our services and dependencies that we could use for planning the rest of our migration.
While this solved our initial dilemma of not being able to compare metrics across the two environments, we discovered a new issue.
The systems that we migrated to AWS were communicating over the public network with the systems that were still running on-premises. This added latency and put additional load on our firewall hardware. To fix this issue, we used AWS Direct Connect and deployed better firewall hardware. The whole procurement and setup process took about 1.5 months, which postponed further load testing.
Had we instrumented our on-premises applications and set up Direct Connect earlier in the project plan, we could have run that work in parallel instead of letting it delay load testing.
Best Practice #5: Plan for post-migration product ownership
Getting the team structure right turned out to be our biggest challenge throughout the migration effort. We had to figure out how to move from the concept of creating an MVP to distributing ownership once the application was operational in AWS.
For the pilot migration, we used a dedicated team (also called a task force) made up of members from across all of our other teams. This worked extremely well for our MVP effort, but stopped working well once the system was operational.
What we needed was an ownership model in which the knowledge gained during the migration wouldn’t get dispersed across other teams. This was particularly important because our goal was to migrate more than 80% of our applications to AWS within one year.
To resolve the ownership issue, we followed the approach of “You build it, you run it.” Each team, made up of a product owner and engineers, takes ownership of the systems it migrates. This approach has worked well for us throughout the migration, and I’d recommend it to other organizations because it ultimately improved our operational excellence.
Moving forward with cloud migration
Our experience and success with the pilot MVP gave us the confidence and skills to migrate the rest of our systems to AWS. The result is that today we can scale within minutes based on traffic volume, and we have a strategy for fully automating scalability going forward.
To learn more best practices for migrating to the cloud, see Preparing to Adopt the Cloud: A 10-Step Cloud Migration Checklist.
Photo courtesy of myToys.