3 strategies for zero downtime database migration

Moving your data is one of the trickiest parts of a cloud migration. During the migration, the location of your data can have a significant impact on the performance of your application. If you don’t migrate the data at the same time you migrate the services using the data, you risk needing to access your data over a distance between your on-premise and your cloud data centers, which can certainly cause latency and throughput issues.

Additionally, during the data transfer, keeping the data intact, in sync, and self-consistent requires either tight coordination between the two environments or, worse, application downtime. The former can be technically challenging for the teams performing the migration, but the latter can be unacceptable to your overall business.

Moving your data and the applications that utilize the data at the same time is necessary to keep your application performance acceptable. Deciding how and when to migrate your data relative to your services, though, is a complex question. Often companies will rely on the expertise of a migration architect, which is a role that can greatly contribute to the success of any cloud migration.


Key areas of database migration design

An effective database migration requires planning and structuring a smooth and successful transition from one database system to another. Here are some key areas to consider in the database migration design:

Assessment and planning

Database inventory: Create an inventory of databases, including their size, complexity, and dependencies.
Requirements analysis: Understand the specific requirements and goals of the migration, such as data format changes, performance improvements, or version upgrades.
Risk analysis: Identify potential risks and challenges associated with the migration and develop mitigation strategies.

Data assessment and cleansing

Data profiling: Analyze and profile the data to identify anomalies, inconsistencies, and potential issues.
Data cleansing: Address data quality issues before migration to ensure accurate and reliable data in the new system.

Schema and code review

Schema mapping: Map the source database schema to the target database schema, considering any required modifications.
Code analysis: Review and update application code to ensure the new database schema and features are compatible.
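
As a rough illustration of schema mapping, the sketch below renames the columns of one source row to their target names and refuses to continue if it meets a column that nobody mapped. The column names and the COLUMN_MAP dictionary are hypothetical, made up for this example; a real mapping would come out of your own schema review.

```python
# Hypothetical mapping from source column names to target column names.
COLUMN_MAP = {
    "cust_id": "customer_id",
    "fname": "first_name",
    "lname": "last_name",
}


def map_row(row, column_map):
    """Rename the keys of one source row to the target schema's column names."""
    unmapped = sorted(set(row) - set(column_map))
    if unmapped:
        # Fail loudly rather than silently dropping data the mapping forgot.
        raise KeyError(f"no target column defined for: {unmapped}")
    return {column_map[name]: value for name, value in row.items()}


source_row = {"cust_id": 42, "fname": "Ada", "lname": "Lovelace"}
target_row = map_row(source_row, COLUMN_MAP)
print(target_row)
```

Raising an error on unmapped columns is a design choice for this sketch: it surfaces gaps in the mapping during testing instead of after the data has already moved.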

Backup and recovery strategy

Backup procedures: Develop a comprehensive backup strategy for the source database to prevent data loss during migration.
Recovery plan: Define a rollback plan and establish procedures for recovering from potential issues that may arise during migration.

Testing and validation

Test environments: Set up test environments that mirror the production environment to perform thorough testing of the migration process.
Data validation: Implement validation processes to ensure data integrity and consistency after migration.
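
One simple way to approach data validation is to compare a row count and a checksum for each table on both sides. The sketch below does this with two in-memory SQLite databases standing in for the source and target systems. The function names, the customers table, and the sample rows are all assumptions for illustration, and hashing every row this way may be too slow for very large tables.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) for a table, reading rows in key order."""
    digest = hashlib.sha256()
    count = 0
    # Table and column names are interpolated into the SQL text,
    # so they must come from trusted configuration, never user input.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def tables_match(source, target, table, key_column):
    """True when both databases hold the same rows in the given table."""
    return (table_fingerprint(source, table, key_column)
            == table_fingerprint(target, table, key_column))


def make_demo_db(rows):
    """Build a throwaway in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()
    return conn


source = make_demo_db([(1, "a@example.com"), (2, "b@example.com")])
good_target = make_demo_db([(1, "a@example.com"), (2, "b@example.com")])
bad_target = make_demo_db([(1, "a@example.com")])  # one row went missing

print(tables_match(source, good_target, "customers", "id"))  # True
print(tables_match(source, bad_target, "customers", "id"))   # False
```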

Performance considerations

Performance testing: Conduct performance testing to identify and address any performance bottlenecks in the new database environment.
Optimization: Optimize queries, indexes, and configurations for improved performance in the target database.

Monitoring and logging

Monitoring tools: Implement monitoring tools to track the progress of the migration and identify any issues in real time.
Logging: Ensure comprehensive logging to capture detailed information about the migration process for troubleshooting and auditing purposes.

Post-migration support

Support team readiness: Ensure the support team is ready to address any post-migration issues promptly.
Documentation: Update documentation to reflect changes in the new database environment.

What can cause downtime during database migration?

Several factors can contribute to downtime during a database migration. Here are some common reasons:

Data volume: The larger the database, the longer it takes to migrate. During this time, services may be disrupted or put into a read-only state to prevent data inconsistencies.

Network issues: If the migration involves transferring data over a network, limited bandwidth can slow down the process and result in downtime. Any interruptions or failures in the network connection can also disrupt the migration process.

Data structure changes: If the migration involves changes to the database schema, such as adding or removing columns, it may require additional time and result in downtime.

Code changes: If the migration requires changes to the application code to accommodate the new database structure, the application may need to be taken offline or put into maintenance mode during the migration.

Data cleansing and transformation: Cleaning and transforming data to fit the new database schema can be time-consuming, leading to increased downtime.

Compatibility issues: Differences in database versions or configurations between the source and target systems can cause compatibility issues, leading to downtime.

Backup and restore processes: Taking a backup of the source database before migration can impact production systems, especially if the backup process is resource-intensive. Restoring the backup on the target system also takes time, during which the system may be unavailable.
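
To get a feel for how data volume and network bandwidth combine, you can do a back-of-the-envelope estimate of the raw copy time. The sketch below is only that: the function name is hypothetical, and the efficiency factor of 0.8 is an assumed allowance for protocol overhead and contention, not a measured value.

```python
def estimated_transfer_hours(dataset_gb, bandwidth_mbps, efficiency=0.8):
    """Estimate hours needed to copy dataset_gb gigabytes over a bandwidth_mbps link.

    efficiency is the assumed fraction of the nominal bandwidth you actually get.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    dataset_megabits = dataset_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    seconds = dataset_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# A 2 TB database over a 1 Gbps link versus a 100 Mbps link.
print(round(estimated_transfer_hours(2000, 1000), 2))  # 5.56
print(round(estimated_transfer_hours(2000, 100), 2))   # 55.56
```

If the estimate is longer than the downtime your business can accept, that is an early signal that a straight offline copy will not be enough on its own.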

How to achieve a zero downtime database migration

Whether you have an on-staff cloud architect or not, there are three primary strategies for migrating application data to the cloud:

Offline copy migration
Master/read replica switch migration
Master/master migration

Whether you’re migrating a SQL database, a NoSQL database, or simply raw data files, each migration method requires a different amount of effort, has a different impact on your application’s availability, and presents a different risk profile for your business. As you’ll see, the three strategies are fairly similar, but the differences are in the details.

Strategy 1: offline copy migration

An offline copy migration is the most straightforward method. Bring down your on-premise application, copy the data from your on-premise database to the new cloud database, then bring your application back online in the cloud.
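
The three steps above can be sketched in a few lines. This example uses two in-memory SQLite databases as stand-ins for the on-premise and cloud databases, and Python’s built-in sqlite3 backup call as the copy step. The Application class, the orders table, and the function name are hypothetical; a real migration would use your database’s own dump and restore tooling.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Application:
    """A toy application that writes orders to whichever database it points at."""
    db: sqlite3.Connection
    online: bool = True

    def add_order(self, item):
        if not self.online:
            raise RuntimeError("application is offline for migration")
        self.db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        self.db.commit()


def offline_copy_migration(app, cloud_db):
    """Take the app offline, copy all data to cloud_db, then bring the app back."""
    app.online = False               # 1. bring down the application
    try:
        app.db.backup(cloud_db)      # 2. copy the full database to the new location
        app.db = cloud_db            # 3. point the application at the cloud database
    finally:
        app.online = True            # back online (on the old database if the copy failed)


on_prem = sqlite3.connect(":memory:")
on_prem.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
app = Application(db=on_prem)
app.add_order("widget")

cloud = sqlite3.connect(":memory:")
offline_copy_migration(app, cloud)
app.add_order("gadget")              # this write lands in the cloud database

cloud_rows = cloud.execute("SELECT item FROM orders ORDER BY id").fetchall()
on_prem_rows = on_prem.execute("SELECT item FROM orders ORDER BY id").fetchall()
print(cloud_rows)    # [('widget',), ('gadget',)]
print(on_prem_rows)  # [('widget',)]
```

Everything between the first and last step is downtime, which is why the size of the dataset matters so much for this strategy.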

An offline copy migration is simple, easy, and safe, but you’ll have to take your application offline to execute it. If your dataset is extremely large, your application may be offline for a significant period of time, which will undoubtedly impact your customers and business.

For most applications, the amount of downtime required for an offline copy migration is generally unacceptable. But if your business can tolerate some downtime, and your dataset is small enough, you should consider this method. It’s the easiest, least expensive, and least risky method of migrating your data to the cloud.

Strategy 2: master/read replica switch migration

The goal of a master/read replica switch migration is to reduce application downtime without significantly complicating the data migration itself.

For this type of migration, you start with the master version of your database running in your on-premise data center. You then set up a read replica copy of your database in the cloud, with one-way synchronization of data from your on-premise master to your read replica. At this point, you still make all data updates and changes to the on-premise master, and the master synchronizes those changes with the cloud-based read replica. Most database systems support this master-replica model.

You’ll continue to perform data writes to the on-premise master, even after your application is migrated and operational in the cloud. At some predetermined point in time, you’ll switch over and swap the master/read replica roles. The cloud database becomes the master and the on-premise database becomes the read replica. You simultaneously move all write access from your on-premise database to your cloud database.

You’ll need a short period of downtime during the switchover, but the downtime is significantly less than what’s required using the offline copy method.
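
The switchover sequence can be modeled with a small in-memory simulation. This is a sketch of the ordering of steps only, not of any particular database’s replication feature: the Database class, the replicate and switchover functions, and the list-based write log are all hypothetical stand-ins.

```python
class Database:
    """A toy database: a role plus an ordered log of the writes it has applied."""

    def __init__(self, name, role):
        self.name = name
        self.role = role   # "master" or "replica"
        self.log = []

    def write(self, record):
        if self.role != "master":
            raise PermissionError(f"{self.name} is a read replica")
        self.log.append(record)


def replicate(master, replica):
    """One-way sync: apply any writes the replica has not seen yet."""
    replica.log.extend(master.log[len(replica.log):])


def switchover(old_master, new_master):
    """Swap roles. No database accepts writes between steps 1 and 3: the downtime."""
    old_master.role = "replica"            # 1. stop accepting writes on-premise
    replicate(old_master, new_master)      # 2. drain the remaining changes
    if new_master.log != old_master.log:
        old_master.role = "master"         # roll back rather than lose data
        raise RuntimeError("replica did not catch up; switchover aborted")
    new_master.role = "master"             # 3. promote the cloud database


on_prem = Database("on-premise", "master")
cloud = Database("cloud", "replica")

on_prem.write("order-1")
replicate(on_prem, cloud)
on_prem.write("order-2")      # not yet replicated when the switchover starts

switchover(on_prem, cloud)
cloud.write("order-3")        # writes now go to the cloud master
replicate(cloud, on_prem)     # replication direction is reversed

print(cloud.log)    # ['order-1', 'order-2', 'order-3']
print(on_prem.log)  # ['order-1', 'order-2', 'order-3']
```

The length of the downtime is governed by step 2, so keeping the replica closely caught up before you begin keeps the write outage short.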

However, downtime is downtime, so you need to assess exactly what your business can handle.

Strategy 3: master/master migration

This is the most complicated of the three data migration strategies and has the greatest potential for risk. However, if you implement it correctly, you can accomplish a data migration without any application downtime whatsoever.

In this method, you create a duplicate of your on-premise database master in the cloud and set up bi-directional synchronization between the two masters, synchronizing all data from on-premise to the cloud, and from the cloud to on-premise. Basically, you’re left with a typical multi-master database configuration.

After you set up both databases, you can read and write data from either the on-premise database or the cloud database, and both will remain in sync. This will allow you to move your applications and services independently, on your own schedule, without needing to worry about your data.

In fact, to better control your migration, you can run instances of your application both on-premise and in the cloud, and move your application’s traffic to the cloud without any downtime. If a problem arises, you can roll back your migration and redirect traffic to the on-premise version of your database while you troubleshoot the issue.
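
One way to move traffic gradually is to route a fixed percentage of users to the cloud deployment and raise that percentage over time. The sketch below hashes a user ID into one of 100 buckets so that each user consistently lands on the same side. The route function and the user ID format are assumptions for illustration; in practice this logic usually lives in a load balancer or DNS layer rather than in application code.

```python
import hashlib


def route(user_id, cloud_percent):
    """Return "cloud" for a stable cloud_percent share of users, else "on-premise"."""
    if not 0 <= cloud_percent <= 100:
        raise ValueError("cloud_percent must be between 0 and 100")
    # Hash the user ID into a bucket from 0 to 99; the same user always gets the same bucket.
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return "cloud" if bucket < cloud_percent else "on-premise"


users = [f"user-{n}" for n in range(1000)]
for percent in (0, 10, 50, 100):
    share = sum(route(u, percent) == "cloud" for u in users) / len(users)
    print(f"target {percent}% -> actual {share:.1%}")
```

Because a user’s bucket never changes, raising the percentage only ever moves users toward the cloud, and setting it back to 0 sends everyone to the on-premise deployment again if you need to roll back.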

At the completion of your migration, simply turn off your on-premise master and use your cloud master as your database.

It’s important to note, however, that this method is not without complexity. Setting up a multi-master database is quite complicated and comes with the risk of skewed data and other unwanted results. For example, what happens if you try to update the same data simultaneously in both masters? Or what if you try to read data from one master before an update to the other master has synchronized the data?

As such, this model only works if your application’s data access patterns and data management strategies can support it. You’ll also need application-specific synchronization and conflict-resolution routines to handle sync-related issues as they arise.
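
To make the conflict problem concrete, here is a minimal sketch of one common resolution policy, last-write-wins, applied when the same record has been updated on both masters. It is only one possible policy and it assumes the two sites have comparable clocks; the Version class, the record keys, and the sample values are hypothetical. Note that last-write-wins silently discards the older update, which may not be acceptable for your data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float   # time of the write, assumed comparable across both sites
    origin: str        # which master accepted the write


def resolve(a, b):
    """Last-write-wins: keep the newer version; break timestamp ties by origin name."""
    return max(a, b, key=lambda v: (v.timestamp, v.origin))


def merge(on_prem, cloud):
    """Merge two key -> Version maps into the state both masters should converge to."""
    merged = dict(on_prem)
    for key, version in cloud.items():
        merged[key] = resolve(merged[key], version) if key in merged else version
    return merged


on_prem_state = {
    "cust-1": Version("alice@old.example", 100.0, "on-premise"),
    "cust-2": Version("bob@example.com", 90.0, "on-premise"),
}
cloud_state = {
    "cust-1": Version("alice@new.example", 105.0, "cloud"),  # conflicting update
}

merged = merge(on_prem_state, cloud_state)
print(merged["cust-1"].value)  # alice@new.example
print(merged["cust-2"].value)  # bob@example.com
```

The tie-break on origin matters more than it looks: without a deterministic rule for equal timestamps, the two masters could each keep their own version and never converge.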

If your application, data, and business can handle this migration method, consider yourself fortunate and use it. Although it is the hardest of the three strategies to set up, it makes for the cleanest migration because it requires no downtime.

Mitigate migration risks

Any data migration comes with some risk, especially the risk of data corruption. Your data is most at risk while the migration is in progress, so swift and determined execution of the migration is critical. Once you start a migration, either complete the process or roll it back completely. Never stop a migration halfway through: half-migrated data isn’t useful to anyone.

Risk of data corruption is especially high when migrating extremely large datasets. Offline data copy and transfer tools such as AWS Snowball can help manage the migration of large quantities of data, but they do nothing to help with your application’s data usage patterns during a migration. Even if you use a transfer device such as Snowball, you’ll still need to use one of the migration strategies described above.

As is true with all migrations, you won’t know whether you’ve encountered a problem unless you can see how your application is performing before, during, and after the migration. You can only maintain application availability and keep your data safe and secure if you understand how your application is responding to each step in the migration process.

As such, monitoring your application during all aspects of the migration with the New Relic platform, including New Relic APM 360 and New Relic Infrastructure, will help keep your application safe and secure, and your data corruption-free. This is critical for all aspects of your migration, not just your data migration.

By Lee Atchison

Lee Atchison is a recognized thought leader in cloud computing and application modernization. With more than three decades of experience in product development, architecting, scaling, and modernization, Lee has worked at Amazon, Amazon Web Services (AWS), New Relic, and other modern application organizations. He is widely quoted in many publications and has been a featured speaker across the globe. Lee’s most recent book is Architecting for Scale (O’Reilly Media). You can check out his books, courses, articles, and speaking sessions at leeatchison.com, and follow him on Twitter and LinkedIn.

