(This post is part one of a two-part series. Part two, Building Megabase, takes a technical look at how we embraced a microservices architecture to deploy databases in containers with tooling we call “Megabase.”)
As an early adopter of containers, New Relic began using Docker during the project’s beta. Now that we’ve been running Docker in production for nearly four years, we have built a deep organizational knowledge of the benefits and limits of containerizing our infrastructure. A large portion of our production services now run on top of a deployment target we’ve named “Container Fabric,” which helps our engineers consistently hit ever-increasing goals for scale and velocity.
Container Fabric abstracts our hardware from our stateless applications (which don’t require a server to retain information about sessions or communication requests) and allows us to greatly simplify the management of each. Having experienced many of the benefits of containers firsthand, we hoped to continue to leverage Docker for our stateful applications as well (apps such as databases that retain and store information). We use a lot of databases, for both internal engineering projects and for storing customer data, and we hoped to deploy databases faster, with more consistency and greater resource efficiency.
Four inherent challenges of containerizing databases
Unfortunately, putting databases in containers raises a number of inherent challenges. Databases exhibit a few key properties that make them difficult to containerize effectively:
- They require high-throughput and low-latency networking capabilities.
- They require, in our use, an ability to handle persistent data storage.
- They require layers of complex configuration.
- They require disk space to store large amounts of data, and are thus less portable.
Challenge 1: Establishing the right networking conditions
Docker containers are an abstraction built around two Linux kernel features: control groups (commonly known as cgroups) and namespaces. These features offer resource constraints for memory and CPU, but unfortunately they don’t provide effective isolation of storage and network resources on their own, and both are critical for databases.
There are currently four native network drivers available for Docker: host, bridge, overlay, and MACVLAN. We considered the pluses and minuses of each one:
- Host networking is the simplest and fastest option, as it places containers in the same Linux network namespace as the host on which they’re running. Host networking has no overhead but provides no isolation from the underlying hardware or other containers.
- Docker creates a default Linux bridge named docker0 on the host for containers to use. In bridge mode, a container’s interface is connected to the bridge by a virtual ethernet interface. Using Linux bridging and network namespaces isolates container networks, but it can add a measurable amount of overhead to CPU-bound workloads.
- Overlay networking uses Linux bridges and kernel Virtual eXtensible LAN (VXLAN) features to abstract the container’s networking from the physical network. Overlay networking is powerful and provides a rich feature set, but implementation details can cause performance problems that stress CPUs.
- The MACVLAN driver provides containers with access to host interfaces without the overhead of network address translation (NAT). Unlike the internal networks used by the bridge and overlay drivers, though, MACVLAN can assign containers an address on the external physical network, which can make portability very tricky.
After considering the trade-offs between isolation, performance, and portability for the database container project, we chose host networking. We use randomized port allocation and careful container placement to avoid putting multiple high-throughput services on the same physical host. (For a more in-depth analysis of the factors that influenced our decision, I recommend this reference on Docker networking options.)
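As a minimal sketch of what randomized port allocation can look like with host networking, the Python below asks the kernel for an unused TCP port by binding to port 0. The function name `pick_free_port` is hypothetical and this is not our production tooling; it only illustrates the idea.

```python
import socket


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port that was unused on `host` at the time of the call.

    Binding to port 0 tells the kernel to choose any free ephemeral port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(f"database container could listen on port {pick_free_port()}")
```

Note that the port is released when the socket closes, so another process could claim it before the database starts. A real deployment system would need to reserve or record the allocation to avoid that race.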
Challenge 2: Handling persistent data storage
The New Relic database team had little debate about whether or not to use Docker volumes for persistent data. According to the Device Mapper storage documentation: “Volumes provide the best and most predictable performance for write-heavy workloads … because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write.”
All of our internal projects for stateful containers use volumes. While this makes sense from a performance standpoint, it can make container portability more challenging (we’ll cover that issue later in this post).
In some projects, we use Logical Volume Manager (LVM) to provide an additional layer of storage isolation. When a database is deployed, we create a new logical volume from a thin pool and assign it to a container. If the container fills up its volume by accident, it will not affect any neighboring containers, as we monitor each volume as a separate resource. We can extend the logical volume holding any container’s persistent data and resize the filesystem without service interruption.
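The LVM workflow described above might be sketched as follows. This Python only builds the `lvcreate` and `lvextend` command lines rather than running them (both require root and an existing thin pool). The volume group `vg_data`, pool `db_pool`, volume name `db01`, and the sizes are all hypothetical values chosen for illustration.

```python
import shlex
from typing import List


def create_thin_volume_cmd(vg: str, pool: str, name: str, virtual_size: str) -> List[str]:
    """Build an lvcreate command for a thin logical volume in an existing thin pool."""
    return [
        "lvcreate",
        "--name", name,
        "--virtualsize", virtual_size,
        "--thinpool", pool,
        vg,
    ]


def extend_volume_cmd(vg: str, name: str, extra: str) -> List[str]:
    """Build an lvextend command that grows the volume and resizes its filesystem."""
    return ["lvextend", "--resizefs", "--size", f"+{extra}", f"{vg}/{name}"]


if __name__ == "__main__":
    # Hypothetical names and sizes; print the commands instead of executing them.
    print(shlex.join(create_thin_volume_cmd("vg_data", "db_pool", "db01", "500G")))
    print(shlex.join(extend_volume_cmd("vg_data", "db01", "100G")))
```

The `--resizefs` flag asks LVM to grow the filesystem together with the logical volume, which is what allows the resize to happen while the database keeps running.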
As with networking, we carefully place high-throughput services in locations where we can avoid “noisy neighbor” problems on hosts with multiple tenants. Ultimately, we have to maintain a careful balance between adequate levels of resource isolation and resource efficiency. To maintain that balance and meet both our cost efficiency goals and our service-level agreements, we closely monitor the network—observability is key. In fact, part two in this series shows how we use New Relic Infrastructure and New Relic Insights to meet these demands.
Challenge 3: Configuring the database
The prevailing wisdom holds that a container image should be an immutable artifact derived from a particular version of code and configuration. However, building a new image for every database configuration change can lead to container image sprawl.
A side effect of databases’ high-performance requirements is that they require layers of complex configuration tuning. At New Relic, we tune storage controllers, Linux kernel and OS parameters, language runtime, and database parameters in order to meet our performance needs. This can make it difficult to abstract the underlying hardware from our database deployments. In practice, this means that our database team must still own the management of its hardware.
Hardware, kernel, and OS tuning generally happens when a host is provisioned. At deploy time, we size container memory limits and CPU shares to guarantee that adequate resources are granted for each workload and to help avoid contention. We also pass in environment variables at deploy time to ensure the service inside the container can resolve any outside dependencies.
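To make the deploy-time settings concrete, here is a hedged sketch that assembles a `docker run` command with a memory limit, CPU shares, environment variables, host networking, and a data volume. The flags are standard Docker CLI options, but the helper name `docker_run_cmd`, the image name, paths, and values are assumptions for illustration and not our actual deployment code.

```python
from typing import Dict, List


def docker_run_cmd(
    image: str,
    name: str,
    memory: str,
    cpu_shares: int,
    host_data_dir: str,
    container_data_dir: str,
    env: Dict[str, str],
) -> List[str]:
    """Build a `docker run` command line for a database container."""
    cmd = [
        "docker", "run", "--detach",
        "--name", name,
        "--network", "host",              # share the host's network namespace
        "--memory", memory,               # hard memory limit, e.g. "16g"
        "--cpu-shares", str(cpu_shares),  # relative CPU weight under contention
        "--volume", f"{host_data_dir}:{container_data_dir}",
    ]
    # Sort keys so the generated command is stable across runs.
    for key in sorted(env):
        cmd += ["--env", f"{key}={env[key]}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # All values below are hypothetical.
    print(" ".join(docker_run_cmd(
        image="example/db:1.0",
        name="db01",
        memory="16g",
        cpu_shares=2048,
        host_data_dir="/data/db01",
        container_data_dir="/var/lib/db",
        env={"DB_PORT": "31337", "ZK_HOSTS": "zk1:2181"},
    )))
```

CPU shares are a relative weight, not a hard cap, so they only take effect when containers on the same host compete for CPU.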
It’s common for databases to have hundreds of tuning parameters. Many of those parameters can be changed dynamically, so it’s crucial that dynamic changes are reflected on disk so they aren’t reverted if the database is redeployed or restarted.
A common solution to this problem is to move database configuration into environment variables, which specify configuration according to the context in which the database will run (a basic example is setting different values for dev, staging, and prod). Environment variables make it easy to change configuration at deploy time and ensure that configuration persists if the container is restarted. Mapping database parameters to environment variables helps, but it can make it difficult to discern the container’s final configuration.
To get around these hurdles, we synchronize database-configuration files from our version control system to an object store and pull them in at deploy time (which we’ll illustrate in part two of this series). We’ve found that this trade-off between immutability and knowability versus container image sprawl is acceptable for us given the remaining gains in our solution.
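The pull-at-deploy-time step could look roughly like the sketch below. It is an illustration under assumptions, not the Megabase implementation: the object store is modeled as a plain mapping from key to bytes, and the key layout (`service/revision/db.conf`) and function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def config_key(service: str, revision: str) -> str:
    """Hypothetical key layout: one config file per service and VCS revision."""
    return f"{service}/{revision}/db.conf"


def pull_config(store: Mapping[str, bytes], service: str, revision: str, dest: Path) -> str:
    """Copy the config for (service, revision) to `dest` and return its SHA-256.

    Raises KeyError if the store has no config for that service and revision.
    """
    data = store[config_key(service, revision)]
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    # Recording the digest makes it easy to confirm later which config was deployed.
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    import tempfile

    # An in-memory dict stands in for a real object store client.
    fake_store = {"orders-db/abc123/db.conf": b"max_connections = 500\n"}
    with tempfile.TemporaryDirectory() as tmp:
        digest = pull_config(fake_store, "orders-db", "abc123", Path(tmp) / "conf" / "db.conf")
        print(f"deployed config with sha256 {digest}")
```

Keying the file by version control revision keeps the deployed configuration traceable to a specific commit without building a new image for every change.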
Challenge 4: Big data isn’t portable
Application portability is a key benefit of containers. If a host experiences problems, it should be relatively quick and easy to redeploy your containerized application to another host. However, the physics involved in migrating large data sets makes portability difficult. We found that for databases with a fault-tolerant, shared-nothing architecture, giving up container-level portability was acceptable since we address data redundancy and high-availability (HA) at the application level.
For traditional relational database management systems (RDBMS), additional services are required to provide guarantees like data redundancy and HA. One solution we considered was to push the maintenance of state to a lower-level “storage fabric” that would handle the replication of data across a redundant set of hosts. Unfortunately, this imposed a performance penalty. Another option was to back up and restore data from an object store, but that could negatively impact mean time to recovery (MTTR) during a failure.
In the long run, our best solution to the big data problem was just to embrace “small data”—otherwise known as microservices.
Microservice architecture calls for narrowly defined services that own access to their state. Breaking monolithic databases into smaller, easier-to-manage components fulfills this philosophy. It shrinks the “blast radius” of database problems and enables site reliability engineers (SREs) to perform operational work with less risk of causing major service disruptions. The downside, of course, is that microservices can require fundamental changes to existing applications. They can also complicate cross-domain data access patterns and increase the load on the network.
Embracing a microservices architecture meant we’d now be managing a large fleet of small databases, and this required us to build a new set of tools. At New Relic we’re in the process of building those tools in a project called Megabase, which I describe in greater technical detail in part two.
Delivering value in functional increments
A critical part of our success in containerizing our databases is that we delivered the system in phases. It wasn’t necessary for us to adopt a full-scale container orchestration system in order to start delivering value. Our primary goal was to deliver a consistent database service, and we wanted it to be fast and have great resource efficiency.
We now deploy all new database instances in Docker containers, regardless of their type and version. This has allowed us to reach a new level of consistency and reproducibility across our database tiers. Having a single deployment process means we can deliver new databases to internal teams faster and more reliably. Even without full resource isolation, our resource efficiency has greatly improved. We’ve also been able to add support for a new backup solution, monitoring, and operating systems across our database tiers by building new images. For more on this, see the second post in this series.
As we move forward, we’re keeping a close eye on upstream open-source efforts, particularly for those dealing with resource management and orchestration. While upstream work to support stateful containers is still nascent, we feel it’s promising. Of particular note is the work to support stateful services in the Kubernetes and DC/OS projects.
Now, check out part two of this series, Building Megabase, where I take a closer look at the microservices architecture of Megabase.