In just a few short years, containers have dramatically changed the way software organizations build, ship, and maintain applications.

Container platforms, led by the seemingly ubiquitous Docker, are now being used to package applications so that they can access a specific set of resources on a physical or virtual host’s operating system. In microservice architectures, applications are further broken up into various discrete services that are each packaged in a separate container. The benefit, especially for organizations that adhere to continuous integration and continuous delivery (CI/CD) practices, is that containers are scalable and ephemeral: instances of applications or services, hosted in containers, come and go as demanded by need.

But scalability is an operational challenge.

If you have ten containers and four applications, it’s not that difficult to manage the deployment and maintenance of your containers. If, on the other hand, you have 1,000 containers and 400 services, management gets much more complicated. When you’re operating at scale, container orchestration—automating the deployment, management, scaling, networking, and availability of your containers—becomes essential.

So, what is container orchestration?

Container orchestration is all about managing the lifecycles of containers, especially in large, dynamic environments. Software teams use container orchestration to control and automate many tasks:

  • Provisioning and deployment of containers
  • Redundancy and availability of containers
  • Scaling up or removing containers to spread application load evenly across host infrastructure
  • Movement of containers from one host to another if there is a shortage of resources in a host, or if a host dies
  • Allocation of resources between containers
  • Exposure of services running in a container to the outside world
  • Load balancing and service discovery between containers
  • Health monitoring of containers and hosts
  • Configuration of an application in relation to the containers running it

How does container orchestration work?

When you use a container orchestration tool, like Kubernetes or Docker Swarm (more on these shortly), you typically describe the configuration of your application in a YAML or JSON file, depending on the orchestration tool. These configuration files (for example, docker-compose.yml) are where you tell the orchestration tool where to gather container images (for example, from Docker Hub), how to establish networking between containers, how to mount storage volumes, and where to store logs for that container. Typically, teams will branch and version control these configuration files so they can deploy the same applications across different development and testing environments before deploying them to production clusters.
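As a rough illustration of what such a configuration file carries, here is a minimal sketch that builds a Compose-style description of a single service in Python and serializes it as JSON. The service name, image tag, port numbers, volume name, and network name are all assumptions made up for this example; they do not come from any real deployment.

```python
import json

# Hypothetical application: one nginx web service. Every name and number
# below is an illustrative assumption, not a recommended setting.
compose_config = {
    "services": {
        "web": {
            "image": "nginx:1.25",               # which image to gather (Docker Hub by default)
            "ports": ["8080:80"],                # host port 8080 mapped to container port 80
            "volumes": ["web-data:/usr/share/nginx/html"],  # mount a named storage volume
            "networks": ["frontend"],            # attach the container to a network
            "logging": {"driver": "json-file"},  # where the container's logs are stored
        }
    },
    "volumes": {"web-data": {}},
    "networks": {"frontend": {}},
}


def render(config: dict) -> str:
    """Serialize the configuration as JSON with stable key ordering."""
    return json.dumps(config, indent=2, sort_keys=True)


print(render(compose_config))
```

Because the description is plain data, it can be committed to version control and reused across development, testing, and production environments, which is the workflow described above.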

Containers are deployed onto hosts, usually in replicated groups. When it’s time to deploy a new container into a cluster, the container orchestration tool schedules the deployment and looks for the most appropriate host to place the container based on predefined constraints (for example, CPU or memory availability). You can even place containers according to labels or metadata, or according to their proximity in relation to other hosts—all kinds of constraints can be used.
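The placement step can be sketched as a filter followed by a scoring rule. The code below is a toy model, not the algorithm of any particular orchestrator: the `Host` and `ContainerRequest` types, the label names, and the "prefer the host with the most free memory" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    free_cpu: float          # CPU cores not yet allocated
    free_mem_mb: int         # memory not yet allocated, in MB
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class ContainerRequest:
    name: str
    cpu: float
    mem_mb: int
    required_labels: Dict[str, str] = field(default_factory=dict)


def fits(host: Host, req: ContainerRequest) -> bool:
    """A host is eligible if it has enough CPU and memory and matches every label."""
    has_resources = host.free_cpu >= req.cpu and host.free_mem_mb >= req.mem_mb
    has_labels = all(host.labels.get(k) == v for k, v in req.required_labels.items())
    return has_resources and has_labels


def schedule(hosts: List[Host], req: ContainerRequest) -> Optional[Host]:
    """Pick the eligible host with the most free memory (then CPU) and reserve resources."""
    candidates = [h for h in hosts if fits(h, req)]
    if not candidates:
        return None  # nothing satisfies the constraints
    best = max(candidates, key=lambda h: (h.free_mem_mb, h.free_cpu))
    best.free_cpu -= req.cpu
    best.free_mem_mb -= req.mem_mb
    return best


hosts = [
    Host("host-a", 2.0, 4096, {"disk": "ssd"}),
    Host("host-b", 8.0, 16384, {"disk": "hdd"}),
    Host("host-c", 4.0, 8192, {"disk": "ssd"}),
]
request = ContainerRequest("web", cpu=1.0, mem_mb=1024, required_labels={"disk": "ssd"})
chosen = schedule(hosts, request)
print(chosen.name if chosen else "unschedulable")
```

Here host-b has the most capacity but is filtered out by the label constraint, so the container lands on host-c. Real schedulers layer many more constraints and scoring rules on top of this basic filter-then-rank shape.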

Once the container is running on the host, the orchestration tool manages its lifecycle according to the specifications you laid out in its configuration file (for example, a docker-compose.yml file or a Kubernetes manifest).
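One common way to think about lifecycle management is as a loop that compares the state you declared with the state that is actually running and then corrects the difference. The sketch below shows a single pass of such a comparison; the service names and counts are invented for the example, and a real orchestrator would repeat this continuously and handle far more than instance counts.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Return the (action, service, count) steps needed to match the desired state."""
    actions = []
    for service in sorted(set(desired) | set(running)):
        want = desired.get(service, 0)
        have = running.get(service, 0)
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions


# Hypothetical cluster state: "web" is short two instances and "legacy"
# is no longer declared anywhere, so it should be stopped.
desired_state = {"web": 3, "worker": 2}
running_state = {"web": 1, "worker": 2, "legacy": 1}

for action in reconcile(desired_state, running_state):
    print(action)
```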

The beauty of container orchestration tools is that you can use them in any environment in which you can run containers. And containers are supported in just about any kind of environment these days, from traditional on-premises servers to public cloud instances running in Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. Additionally, most container orchestration tools are built with Docker containers in mind.

Kubernetes: the gold standard

Originally developed by Google as an offshoot of its Borg project, Kubernetes has established itself as the de facto standard for container orchestration. It’s the flagship project of the Cloud Native Computing Foundation, which is backed by such key players as Google, Amazon Web Services (AWS), Microsoft, IBM, Intel, Cisco, and Red Hat.

Kubernetes continues to gain popularity with DevOps practitioners because it allows them to deliver a self-service Platform-as-a-Service (PaaS) that creates a hardware layer abstraction for development teams. Kubernetes is also extremely portable. It runs on Amazon Web Services (AWS), Microsoft Azure, the Google Cloud Platform (GCP), or in on-premises installations. You can move workloads without having to redesign your applications or completely rethink your infrastructure, which helps you to standardize on a platform and avoid vendor lock-in.

The main architecture components of Kubernetes include:

Cluster. A cluster is a set of nodes with at least one master node and several worker nodes (sometimes referred to as minions) that can be virtual or physical machines.

Kubernetes master. The master manages the scheduling and deployment of application instances across nodes, and the full set of services the master node runs is known as the control plane. The master communicates with nodes through the Kubernetes API server. The scheduler assigns pods (one or more containers) to nodes depending on the resource and policy constraints you’ve defined.

Kubelet. Each Kubernetes node runs an agent process called a kubelet that’s responsible for managing the state of the node: starting, stopping, and maintaining application containers based on instructions from the control plane. A kubelet receives all of its information from the Kubernetes API server.

Pods. The basic scheduling unit, which consists of one or more containers guaranteed to be co-located on the host machine and able to share resources. Each pod is assigned a unique IP address within the cluster, allowing the application to use ports without conflict. You describe the desired state of the containers in a pod through a YAML or JSON object called a PodSpec. These objects are passed to the kubelet through the API server.

Deployments, replicas, and ReplicaSets. A deployment is a YAML object that defines the pods and the number of pod instances, called replicas, that should be running. You define the number of replicas you want to have running in the cluster via a ReplicaSet, which is part of the deployment object. So, for example, if a node running a pod dies, the ReplicaSet will ensure that another pod is scheduled on another available node.
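To show how the pod description and the replica count fit together, the following sketch assembles a Deployment object as a Python dictionary and prints it as JSON. The field names (`apiVersion`, `kind`, `spec.replicas`, `spec.selector`, `spec.template`) follow the Kubernetes `apps/v1` Deployment schema; the application name, image tag, port, and replica count are assumptions invented for the example, and `make_deployment` is a hypothetical helper, not part of any Kubernetes client library.

```python
import json


def make_deployment(name: str, image: str, replicas: int, port: int) -> dict:
    """Build a Deployment manifest whose pod template runs a single container."""
    pod_spec = {
        "containers": [
            {"name": name, "image": image, "ports": [{"containerPort": port}]}
        ]
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,                         # how many pod instances to keep running
            "selector": {"matchLabels": {"app": name}},   # which pods this deployment owns
            "template": {
                "metadata": {"labels": {"app": name}},    # must match the selector above
                "spec": pod_spec,                         # the desired state of each pod
            },
        },
    }


manifest = make_deployment("web", "nginx:1.25", replicas=3, port=80)
print(json.dumps(manifest, indent=2))
```

Assuming you have a cluster to test against, the printed JSON could be saved to a file and submitted with `kubectl apply -f`, since the API server accepts manifests in JSON as well as YAML.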

Docker Swarm: a hardy bit player

Even though Docker has fully embraced Kubernetes as the container orchestration engine of choice, the company still offers Swarm, its own fully integrated container orchestration tool. Slightly less extensible and complex than Kubernetes, it’s a good choice for Docker enthusiasts who want an easier and faster path to container deployments. In fact, Docker bundles both Swarm and Kubernetes in its enterprise edition in hopes of making them complementary tools.

The main architecture components of Swarm include:

Swarm. Like a cluster in Kubernetes, a swarm is a set of nodes with at least one manager node and several worker nodes that can be virtual or physical machines.

Service. A service defines the tasks that manager or worker nodes must perform on the swarm, as specified by a swarm administrator. A service defines which container images the swarm should use and which commands the swarm will run in each container. A service in this context is analogous to a microservice; for example, it’s where you’d define configuration parameters for an nginx web server running in your swarm. You also define parameters for replicas in the service definition.

Manager node. When you deploy an application into a swarm, the manager node provides several functions: it delivers work (in the form of tasks) to worker nodes, and it also manages the state of the swarm to which it belongs. Manager nodes can run the same services worker nodes do, but you can also configure them to run only manager node-related services.

Worker nodes. These nodes run tasks distributed by the manager node in the swarm. Each worker node runs an agent that reports back to the manager node about the state of the tasks assigned to it, so the manager node can keep track of services and tasks running in the swarm.

Task. Tasks are Docker containers that execute the commands you defined in the service. Manager nodes assign tasks to worker nodes, and after this assignment, the task cannot be moved to another worker. If a task in a replicated service fails, the manager will assign a new version of that task to another available node in the swarm.
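The rule that a task is never moved, only replaced, can be modeled with an immutable record. This is a conceptual sketch rather than Swarm’s actual implementation: the `Task` type, the ID counter, and the "first healthy node other than the failed one" choice are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count
from typing import List

_task_ids = count(1)  # hypothetical source of unique task IDs


@dataclass(frozen=True)
class Task:
    """An assignment of one service instance to one node; frozen, so it cannot be moved."""
    task_id: int
    service: str
    node: str


def assign(service: str, node: str) -> Task:
    """Create a brand-new task bound to the given node."""
    return Task(next(_task_ids), service, node)


def replace_failed(failed: Task, healthy_nodes: List[str]) -> Task:
    """Create a new task for the same service on a different healthy node."""
    candidates = [n for n in healthy_nodes if n != failed.node]
    if not candidates:
        raise RuntimeError("no healthy node available")
    return assign(failed.service, candidates[0])


original = assign("web", "worker-1")
replacement = replace_failed(original, ["worker-1", "worker-2"])
print(original)
print(replacement)
```

The failed task keeps its original node forever; recovery produces a separate task with a new identity, which mirrors the "new version of that task" behavior described above.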

Apache Mesos (and Marathon): complex but flexible

Apache Mesos, slightly older than Kubernetes, is an open source software project originally developed at the University of California at Berkeley, but now widely adopted in organizations like Twitter, Uber, and PayPal. Mesos’ lightweight interface lets it scale easily up to 10,000 nodes (or more) and allows frameworks that run on top of it to evolve independently. Its APIs support popular languages like Java, C++, and Python, and it also supports out-of-the-box high availability. Unlike Swarm or Kubernetes, however, Mesos only provides management of the cluster, so a number of frameworks have been built on top of Mesos, including Marathon, a “production-grade” container orchestration platform.

The main architecture components of Mesos include:

Master daemon. Part of the master node that manages agent daemons. With Apache ZooKeeper, you can create a Mesos Master Quorum, consisting of at least three master nodes, for high availability purposes.

Agent daemon. Runs on each agent node and executes tasks sent by the framework (in this case, Marathon).

Framework. Mesos doesn’t run application orchestration workloads; instead, Marathon receives resources from the Mesos master (in the form of offers), and Marathon sends tasks, based on the resource offers, to executors that launch the tasks on agents.

Offer. The Mesos master gathers information about agent nodes’ CPU and memory availability and sends that information to Marathon so Marathon knows what resources are available.

Task. These are basic units of work that Marathon schedules based on resource offers from the Mesos master. Tasks are executed by executors on agent nodes.

Marathon then provides necessary service discovery, load balancing (with HAProxy), cluster resource management, application (i.e., container) deployments, and APIs for managing workloads.
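The offer-based handshake between the Mesos master and a framework such as Marathon can be sketched as a matching problem: the master advertises free resources per agent, and the framework decides which pending tasks to launch against which offer. The code below is a simplified model and does not use the real Mesos or Marathon APIs; the `Offer` and `PendingTask` types, the first-fit rule, and all names and numbers are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class PendingTask:
    name: str
    cpus: float
    mem_mb: int


def match_offers(
    offers: List[Offer], pending: List[PendingTask]
) -> Tuple[Dict[str, str], List[PendingTask]]:
    """First-fit matching: return ({task name: agent}, tasks left waiting)."""
    # Work on copies so the caller's offers are not modified.
    remaining = [Offer(o.agent, o.cpus, o.mem_mb) for o in offers]
    launched: Dict[str, str] = {}
    waiting: List[PendingTask] = []
    for task in pending:
        for offer in remaining:
            if offer.cpus >= task.cpus and offer.mem_mb >= task.mem_mb:
                offer.cpus -= task.cpus        # consume part of the offer
                offer.mem_mb -= task.mem_mb
                launched[task.name] = offer.agent
                break
        else:
            waiting.append(task)               # no offer was large enough
    return launched, waiting


offers = [Offer("agent-1", 2.0, 2048), Offer("agent-2", 4.0, 8192)]
pending = [
    PendingTask("api", 1.0, 1024),
    PendingTask("db", 2.0, 4096),
    PendingTask("batch", 8.0, 1024),
]
launched, waiting = match_offers(offers, pending)
print(launched)
print([t.name for t in waiting])
```

In this run the "batch" task stays pending because no single offer has eight CPUs; in a real cluster the framework would simply wait for a later round of offers.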

Container orchestration platforms: let someone else manage Kubernetes for you

As noted earlier, Kubernetes is currently the clear standard for container orchestration tools. It should come as no surprise, then, that major cloud providers offer plenty of Kubernetes-as-a-Service options:

Amazon Elastic Container Service for Kubernetes (Amazon EKS)

Amazon EKS fully abstracts the management, scaling, and security of your Kubernetes cluster, even across multiple zones, so you can focus strictly on your applications and microservices. EKS integrates with popular open source Kubernetes tooling and plenty of AWS tools, including Route 53, AWS Application Load Balancer, and Auto Scaling. The team that manages Amazon EKS regularly contributes to the Kubernetes project.

Google Cloud Kubernetes Engine

Like Amazon EKS, Kubernetes Engine manages your Kubernetes infrastructure so you don’t have to. Google, as the original developer of Kubernetes, has plenty of experience running Kubernetes-based containers in production. Kubernetes Engine runs on Google’s network, uses routine health checks in high availability configurations, and auto scales to meet whatever demand is placed on your applications.

Azure Kubernetes Service (AKS)

AKS is Azure’s Kubernetes management solution. With AKS, you can secure your clusters with Azure’s Active Directory and deploy apps across Azure’s massive data center offerings. Azure also provides its own container registry and a provisioning portal. And, as Kubernetes enthusiasts likely already know, Brendan Burns, who co-created Kubernetes, is leading the charge behind Azure’s container work.

So how do I choose?

As with most emerging technologies, container orchestration tools have their pros and cons. The platforms that manage Kubernetes for you, from Google, Azure, and AWS, provide a tremendous amount of functionality with very little overhead. Kubernetes, Swarm, and Mesos/Marathon, on the other hand, should be appraised depending on factors such as architecture, HA needs, flexibility, and learning curve.

For instance, Swarm has a fairly simple architecture built directly into the Docker ecosystem, while Kubernetes, and Mesos especially, can be much more extensible; in fact, in a Mesos cluster you can deploy containerized applications right next to apps running in traditional VMs. That said, Swarm may be suitable for smaller deployments with little need to scale. Mesos, by contrast, can scale to tens of thousands of nodes, and Kubernetes is right behind it. The learning curve for Swarm, however, is pretty low, whereas Mesos and Marathon will likely require some level of specialization in your organization.

Finally, in addition to the container orchestration tools discussed here, there is also a wide range of third-party tooling and software associated with Kubernetes and Mesos (for example, Helm for Kubernetes and Mesosphere DC/OS for Mesos).