Clustering: Kubernetes in a nutshell

As a system engineer, you have probably already heard of Kubernetes.

If you are familiar with Hadoop, Spark or Docker, you probably also know Mesos, YARN, Titus or Swarm.

You may be less familiar with BOINC. BOINC is not a cluster management platform but a grid computing platform, used notably by SETI@home (Search for Extraterrestrial Intelligence).

BOINC predates Kubernetes: the project started in 2002. It offers many of the features you would look for in a regular cluster:

  • Distributing workload over member nodes
  • Managing failures (including the fact that, in a volunteer project, a node may disconnect before its task completes)
  • Managing heterogeneous computers
  • Gathering and consolidating results
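To make three of these features concrete (distributing work, managing failures, consolidating results), here is a toy Python sketch under simplifying assumptions: work units are handed round-robin to online nodes, a unit lost to a disconnected node is requeued for the next round, and results are gathered into a dictionary. Heterogeneous hardware is not modelled. `Node`, `run_grid` and `square_on` are hypothetical names chosen for illustration; this is not how BOINC's actual scheduler is implemented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    online: bool = True  # a volunteer node may disconnect at any time


def run_grid(work_units, nodes, process, max_rounds=10):
    """Hypothetical dispatcher: spreads units round-robin over online nodes,
    requeues units whose node dropped out, and consolidates the results."""
    pending = deque(work_units)
    results = {}
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        online = [n for n in nodes if n.online]
        if not online:
            break  # nobody left to do the work
        retry = deque()
        for i, unit in enumerate(pending):
            node = online[i % len(online)]
            outcome = process(node, unit)
            if outcome is None:
                retry.append(unit)  # node disconnected: requeue the unit
            else:
                results[unit] = outcome  # consolidate the result
        pending = retry
    return results, list(pending)


def square_on(node, unit):
    # Simulated task: the "flaky" node disconnects during its first task.
    if node.name == "flaky" and node.online:
        node.online = False
        return None
    return unit * unit


nodes = [Node("a"), Node("flaky"), Node("b")]
results, unfinished = run_grid([1, 2, 3, 4], nodes, square_on)
print(results, unfinished)
```

In this run, unit 2 is lost when the flaky node disconnects, then completed by another node in the second round, so every unit ends up in `results` and `unfinished` is empty.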

Cluster management or container orchestration?

Strictly speaking, a cluster keeps the state of its nodes strongly synchronized. Oracle RAC is the archetype: it implements complex memory management processes to guarantee that a transaction submitted to any node is ACID-compliant at the cluster level.

Building on the CAP theorem, as well as on many technical articles (like this interesting one) arguing that the ACID juice is not worth the squeeze, recent database projects have relaxed consistency constraints. In most cases, NoSQL cluster technologies are therefore not clusters in this strict sense.

Consequently, "container (or workload) orchestrator" is probably a more appropriate term, unless you target a specific application where data integrity is a major concern. Orchestrators may help secure the persistence layer, but they cover a broader scope, from proxies to web servers and application servers.

A bunch of orchestrators

Early orchestrator projects were probably initiated by scientists working in High Performance Computing (HPC), such as the SETI team. Then search engine (and, more broadly, web) engineers published their own work. Orchestration is now a very competitive field, with so many solutions that choosing the right one for your needs is difficult.

In a previous post, we introduced OpenStack. Like Kubernetes, OpenStack is an orchestrator. But whereas Kubernetes addresses the need to orchestrate containers, OpenStack is more hardware- and VM-oriented.

In this post, we introduce Kubernetes. Kubernetes is not the only container orchestrator, and we do not even know for sure whether it is the best available technology. But Kubernetes is very popular, and its popularity is growing fast, for three reasons: its creator, Google, is a major technology opinion-maker; by releasing Kubernetes after its competitors, Google likely benefited from their experience; and Kubernetes was not initially tied to another project (such as Hadoop or Spark), which let it establish itself as a general-purpose orchestrator.

A layered approach to Kubernetes

Like any software, Kubernetes has a multi-layer design, from functionality down to system and hardware. We structure our analysis into three distinct layers, from top to bottom:

  1. Functional layer: an orchestrator offers user-oriented services or functionalities. It publishes interfaces (GUI, CLI, API) and offers tools to interact with its services. You can limit yourself to exploring this upper layer of Kubernetes, particularly if you use a cloud-hosted instance of Kubernetes.
  2. Conceptual layer: Kubernetes interfaces operate on conceptual entities, such as tenant, pod, container, inventory and worker. At the conceptual level, you understand the designers' vision: what they mean by orchestration, which use cases they anticipate, and which logical model they adhere to. Once you understand both the functional and the conceptual layers, you truly master Kubernetes operation.
  3. System layer: servers, networks, hypervisors and storage hide behind concepts, functions and services. Concepts are built upon a system (or physical) layer. Understanding the relationship between these two layers is required to deploy Kubernetes.

We will dive into these three layers in the next post.