Demystifying containers — what I learned in my Docker/Kubernetes journey, Part 1
Containers are mysterious. The more you try to understand them, the murkier they get ;-). Of course, no one is required to understand them in great depth if they are not interested enough. But for those who dare to face the apocalypse and want to go through the fire of understanding containers in depth, you are on the right page, so brace yourself for some exciting revelations.
I will be sharing my learnings and demystifying some aspects of containers.
Why is containerization required in the first place?
Years ago, most software applications were big monoliths, running either as a single process or as a small number of processes spread across a handful of servers. These legacy systems are still widespread today. They have slow release cycles and are updated relatively infrequently. At the end of every release cycle, developers package up the whole system and hand it over to the ops team, who then deploys and monitors it. In case of hardware failures, the ops team manually migrates it to the remaining healthy servers. Today, these big monolithic legacy applications are slowly being broken down into smaller, independently running components called microservices. Because microservices are decoupled from each other, they can be developed, deployed, updated, and scaled individually. This enables you to change components quickly and as often as necessary to keep up with today’s rapidly changing business requirements.
Running multiple applications on the same machine raises several problems, which containers address:
1. Dependency Management
- What happens if one application needs an upgraded dependency, but the other does not?
- What happens when you remove an application? Is it really gone?
- Can you remove old dependencies?
- Can you remember all the changes you had to make to install the software you now want to remove?
2. Process Isolation/security
- Containers limit the scope of impact that a program can have on other running programs, the data it can access, and system resources.
3. Shared Resources
- Multiple services on the same VM share CPU and memory.
- There is no way to isolate and control each service's resource needs.
4. Portability & ease of deployment
- Packaging and installing software is cumbersome
Containers are just another Linux process
Linux containers are technologies that allow you to package and isolate applications together with their dependencies and entire runtime environment — all of the files necessary to run. This makes it easy to move the containerized application between environments (dev, test, production, etc.) while retaining full functionality.
No longer do we need to cry about the problem of "it works on my machine, but not on the server". Container technology gives a developer the capability to ship not just the program but its entire set of runtime libraries and its filesystem. So wherever you run it, the environment is going to be the same!
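To make that concrete, here is a minimal, hypothetical Dockerfile sketch (the base image, file names, and entrypoint are illustrative, not from any real project) showing how an application is shipped together with its dependencies and filesystem:

```dockerfile
# Hypothetical example: package an app with its runtime and libraries
FROM python:3.11-slim              # base image supplies the user-space filesystem
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt   # dependencies are baked into the image
COPY . .
CMD ["python", "app.py"]           # same entrypoint in dev, test, and production
```

Everything the process needs at runtime travels inside the image, which is why the dev, test, and production environments stop drifting apart.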
So how is this even possible? Is "container" some new technology that performs this magic?
What I found is that a container, in its running state, is just another Linux process. Docker, for that matter, is not closely tied to the way a container actually runs. Docker is the technology that makes it easy to build and run containers. Docker exploited a deep understanding of Linux operating system concepts like namespaces and cgroups and created tools (low-level runtimes) that assemble a container out of these Linux constructs. Containers as a concept already existed in the Linux operating system; it is just that they were not harnessed the way Docker harnessed them.
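You can see this for yourself on any Linux machine. Every process belongs to a set of namespaces (pid, net, mnt, and so on), exposed as symlinks under /proc/&lt;pid&gt;/ns; a container is simply a process placed into its own namespaces. A small Linux-only sketch:

```python
import os

# List the namespaces the current process belongs to. Run this on the host
# and then inside a container: the namespace inode numbers will differ,
# because the containerized process has been given its own namespaces.
for ns in sorted(os.listdir("/proc/self/ns")):
    print(ns, "->", os.readlink(f"/proc/self/ns/{ns}"))
```

Tools like `unshare(1)` and the low-level runtimes Docker builds on create exactly these namespace memberships (plus cgroup limits) before exec'ing your program.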
Why are containers so portable?
First reason — is the way Linux Operating System is designed.
To understand the portability concept, we need to understand the basics of the Linux operating system. The Linux operating system is logically separated into two layers: the kernel space and the user space.
User Space: User space refers to all of the code in an operating system that lives outside of the kernel. Most Unix-like operating systems (including Linux) come pre-packaged with all kinds of utilities, programming languages, and graphical tools; these are user-space applications, and we often refer to this layer as "userland". For example, bash, awk, grep, and yum are user-space programs (yum being there so that you can install other software).
Kernel Space:
The kernel provides abstractions for security, hardware, and internal data structures. Kernel space exposes APIs (system calls) for interacting with the hardware and for low-level system functions like opening a file.
Detailed view — a typical user-space program makes the following hierarchy of calls into kernel space.
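As an illustration of that layering: even a simple file write from a user-space program ends up as a write(2) system call into the kernel. A minimal Python sketch (the file path is just a scratch location):

```python
import os
import tempfile

# print()/file.write() -> C library -> write(2) syscall -> kernel -> disk.
# os.open/os.write are thin wrappers over the open(2)/write(2) syscalls,
# the boundary at which user space asks kernel space to do the real work.
path = os.path.join(tempfile.gettempdir(), "userspace_demo.txt")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
written = os.write(fd, b"hello from user space\n")  # crosses into kernel space
os.close(fd)
print(f"wrote {written} bytes via the write(2) syscall")
```

The same division is what a container relies on: the image ships its own userland, while every syscall still goes to the shared host kernel.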
Thanks to this clear segregation between user space and kernel space, the Linux operating system evolved as developers built more customized user spaces for their own requirements; you can already see dozens of distros, e.g., Ubuntu, CentOS, and Debian, that are based on the same Linux kernel.
This design enables a container to use a base image from any popular distribution and still run on any container host without major issues.
Second reason — OCI (Open Container Initiative)
The Open Container Initiative is an open governance structure for the express purpose of creating open industry standards around container formats and runtimes.
Container engines, image formats, kernel versions, and registry servers all differed across the technologies working on containers, such as Docker, rkt, and LXC. The OCI developed specifications for image formats and runtimes so that competing container technologies could work in an interoperable fashion.
The standardized format of containers is what makes them portable between registry servers and container hosts. This is another portability aspect which is less talked about.
So basically, you can create an image with Podman, push it to a Docker registry, and then use a container engine like rkt or CRI-O, which uses an OCI runtime implementation to create a container.
Podman -> Image -> Docker registry -> RKT/CRI-O.
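What the OCI image spec standardizes is, at its core, a stack of filesystem layers plus a JSON config. A simplified sketch of such a config follows; the field names follow the OCI image spec, while the entrypoint and the layer digest are placeholders:

```python
import json

# Simplified OCI image config: any OCI-compliant tool (Docker, Podman,
# CRI-O, ...) can interpret this same structure, which is what makes an
# image portable across registry servers and container hosts.
oci_config = {
    "architecture": "amd64",
    "os": "linux",
    "config": {
        "Env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"],
        "Cmd": ["/app/server"],  # hypothetical entrypoint
    },
    "rootfs": {
        "type": "layers",
        "diff_ids": ["sha256:<layer-digest-placeholder>"],
    },
}
print(json.dumps(oci_config, indent=2))
```

Because every engine agrees on this format, the tool that builds the image and the tool that runs it no longer need to be the same.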
What is Compatibility in Containers?
Container images and hosts are designed and engineered to work together. That means containers are portable, but not necessarily "compatible".
Compatibility is based on:
- Hardware architecture (x86 versus ARM) — containers built for ARM can't run on x86, and vice versa, because the machine code is different.
- Operating system (Linux versus Windows) — Linux containers can't run on Windows, and vice versa, because the user space and kernel are completely different.
- Linux distribution (RHEL versus other distro)
- Age of the Linux distro in the container image — for example, very old images may not work on newer hosts, while very new images may not work on older hosts.
This means:
- x86_64 containers must run on x86_64 hosts
- ARM containers must run on ARM hosts
- Microsoft Windows containers must run on Microsoft Windows hosts
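A runtime has to make exactly this kind of check before starting a container. A toy sketch of the comparison (the image metadata dictionary is hypothetical, and the alias table covers only the common names):

```python
import platform

# Hypothetical image metadata, as a registry manifest would report it
image = {"os": "linux", "architecture": "arm64"}

# Map Python's machine names onto the names used in image manifests
arch_aliases = {"x86_64": "amd64", "AMD64": "amd64", "aarch64": "arm64"}
host_arch = arch_aliases.get(platform.machine(), platform.machine())
host_os = platform.system().lower()

if (image["os"], image["architecture"]) == (host_os, host_arch):
    print("compatible: image matches host os/arch")
else:
    print(f"incompatible: image is {image['os']}/{image['architecture']}, "
          f"host is {host_os}/{host_arch} (would need emulation, e.g. qemu)")
```

Portability gets the image to the host; a check like this decides whether it can actually run there.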
Also, as a best practice, always use a standard operating environment that is versioned and tested together, i.e., test the following components together:
- Container hosts
- Container images
- Container engine
- Container orchestration
How is a container spawned?
The image above should give you some idea of the container creation process. Users make Docker API calls to run a Docker image. The Docker daemon hands the request to containerd, the container engine, which manages the work of pulling images and setting up networking and storage, and finally hands control over to runC to run the containers.
What is Containerd?
containerd is a high-level container runtime that came out of the Docker project and implements the CRI spec. It pulls images from registries, manages them, and then hands over to a lower-level runtime, which actually creates and runs the container processes.
containerd was separated out of the Docker project to make Docker more modular, and Docker still uses containerd internally: when you install Docker, it also installs containerd. containerd implements the Kubernetes Container Runtime Interface (CRI) via its cri plugin.
Alright, what is CRI then?
CRI stands for Kubernetes Container Runtime Interface. It is the API that Kubernetes uses to control the different runtimes that create and manage containers.
Why CRI standard ?
While Docker was being used as the container engine, rkt came along as a contender. Kubernetes decided to support rkt as another container engine, but soon realized that this introduced tight coupling in the kubelet to the Docker and rkt APIs, a perfect recipe for a management nightmare. The CRI was eventually born to define a clean, separate set of APIs that Kubernetes uses to manage containers (start, stop, etc.), keeping the Kubernetes side decoupled from any particular container runtime implementation.
containerd and CRI-O are implementations of the CRI interface. containerd was not originally written to support the CRI, but it was later adapted to do so. One can now easily switch container runtime implementations, as there is no tight coupling with Kubernetes anymore.
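The decoupling can be sketched as an interface the kubelet codes against. The class and method names below are illustrative only, not the real CRI gRPC API (which defines RPCs such as RunPodSandbox and CreateContainer):

```python
from abc import ABC, abstractmethod

class ContainerRuntime(ABC):
    """Toy stand-in for the CRI: Kubernetes talks to this interface,
    never to a specific engine's native API."""
    @abstractmethod
    def create(self, image: str) -> str: ...
    @abstractmethod
    def stop(self, container_id: str) -> None: ...

class ContainerdRuntime(ContainerRuntime):
    def create(self, image: str) -> str:
        return f"containerd created container from {image}"
    def stop(self, container_id: str) -> None:
        pass

class CrioRuntime(ContainerRuntime):
    def create(self, image: str) -> str:
        return f"cri-o created container from {image}"
    def stop(self, container_id: str) -> None:
        pass

def kubelet_start(runtime: ContainerRuntime, image: str) -> str:
    # The kubelet neither knows nor cares which runtime sits behind the
    # interface; swapping runtimes requires no kubelet changes.
    return runtime.create(image)

for rt in (ContainerdRuntime(), CrioRuntime()):
    print(kubelet_start(rt, "nginx:latest"))
```

This is the same design idea as the real CRI: because the caller depends only on the interface, each engine can evolve independently, and new runtimes can be plugged in without touching Kubernetes.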
In the next article, we will go deeper into containerd, CRI-O, and runC. I hope that, so far, you have a good idea of the concepts that govern the container ecosystem.
References:
http://crunchtools.com/portability-not-compatibility/
https://www.redhat.com/en/blog/architecting-containers-part-2-why-userspacematters
https://github.com/kelseyhightower/kubernetes-the-hard-way
https://jaxenter.com/nobody-puts-java-container-139373.html
https://www.redhat.com/en/blog/limits-compatibility-and-supportability-containers
http://crunchtools.com/containers-dont-run-on-docker/
https://github.com/opencontainers/runtime-spec/blob/master/README.md
https://github.com/opencontainers/runc
https://www.redhat.com/en/blog/engineering-compatibility-red-hat-universal-base-image