Containerization

What is a container?

Ganga Reddy
5 min read · Oct 21, 2018

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. (Source: Docker)

Note: this article explains the concept behind containers and how containerization works. Read it to form a mental model; the examples are provided only for illustration.

1. What is Containerization?

Containerization is the process of bundling your application code with the packages/libraries required at runtime, so that your application executes quickly and reliably in any supported computing environment. Think of a container as a unit of your executable application together with its runtime, system tools, system libraries, and the required configuration and environment settings.

2. Why Containerization?

Monolithic applications have proved hard to maintain; maintenance and CI/CD of such applications are time- and energy-intensive. Most companies have therefore moved to Service Oriented Architectures (SOA) built from many small, independently deployable services (Amazon heavily adopts SOA), and containers are a natural packaging and deployment unit for such services.

Containerization offers the following benefits:

  • Portability of distributed applications
  • Reproducibility of the application
  • Scaling based on requirements
  • Lifecycle management of containers
  • Memory, CPU, and storage efficiency compared to VM hosting, and hence improved cluster utilization
  • Aids in better cluster management and observability

3. Containers vs VMs

(Image: containers vs. virtual machines — src: http://www.serverpronto.com/spu/wp-content/uploads/2016/05/MJHfm1c.jpg)

The picture above shows why it is much more efficient to run your applications as containers rather than as full VMs: containers share the host kernel instead of each carrying its own guest operating system. That said, the container engine's operating system is often itself run on a virtual machine for extra security, because the main drawback of containerization is its weaker isolation from the host operating system.
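One quick way to see the shared-kernel point for yourself (a sketch, assuming Docker is installed and can pull the alpine image):

```shell
# A container does not boot its own kernel; it reuses the host's.
uname -r                         # kernel version on the host
docker run --rm alpine uname -r  # reports the same kernel version from inside a container
```

A VM running the same command would instead report its guest kernel.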


4. Container Images

Docker, an open source project, generated the most interest in container technology over the past few years: a command line tool that made creating and working with containers easy for developers and administrators.

A container image is an inert, immutable file: essentially a binary, packaged snapshot of a container. Container images are created with the build command, and they produce a container when deployed. In simpler terms, an image is to a container as a program is to a process.

A CI/CD pipeline can be configured so that each new code version produces a corresponding tagged image, which is then stored in an image registry/hub from which authorized users can pull it on demand.

Let’s assume you want to containerize a Python program. Create a file named Dockerfile in the root directory of your source code:

# Start from an Ubuntu base image
FROM ubuntu:15.04
# Copy the source tree into the image
COPY . /app
# Build the application at image-build time
RUN make /app
# Default command executed when the container starts
CMD python /app/app.py

To build the Docker image, run the command below from your root directory. More on this can be found in Docker's best-practices guide.

docker build -t helloapp:version1 .
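Once built, the image can be tagged and pushed to a registry so that authorized users and CI/CD jobs can pull it on demand. A sketch, where `myregistry.example.com` is a placeholder for your registry host:

```shell
# Re-tag the locally built image with the (hypothetical) registry host
docker tag helloapp:version1 myregistry.example.com/helloapp:version1

# Push it; any authorized host can then pull the same image
docker push myregistry.example.com/helloapp:version1
docker pull myregistry.example.com/helloapp:version1
```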

5. Containerization tools

Though Docker generated the most interest in container technology over the past few years, there are other formidable tools for building containers as well. All follow a fairly similar concept of images and containers, with some technical differences. Some notable ones are rkt, LXD, Linux-VServer, and Windows Containers.

An example of running a container could look like this:

docker run -d helloapp:version1

(or a more complicated one)

docker run -d -p 5775:5775/udp -p 16686:16686 jaegertracing/all-in-one:latest

Just like Docker, the alternative tools listed above come with their own commands to build and run images.
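Whatever the tool, the day-to-day lifecycle operations look much the same. With Docker, for instance (a sketch, assuming the `helloapp:version1` image built earlier and a running container whose ID you substitute for `<container-id>`):

```shell
docker ps                      # list running containers and their IDs
docker logs <container-id>     # inspect a container's stdout/stderr
docker stop <container-id>     # SIGTERM, then SIGKILL after a grace period
docker rm <container-id>       # remove the stopped container
docker rmi helloapp:version1   # remove the image itself
```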

6. Container Realization

Containers leverage several technologies built into the Linux kernel to realize isolated groups of processes running on the same host:

  • namespaces, which isolate resources such as pid (process IDs), net (network sockets, ports, etc.), ipc (inter-process communication), and mount (the filesystem view)
  • cgroups, which offer limits, accounting, and isolation of resource usage (CPU, memory, disk I/O, network, etc.) for a collection of processes
  • chroot, used to give a process its own virtualized copy of the filesystem tree
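These building blocks are visible from any Linux shell, because every process already lives inside a set of namespaces and cgroups. A quick inspection sketch (the commented `unshare` line comes from util-linux and may require root or user-namespace support):

```shell
# The namespaces the current shell belongs to, one symlink per type
# (pid, net, ipc, mnt, uts, ...):
ls -l /proc/self/ns

# The cgroup hierarchy accounting for this process's CPU, memory, etc.:
cat /proc/self/cgroup

# Entering a new UTS namespace would let us change the hostname without
# affecting the host:
# sudo unshare --uts sh -c 'hostname container-demo && hostname'
```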

A detailed talk on container realization can be found at https://containersummit.io/events/nyc-2016/videos/building-containers-in-pure-bash-and-c

7. Container Orchestration

Container orchestration, as the name suggests, deals with orchestrating containers, i.e. managing the lifecycles of containers, especially in large, dynamic environments.

Typically, orchestration of containers includes activities such as:

  • Provisioning and deployment
  • Availability of containers
  • Maintaining affinity (to achieve data locality) and anti-affinity (for reliable replication)
  • Scaling up or down depending on load
  • Managing containers in case of node failure, disk failure, etc…
  • Allocation of resources between containers
  • Exposing containers to external networks
  • Service Discovery between containers & DNS resolution
  • Health monitoring of containers and hosts (liveness checks, health checks)
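With Kubernetes, for example, several of these activities map to one-line commands. A sketch, assuming a deployment named `helloapp` already exists in a cluster you have access to:

```shell
kubectl scale deployment helloapp --replicas=5                    # scale up under load
kubectl expose deployment helloapp --port=80 --type=LoadBalancer  # external exposure
kubectl get pods -o wide                 # see where the scheduler placed the containers
kubectl rollout status deployment/helloapp  # watch a rolling deployment complete
```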

Modern container orchestration tools like Kubernetes or Docker Swarm provide an easy way to describe the configuration of your application in a YAML or JSON file, depending on the tool. Kubernetes, backed by Google, has established itself as the de facto standard for container orchestration. It is currently supported by key players such as Google, Amazon Web Services (AWS), Microsoft Azure, IBM, Intel, Cisco, and RedHat.

A typical YAML file for a Kubernetes deployment looks like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kafka-client-producer
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kafka-client-producer
        version: v1
    spec:
      nodeSelector:
        agentNode: publicpool1
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
      volumes:
      - name: kafka-certificates
        secret:
          defaultMode: 256
          secretName: kafka-certificates
      - name: kafka-props
        secret:
          defaultMode: 256
          secretName: kafka-props
      containers:
      - name: kafka-client-producer
        image: confluentinc/cp-kafka:4.1.1-2
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /opt/certs/
          name: kafka-certificates
        - mountPath: /opt/conf/
          name: kafka-props
        resources:
          requests:
            memory: "256Mi"
            cpu: "400m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          exec:
            command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 -q 2 localhost 2181 | grep imok']
          initialDelaySeconds: 15
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command: ['/bin/bash', '-c', 'echo "ruok" | nc -w 2 -q 2 localhost 2181 | grep imok']
        envFrom:
        - configMapRef:
            name: my-configmap
        command: ["/bin/bash", "-c", "--"]
        args: ["while true; do echo 'Generating $NUM_RECORDS Messages' && kafka-run-class org.apache.kafka.tools.ProducerPerformance --topic $TOPIC --num-records $NUM_RECORDS --record-size $RECORD_SIZE --throughput $THROUGHPUT --producer-props bootstrap.servers=$KAFKA --producer.config $PRODUCER_CONFIG && echo $SLEEP $SLEEP_DURATION && sleep $SLEEP_DURATION; done;"]
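Assuming the manifest above is saved as `kafka-client-producer.yaml`, deploying and inspecting it is a few commands (a sketch; requires kubectl configured against a cluster):

```shell
kubectl apply -f kafka-client-producer.yaml    # create or update the Deployment
kubectl get deployment kafka-client-producer   # check desired vs. available replicas
kubectl logs -l app=kafka-client-producer      # read output from the matching pods
```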

The one thing that interested me in Kubernetes was its attempt to abstract everything a developer needs to run a large-scale cluster.

For more on Kubernetes, see why Kubernetes rules the next generation.

If you like my work, buy me a coffee.
