
Live Migration of
Docker Containers using Checkpoint-Restore


Florida International University
[email protected]






Abstract

Lightweight virtualization techniques make virtual machines more portable, more efficient, and easier to manage. Docker is an open-source lightweight virtual container engine that enables developers to package their applications and dependencies into a portable container and then publish it to any popular Linux machine for virtualization. Containers execute in user space on top of the OS kernel. Docker restricts containers to running only one process at a time. Although the Docker container is flexible, lightweight, and easy to use, the Docker engine (the runtime system) lacks a function that is very common for conventional virtual machine managers (the entities that run regular virtual machines): live migration of the container. With live migration, we can move Docker containers without shutting down any process and without any other user or software accessing the container noticing the migration.

In this paper, we provide a live migration mechanism for containers running cluster computing frameworks for large-scale data analytics in Docker, using a checkpoint-and-restore strategy that stores checkpoints at specific intervals and uses these pre-saved checkpoints to resume containers to their previous state, allowing their live migration into other containers. We run three images, BusyBox, Hadoop MapReduce, and Apache Spark, and migrate their jobs from a source container to a destination container. Evaluation reveals that after migration, CPU and memory usage drop at the source container and rise at the destination container.


Keywords: Live migration, CRIU, Docker container, Checkpointing


1 INTRODUCTION

In industrial cloud platforms, Docker [1] has gained increasing popularity as a container engine in recent years. Based on OS-level virtualization, Docker serves as a composing engine for Linux containers, where an application runs in an isolated environment. Live migration of Docker containers has been a topic of interest for many reasons, such as:

Maintenance without downtime: to automate live migration from one container to another during maintenance and hardware replacement.

Load balancing: by implementing triggers or scheduling algorithms, we can automate container migration to rebalance the load on Docker containers.

High availability: to achieve high availability in data centers and cloud platforms using live migration.

Thus, in this paper we propose a solution for Docker live migration using checkpoint and restore. CRIU [2] is a software tool that enables checkpoint/restore of processes on Linux. Using this tool, we can save the state of a running application so that it can later resume execution from the time of the checkpoint.
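For a single process, the CRIU checkpoint/restore cycle can be sketched as below. This is a dry-run sketch that only prints the command sequence; the PID, the image directory, and the `run` wrapper are illustrative, not part of the original experiment (`--shell-job` is needed for processes started from a terminal):

```shell
# Dry-run sketch of a CRIU checkpoint/restore cycle.
# run() only prints each command; replace the echo with "$@" to execute for real.
run() { echo "+ $*"; }

PID=1234        # hypothetical PID of the process to checkpoint
IMG=/tmp/ckpt   # directory that will hold the CRIU image files

run mkdir -p "$IMG"
# Freeze the process tree rooted at PID and dump its state into IMG
run criu dump -t "$PID" -D "$IMG" --shell-job
# ...later, possibly on another host after copying IMG over...
run criu restore -D "$IMG" --shell-job
```

Restoring from the same image directory resumes the process exactly where the dump froze it.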

Figure 1: Docker container live migration using CRIU, with the Big Data systems MapReduce and Spark running inside the container.

Figure 1 shows a use-case scenario of Docker container live migration using CRIU. Using the checkpoint/restore approach, we can achieve a high-availability solution that allows us to checkpoint the state of a running container and restore it later on the same or a different host.

By implementing this checkpoint/restore feature for Docker containers, Big Data systems can be deployed more conveniently and with higher availability on container-based cloud infrastructure. Moreover, we run the Big Data systems MapReduce and Spark with a WordCount workload on real-world datasets.

MapReduce [7] usually divides the input data set into separate blocks that are processed in parallel by map tasks. Job inputs and outputs are stored in a file system. The framework is responsible for scheduling, monitoring, and re-executing failed tasks. Apache Spark [8] is an open-source cluster computing framework, which provides an interface for fault tolerance and data parallelism.

The contributions of this paper are mainly: 1) live migration of Docker containers using checkpoint and restore; 2) a comparison of the performance losses due to live migration for different applications (Spark Streaming, MapReduce, Storm).


2 RELATED WORK

Live migration of containers has attracted attention in recent years. We first discuss different migration strategies for containers.

Live migration using the pre-copy approach [3] is a very common solution for virtual machines and the default approach for Xen-based systems. This approach continuously copies all memory pages from source to destination and repeatedly re-transfers pages dirtied in the meantime. Its limitation is that the iterative copying may never converge when pages are dirtied faster than they can be transferred.

In the post-copy approach [4], pages are transferred only when they are required by the destination. Although post-copy improves on pre-copy, it has several limitations, such as unreliability and high downtime.

The checkpoint-and-restore approach [5] to migration provides many benefits, including fault recovery by rolling applications back to a previous checkpoint, better response time by restarting applications from checkpoints instead of from scratch, and better system utilization by suspending jobs on demand. CRIU [2] provides live-migration support for container technologies such as Docker, OpenVZ, and LXC. SpotOn [6] also achieved the lowest expected cost for batch jobs by implementing fault tolerance using checkpoint and restore.

Since there is no official migration tool available for Docker, we use the external tool CRIU in our work. However, the authors of [7] mention some drawbacks of using CRIU for Docker live migration: 1) corruption of the layered file system inside the container after restoration on the destination; 2) reduced efficiency and robustness of the migration. In our work, we evaluate this claim and also try to improve on it.


3 DESIGN

In this section, we explain the live migration of Docker containers based on checkpoint-and-restore technology.

3.1 Proposed Approach

In our project, two containers are used for live migration: a source container and a destination container, to which we transfer the applications running in the source container.

The live migration process of Docker containers can be divided into three main stages:

Stage 1: Freeze the source container by checkpointing. To perform the migration, we first build Docker containers on both the source node and the destination node. Then we run the MapReduce and Spark workloads inside the source Docker container. After that, we freeze/lock the source container by blocking its memory, processes, file system, and network connections, and capture the state of the container by checkpointing it. After the checkpoint, the current MapReduce/Spark process is no longer running. These checkpoints are stored and managed by Docker unless a custom storage path is specified.

Stage 2: Copy the image file. In this stage, the source container's image file is copied to the destination node. A base image is required to start a Docker container, but changes made after the container starts are not recorded in the base image; instead, these modifications are stored in additional layers above the base image layer. So, in this stage, we copy the source image file to the destination Docker container.

Stage 3: Restore into the destination container. In this stage, we restore the destination container and resume all the frozen processes from their last state.
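With Docker's experimental mode enabled on both hosts, the three stages above map onto the commands sketched below. This is a dry-run that only prints the command sequence; the container name, checkpoint name, checkpoint directory, and destination host are all illustrative:

```shell
# Dry-run of the three migration stages; run() only echoes each command.
run() { echo "+ $*"; }

CONTAINER=myapp   # hypothetical container running the MapReduce/Spark job
CKPT=cp1          # checkpoint name
DIR=/tmp/ckpt     # custom checkpoint storage path
DST=dest-host     # hypothetical destination node

# Stage 1: freeze the source container and capture its state
run docker checkpoint create --checkpoint-dir="$DIR" "$CONTAINER" "$CKPT"
# Stage 2: copy the checkpoint (and image) data to the destination node
run scp -r "$DIR" "$DST:$DIR"
# Stage 3: on the destination, start a container created from the same
# image and resume it from the transferred checkpoint
run docker start --checkpoint-dir="$DIR" --checkpoint="$CKPT" "$CONTAINER"
```

Replacing the echo in `run` with `"$@"` would execute the sequence for real on hosts where Docker's experimental mode and CRIU are set up.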


Figure 2: Sequence diagram for live migration in CRIU

3.2 Live Migration

We run two Docker containers from the same Docker image, which is an important step for live migration, since we need the same environment on the destination container. Cluster computing frameworks for large-scale data analytics, such as MapReduce and Spark, run inside the Docker containers.

Our live migration experiment uses three different images, in three different containers, to perform the checkpoint and restore:

BusyBox. We use this image to print numbers from 0 to infinity in the source Docker container. When it starts printing, we stop the container by checkpointing it and then create an identical Docker container on the destination node, so the printing stops at some position (e.g., at 9). On the destination node, we restore this process, and it resumes printing from the previous position (e.g., from 10).
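The counter workload itself is just a shell loop; below is a bounded version so the sketch terminates (in the experiment, the loop is unbounded, sleeps between prints, and runs inside the BusyBox container, e.g. via `docker run -d busybox /bin/sh -c '...'`):

```shell
# Bounded version of the BusyBox counter; count_to N prints 0..N-1.
# The real workload loops forever inside the container, so the number
# printed after restore shows exactly where the checkpoint froze it.
count_to() {
  i=0
  while [ "$i" -lt "$1" ]; do
    echo "$i"
    i=$((i + 1))
  done
}

count_to 5
```

Because each line carries the loop counter, comparing the last number printed before the checkpoint with the first number printed after restore verifies that execution resumed from the saved state.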

MapReduce. We use this image to run WordCount on a large dataset; just like the first experiment, we checkpoint and restore this image between the source and destination Docker containers.

Spark. We use this image to run WordCount on a large dataset; just like the MapReduce experiment, we checkpoint and restore this image between the source and destination Docker containers.


4 BACKGROUND

4.1 Hadoop MapReduce

Hadoop MapReduce is a programming model for parallel computing over large-scale datasets. MapReduce usually divides the input data set into separate blocks that are processed in parallel by map tasks. The outputs of the various maps are then fed into the reduce tasks. Job inputs and outputs are stored in a file system. The framework is responsible for scheduling, monitoring, and re-executing failed tasks.

As the name suggests, the MapReduce algorithm contains two important tasks: Map and Reduce. Map takes a set of data and transforms it into another dataset, in which individual elements are broken into tuples (key/value pairs). Reduce takes the output of Map as its input and combines those data tuples into a smaller set of tuples [9]. In MapReduce, data is distributed across the cluster and processed in parallel.
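The WordCount job used in our experiments can be mimicked with a shell pipeline, where `tr` plays the role of Map (emitting one word per line), `sort` groups identical keys like the shuffle phase, and `uniq -c` plays Reduce (counting each group); the sample input below is illustrative:

```shell
# WordCount sketched as a map/reduce-style pipeline over a tiny sample input.
echo "red green red blue red green" |
  tr ' ' '\n' |   # Map: emit one (word) record per line
  sort |          # Shuffle: bring identical keys together
  uniq -c |       # Reduce: count each group of identical keys
  sort -rn        # order the (count, word) pairs by descending count
```

The same key-grouping idea scales out in Hadoop because each stage of the pipeline can run on many machines over different blocks of the input.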

4.2 Apache Spark

Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing. Spark retains the benefits of Hadoop MapReduce; unlike MapReduce, however, intermediate job output can be stored in memory, eliminating the need to read from and write to HDFS, so Spark is better suited to data mining and machine-learning algorithms that require iteration. Spark takes MapReduce to a higher level with a less expensive shuffle during data processing. With in-memory data storage and near real-time processing, Spark can perform many times faster than other big data processing technologies.

Spark can speed up applications running in a Hadoop cluster by up to 100 times in memory, and even by 10 times on disk [10]. Spark allows developers to write programs quickly in Java, Scala, or Python. It comes with a collection of more than 80 high-level operators and also allows users to query data interactively in the shell. In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning, and graph data processing [11]. Developers can use each capability on its own in a single data-pipeline use case, or combine these capabilities.

4.3 Hadoop MapReduce vs. Apache Spark

MapReduce is a good solution for one-pass computations, but not very efficient for use cases that require multi-pass computations and algorithms. Each step in the data processing flow requires a Map stage and a Reduce stage, and to take advantage of this solution, every use case needs to be converted into the MapReduce pattern.

If users want to do more complex work, they must chain a series of MapReduce jobs and execute them sequentially. Each job has high latency, and the next job cannot start until the previous job has fully completed.

Spark performs in-memory processing of data. This is faster because no time is spent moving data and processes in and out of the disk, whereas MapReduce requires a lot of time for these input/output operations, thereby increasing latency.

Spark has better performance when doing graph processing; it uses a combination of Netty and Akka for distributing messages among the executors. Spark contains a graph computation library called GraphX, which makes graph processing faster and lower latency [12]. GraphX is the (alpha) Spark API for graphs and graph-parallel computation. It extends the Spark RDD by introducing the Resilient Distributed Property Graph, a directed multigraph with properties attached to vertices and edges. With traditional MapReduce, graph processing is very inefficient, since it involves reading and writing data to disk, which entails heavy I/O operations and data replication across the cluster for fault tolerance.

We should think of Spark as a replacement for Hadoop MapReduce rather than for Hadoop itself. The intention is not to replace Hadoop, but to provide a comprehensive and unified solution for managing different big data use cases and requirements.


                         Hadoop MapReduce                   Apache Spark

Data processing          Moves data/processes in and        In-memory processing
                         out of the disk

Running time                                                100x faster

Processing model         Batch processing                   Real-time processing

Iterative machine        Slow, heavy
learning

Graph processing         High latency                       In-built graph support

Table 1: Comparison Between Hadoop MapReduce and Apache Spark


4.4 Scala

Scala is a multi-paradigm programming language like Java; it is designed to be a scalable language that integrates features of object-oriented and functional programming. Every value in Scala is an object, including the basic data types (Boolean values, numbers, etc.), and even functions are objects. In addition, classes can be subclassed, and Scala also provides mixin-based composition [13]. All in all, Scala is a functional, object-oriented language that runs on the JVM and incorporates many distinctive features. As developers grow more interested in Scala and more and more tools begin to support it, the Scala language is undoubtedly an essential programming tool.


5 EVALUATION

5.1 Experimental Setup

Operating System         Ubuntu 16.04

                         5 MB

Table 2: Experiment Hardware Details


We set up two VMs for Docker live migration.

                  VM 1                VM 2

OS                Ubuntu 16.04        Ubuntu 16.04

                  512.00 GB           512.00 GB

Table 3: Configuration of Machines for Live Migration


VM 1 is the source machine where container 1 is running, and VM 2 is the destination machine to which we want to perform live migration. On both source and destination we set up Docker 17.03.0-ce with the experimental flag turned on, which is a mandatory step since it enables live migration (checkpoint/restore). It can be enabled by editing the Docker daemon configuration file as below:

# /etc/docker/daemon.json
{
  "experimental": true
}

Then we restart the Docker service for the changes to take effect. After that, we can check the Docker details:


version Client:
Version: 17.03.0-ce
API version:  1.26
Go version:   go1.7.5
Git commit:   3a232c8
Built:       Tue Feb 28
08:01:32 2017
OS/Arch:  linux/amd64

Version:                  17.03.0-ce
API version:  1.26 (minimum version 1.12)
Go version:   go1.7.5
Git commit:   3a232c8
Built:       Tue Feb 28
08:01:32 2017
OS/Arch:                 linux/amd64
Experimental: true

CRIU (checkpoint and restore tool)

One of the important tools for checkpoint and restore is CRIU, which is built from source obtained at https://criu.org/. Several dependencies need to be installed before building CRIU from source. Once the dependencies are installed, we check that CRIU is installed properly with the following command, which should give the output shown:

# criu check

Output: Looks good


5.2 Experiment Design and Result

We established passwordless SSH between VM1 and VM2 for live migration and IP redirection.

First, we pull the Hadoop image from Docker Hub. Then we run a small MapReduce example in a Docker container. After that, we checkpoint the container state to save all images. Next, we transfer the checkpoint data to the destination container, using secure copy (scp) to automate the transfer. A container already exists on the destination VM 2 in a restore-ready state, and we use the restore command with the stored checkpoint state of the source container to start the destination container.

Figure 3: CPU and Memory utilization for Busybox image of source virtual machine

Figure 4: CPU and Memory utilization for Busybox image of destination virtual machine

Figure 5: CPU and Memory utilization for Spark
image of source virtual machine


Figure 6: CPU and Memory utilization for Spark
image of destination virtual machine


Figure 7: CPU and Memory utilization for
MapReduce image of source virtual machine



Figure 8: CPU and Memory utilization for
MapReduce image of destination virtual machine


6 CONCLUSION

Containers based on lightweight virtualization, such as Docker, LXC, and OpenVZ, are more portable, more efficient, and easier to manage than virtual machines. In this paper, we proposed a solution for Docker live migration using checkpoint and restore with the CRIU tool. With CRIU, we can freeze a running application and checkpoint its state as a collection of files. Using those stored files, we can restore the application and run it from the frozen point. We studied several existing checkpoint-and-restore projects and processes to improve migration performance. In our project, we migrate three different processes: a simple counter program, Hadoop MapReduce, and Apache Spark. From this experiment, we learned that checkpoint and restore using CRIU is not fully supported by Docker, so the performance of CRIU-based live migration in Docker can be improved by changing and improving some code.



REFERENCES

[1] Anon. Docker. Retrieved December 12, 2017 from https://www.docker.com/

[2] Anon. Docker. Retrieved December 12, 2017 from https://criu.org/Docker

[3] C. Clark, K. Fraser, S. Hand, J.G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. USENIX Symposium on Networked Systems Design & Implementation, 2 (2005), 273-286.

[4] M.R. Hines, U. Deshpande, and K. Gopalan. Post-copy live migration of virtual machines. ACM SIGOPS Operating Systems Review, 43 (2009), 14-26.

[5] Yang Chen. 2015. Checkpoint and Restore of Micro-service in Docker Containers. Proceedings of the 3rd International Conference on Mechatronics and Industrial Informatics (2015).

[6] Supreeth Subramanya, Tian Guo, Prateek Sharma, David Irwin, and Prashant Shenoy. 2015. SpotOn. Proceedings of the Sixth ACM Symposium on Cloud Computing (SoCC '15) (2015).

[7] Anon. Retrieved December 12, 2017 from

[8] Anon. Apache Spark™ – Lightning-Fast Cluster Computing. Retrieved December 12, 2017 from

[9] Anon. 2017. Apache Spark vs Hadoop: Choosing the Right Framework. (November 2017). Retrieved December 12, 2017 from https://www.edureka.co/blog/apache-spark-vs-hadoop-mapreduce

[10] Anon. Big Data Processing with Apache Spark – Part 1: Introduction. Retrieved December 12, 2017 from

[11] Anon. Apache Spark Introduction. Retrieved December 12, 2017 from

[12] J.E. Gonzalez, R.S. Xin, A. Dave, D. Crankshaw, M.J. Franklin, and I. Stoica. 2014. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI (Vol. 14), 599-613.

[13] Anon. Object-Oriented Meets Functional. Retrieved December 12, 2017 from https://www.scala-lang.org/



