01 Aug 2016, 20:00

Network emulation for Docker containers

TL;DR

Pumba netem delay and netem loss commands can emulate network delay and packet loss between Docker containers, even on a single host. Give it a try!

Introduction

Microservice architecture has been adopted by software teams as a way to deliver business value faster. Container technology enables delivery of microservices into any environment. Docker has accelerated this by providing an easy to use toolset for development teams to build, ship, and run distributed applications. These applications can be composed of hundreds of microservices packaged in Docker containers.

In a recent NGINX survey [Finding #7], the “biggest challenge holding back developers” is the trade-off between quality and speed. As Martin Fowler indicates, testing strategies in microservices architecture can be very complex. Creating a realistic and useful testing environment is an aspect of this complexity.

One challenge is simulating network failures to ensure resiliency of applications and services.

The network is a critical arterial system for ensuring reliability for any distributed application. Network conditions are different depending on where the application is accessed. Network behavior can greatly impact the overall application availability, stability, performance, and user experience (UX). It’s critical to simulate and understand these impacts before the user notices. Testing for these conditions requires conducting realistic network tests.

After Docker containers are deployed in a cluster, all communication between containers happens over the network. These containers may run on a single host, on different hosts, on different networks, or even in different datacenters.

How can we test for the impact of network behavior on the application? What can we do to emulate different network properties between containers on a single host or among clusters on multiple hosts?

Pumba with Network Emulation

Pumba is a chaos testing tool for Docker containers, inspired by Netflix Chaos Monkey. The main benefit is that it works with containers instead of VMs. Pumba can kill, stop, or restart running Docker containers, or pause processes within specified containers. We use it for resilience testing of our distributed applications. Resilience testing ensures reliability of the system. It allows the team to verify that their application recovers correctly, regardless of any event (expected or unexpected), without any loss of data or functionality. Pumba simulates these events for distributed and containerized applications.

Pumba netem

We enhanced Pumba with network emulation capabilities, starting with delay and packet loss. Using the pumba netem command, we can apply delay or packet loss to any Docker container. Under the hood, Pumba uses Linux kernel traffic control (tc) with the netem queueing discipline. For this to work, the iproute2 package must be available in the Docker images we want to test; some base Docker images already include it.
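If a base image lacks it, a single Dockerfile line is enough to add it (an Alpine example; Debian-based images would use apt-get instead):

# make the tc tool available inside the container for Pumba netem
RUN apk add --update iproute2 && rm -rf /var/cache/apk/*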

Pumba netem delay and netem loss commands can emulate network delay and packet loss between Docker containers, even on a single host.

Linux has built-in network emulation capabilities, starting from kernel 2.6.7 (released in 2004). Linux allows us to manipulate traffic control settings using the tc tool, available in the iproute2 package; netem is an extension (queueing discipline) of tc. It allows emulation of network properties: delay, packet loss, packet reorder, duplication, corruption, and bandwidth rate.
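To give a feel for what happens under the hood, these are the kind of raw tc commands netem is built on (a sketch; Pumba runs an equivalent inside the target container's network namespace):

# add a fixed 100ms delay to all egress packets on eth0
tc qdisc add dev eth0 root netem delay 100ms

# change the emulation to drop 5% of egress packets
tc qdisc change dev eth0 root netem loss 5%

# remove the emulation and restore normal traffic
tc qdisc del dev eth0 root netem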

Pumba netem commands can help development teams simulate realistic network conditions as they build, ship, and run microservices in Docker containers.

Pumba wraps the low-level netem options and greatly simplifies their usage, making it easier to emulate different network properties for running Docker containers.

In the current release, Pumba modifies egress traffic only, adding delay or packet loss to the specified container(s). Target containers can be specified by name (a single name or a space-separated list) or via an RE2 regular expression. Pumba modifies container network conditions for a specified duration; after the set time interval, it restores normal network conditions. Pumba also restores the original connection on a graceful shutdown of the pumba process (Ctrl-C) or when the Pumba container is stopped with the docker stop command. An option is available to apply an IP range filter to the network emulation: with it, Pumba modifies outgoing traffic for the specified IP only and leaves other outgoing traffic unchanged. Using this option, we can change network properties for specific inter-container connections as well as for specific Docker networks, since each Docker network has its own IP range.
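For example, a sketch of the IP filter option, based on the netem --target flag (see the pumba netem help output further down this page; the container name myapp and the IP are illustrative):

# delay only traffic from `myapp` to 172.19.0.3 for 5 minutes;
# outgoing traffic to all other IPs stays unchanged
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --duration 5m --target 172.19.0.3 \
      delay --time 2000 \
      myapp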

Pumba delay: netem delay

To demonstrate, we’ll run two Docker containers: one runs a ping command and the other is a Pumba container that adds a 3 second network delay to the ping container for 1 minute. After 1 minute, the Pumba container restores the network connection properties of the ping container and exits gracefully.

# open two terminal windows: (1) and (2)

# terminal (1)
# create new 'tryme' Alpine container (with iproute2) and ping `www.example.com`
$ docker run -it --rm --name tryme alpine sh -c "apk add --update iproute2 && ping www.example.com"

# terminal (2)
# run pumba: add 3s delay to `tryme` container for 1m
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
         pumba netem --interface eth0 --duration 1m delay --time 3000 tryme

# See `ping` delay increased by 3000ms for 1 minute
# You can stop Pumba earlier with `Ctrl-C`

netem delay examples

This section contains more advanced network emulation examples for the delay command.

# add 3 seconds delay for all outgoing packets on device `eth0` (default) of `mydb` Docker container for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --duration 5m \
      delay --time 3000 \
      mydb
# add a delay of 3000ms ± 30ms, with the next random element depending 20% on the previous one,
# for all outgoing packets on device `eth1` of all Docker containers with names starting with `hp`,
# for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --duration 5m --interface eth1 \
      delay \
        --time 3000 \
        --jitter 30 \
        --correlation 20 \
      re2:^hp
# add a delay of 3000ms ± 40ms, where the variation in delay is described by the `normal` distribution,
# for all outgoing packets on device `eth0` (default) of one randomly chosen Docker container from the list,
# for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba --random \
      netem --duration 5m \
        delay \
          --time 3000 \
          --jitter 40 \
          --distribution normal \
        container1 container2 container3

Pumba packet loss: netem loss, netem loss-state, netem loss-gemodel

Let’s start with a packet loss demo. Here we will run three Docker containers: an iperf server and client for sending data, and a Pumba container that will add packet loss on the client container. We use iperf, a network throughput testing tool, to demonstrate the packet loss.

# open three terminal windows

# terminal (1) iperf server
# server: `-s` run in server mode; `-u` use UDP;  `-i 1` report every second
$ docker run -it --rm --name tryme-srv alpine sh -c "apk add --update iperf && iperf -s -u -i 1"

# terminal (2) iperf client
# client: `-c` client connects to <server ip>; `-u` use UDP
$ docker run -it --rm --name tryme alpine sh -c "apk add --update iproute2 iperf && iperf -c 172.17.0.3 -u"

# terminal (3)
# run pumba: add 20% packet loss to `tryme` container for 1m
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
         pumba netem --duration 1m loss --percent 20 tryme

# See server report on terminal (1) 'Lost/Total Datagrams' - should see lost packets there

It is generally understood that packet loss distribution in IP networks is “bursty”. To simulate more realistic packet loss events, different probability models are used. Pumba currently supports three different probability models for packet loss and defines a separate loss command for each one:

  • loss - independent probability loss model (Bernoulli model); the most widely used loss model, where packet losses are modeled by a random process consisting of Bernoulli trials
  • loss-state - 2-state, 3-state, and 4-state Markov models
  • loss-gemodel - Gilbert and Gilbert-Elliott models

Papers on network packet loss models:

  • “Indepth: Packet Loss Burstiness” link
  • “Definition of a general and intuitive loss model for packet networks and its implementation in the Netem module in the Linux kernel.” link
  • man netem link

netem loss examples

# lose 0.3% of packets
# apply on the `eth0` network interface (default) of the `mydb` Docker container for 5 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --duration 5m \
      loss --percent 0.3 \
      mydb
# lose 1.4% of packets (on average, 14 packets out of 1000 will be lost)
# each successive loss probability depends by a quarter on the previous one:
#   Prob(n) = .25 * Prob(n-1) + .75 * Random
# apply on the `eth1` network interface of Docker containers (names starting with `hp`) for 15 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --interface eth1 --duration 15m \
      loss --percent 1.4 --correlation 25 \
      re2:^hp
# use 2-state Markov model for packet loss probability: P13=15%, P31=85%
# apply on `eth1` network interface of 3 Docker containers (c1, c2 and c3) for 12 minutes

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --interface eth1 --duration 12m \
      loss-state -p13 15 -p31 85 \
      c1 c2 c3
# use the Gilbert-Elliott model for packet loss probability: p=5%, r=90%, (1-h)=85%, (1-k)=7%
# apply on `eth2` network interface of `mydb` Docker container for 9 minutes and 30 seconds

$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba \
    pumba netem --interface eth2 --duration 9m30s \
      loss-gemodel --pg 5 --pb 90 --one-h 85 --one-k 7 \
      mydb

Contribution

Special thanks to Neil Gehani for helping me with this post and to Inbar Shani for initial Pull Request with netem command.

Next

To see more examples of how to use the Pumba netem commands, please refer to the Pumba GitHub Repository. The project is open source, and we gladly accept ideas, pull requests, issues, and any other contributions.

Pumba can be downloaded as a precompiled binary (Windows, Linux, and macOS) from the GitHub project release page. It’s also available as a Docker image.

Pumba GitHub Repository

16 Apr 2016, 20:00

Pumba - Chaos Testing for Docker

Update (27-07-2016): Updated the post for the latest Pumba v0.2.0 release.

Introduction

The best defense against unexpected failures is to build resilient services. Testing for resiliency enables teams to learn where their apps fail before the customer does. By intentionally causing failures as part of resiliency testing, you can enforce your policy for building resilient systems. Resilience of a system can be defined as its ability to continue functioning even if some of its components are failing (ephemerality). The growing popularity of distributed and microservice architectures makes resilience testing critical for applications that now require 24x7x365 operation. Resilience testing is an approach where you intentionally inject different types of failures at the infrastructure level (VM, network, containers, and processes) and let the system try to recover from these unexpected failures, which can also happen in production. Simulating realistic failures at any time is the best way to enforce highly available and resilient systems.

What is Pumba?

Pumba

First of all, Pumba (or Pumbaa) is a supporting character from Disney’s animated film The Lion King. In Swahili, pumbaa means “to be foolish, silly, weak-minded, careless, negligent”. This reflects the unexpected behavior of the application.

Pumba is inspired by the highly popular Netflix Chaos Monkey resilience testing tool for the AWS cloud. Pumba takes a similar approach but applies it at the container level. It connects to the Docker daemon running on some machine (local or remote) and brings a level of chaos to it: “randomly” killing, stopping, and removing running containers.

If your system is designed to be resilient, it should be able to recover from such failures. “Failed” services should be restarted and lost connections should be recovered. This is not as trivial as it sounds. You need to design your services differently. Be aware that a service can fail (for whatever reason) or service it depends on can disappear at any point of time (but can reappear later). Expect the unexpected!

Why run Pumba?

Failures happen, and they inevitably happen when least desired. If your application cannot recover from system failures, you are going to face angry customers and maybe even lose them. If you want to be sure that your system is able to recover from unexpected failures, it is better to take charge and inject failures yourself instead of waiting for them to happen. And this is not a one-time effort: in the age of Continuous Delivery, you need to be sure that every change to any system service does not compromise system availability. That’s why you should practice continuous resilience testing. Docker is gaining popularity, and more people are deploying and running clusters of containers in production. Using a container orchestration framework (e.g. Kubernetes, Swarm, CoreOS fleet), it’s possible to restart a “failed” container automatically. But how can you be sure that restarted services and other system services can properly recover from failures? And if you are not using container orchestration frameworks, life is even harder: you will need to handle container restarts by yourself.

This is where Pumba shines. You can run it on every Docker host in your cluster, and Pumba will “randomly” stop running containers that match specified names or name patterns. You can even specify the signal that will be sent to “kill” the container.

What Pumba can do?

Pumba can create different failures for your running Docker containers. Pumba can kill, stop, or remove running containers. It can also pause all processes within a running container for a specified period of time. Pumba can also do network emulation, simulating different network failures, like delay, packet loss/corruption/reorder, bandwidth limits, and more. Disclaimer: the netem command is under development, and only the delay command is supported in Pumba v0.2.0.

You can pass a list of containers to Pumba or just write a regular expression to select matching containers. If you do not specify containers, Pumba will try to disturb all running containers. Use the --random option to randomly select only one target container from the provided list. A couple of selection examples are shown below.
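For illustration, a couple of selection sketches (the container names are made up):

# kill one randomly chosen container out of the three listed
$ pumba --random kill --signal SIGKILL c1 c2 c3

# stop all containers with names matching the RE2 regular expression `^test-`
$ pumba stop re2:^test-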

How to run Pumba?

There are two ways to run Pumba.

First, you can download the Pumba application (a single binary file) for your OS from the project release page and run pumba help to see the list of supported commands and options.

$ pumba help

Pumba version v0.2.0
NAME:
   Pumba - Pumba is a resilience testing tool, that helps applications tolerate random Docker container failures: process, network and performance.

USAGE:
   pumba [global options] command [command options] containers (name, list of names, RE2 regex)

VERSION:
   v0.2.0

COMMANDS:
     kill     kill specified containers
     netem    emulate the properties of wide area networks
     pause    pause all processes
     stop     stop containers
     rm       remove containers
     help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --host value, -H value      daemon socket to connect to (default: "unix:///var/run/docker.sock") [$DOCKER_HOST]
   --tls                       use TLS; implied by --tlsverify
   --tlsverify                 use TLS and verify the remote [$DOCKER_TLS_VERIFY]
   --tlscacert value           trust certs signed only by this CA (default: "/etc/ssl/docker/ca.pem")
   --tlscert value             client certificate for TLS authentication (default: "/etc/ssl/docker/cert.pem")
   --tlskey value              client key for TLS authentication (default: "/etc/ssl/docker/key.pem")
   --debug                     enable debug mode with verbose logging
   --json                      produce log in JSON format: Logstash and Splunk friendly
   --slackhook value           web hook url; send Pumba log events to Slack
   --slackchannel value        Slack channel (default #pumba) (default: "#pumba")
   --interval value, -i value  recurrent interval for chaos command; use with optional unit suffix: 'ms/s/m/h'
   --random, -r                randomly select single matching container from list of target containers
   --dry                       dry run; does not create chaos, only logs planned chaos commands
   --help, -h                  show help
   --version, -v               print the version

Kill Container command

$ pumba kill -h

NAME:
   pumba kill - kill specified containers

USAGE:
   pumba kill [command options] containers (name, list of names, RE2 regex)

DESCRIPTION:
   send termination signal to the main process inside target container(s)

OPTIONS:
   --signal value, -s value  termination signal, that will be sent by Pumba to the main process inside target container(s) (default: "SIGKILL")

Pause Container command

$ pumba pause -h

NAME:
   pumba pause - pause all processes

USAGE:
   pumba pause [command options] containers (name, list of names, RE2 regex)

DESCRIPTION:
   pause all running processes within target containers

OPTIONS:
   --duration value, -d value  pause duration: should be smaller than recurrent interval; use with optional unit suffix: 'ms/s/m/h'

Stop Container command

$ pumba stop -h
NAME:
   pumba stop - stop containers

USAGE:
   pumba stop [command options] containers (name, list of names, RE2 regex)

DESCRIPTION:
   stop the main process inside target containers, sending  SIGTERM, and then SIGKILL after a grace period

OPTIONS:
   --time value, -t value  seconds to wait for stop before killing container (default: 10)

Remove (rm) Container command

$ pumba rm -h

NAME:
   pumba rm - remove containers

USAGE:
   pumba rm [command options] containers (name, list of names, RE2 regex)

DESCRIPTION:
   remove target containers, with links and voluems

OPTIONS:
   --force, -f    force the removal of a running container (with SIGKILL)
   --links, -l    remove container links
   --volumes, -v  remove volumes associated with the container

Network Emulation (netem) command

$ pumba netem -h

NAME:
   Pumba netem - delay, loss, duplicate and re-order (run 'netem') packets, to emulate different network problems

USAGE:
   Pumba netem command [command options] [arguments...]

COMMANDS:
     delay      delay egress traffic
     loss       TODO: planned to implement ...
     duplicate  TODO: planned to implement ...
     corrupt    TODO: planned to implement ...

OPTIONS:
   --duration value, -d value   network emulation duration; should be smaller than recurrent interval; use with optional unit suffix: 'ms/s/m/h'
   --interface value, -i value  network interface to apply delay on (default: "eth0")
   --target value, -t value     target IP filter; netem will impact only on traffic to target IP
   --help, -h                   show help

Network Emulation Delay sub-command

$ pumba netem delay -h

NAME:
   Pumba netem delay - delay egress traffic

USAGE:
   Pumba netem delay [command options] containers (name, list of names, RE2 regex)

DESCRIPTION:
   delay egress traffic for specified containers; networks show variability, so it is possible to add random variation; delay variation isn't purely random, so to emulate that there is a correlation

OPTIONS:
   --amount value, -a value       delay amount; in milliseconds (default: 100)
   --variation value, -v value    random delay variation; in milliseconds; example: 100ms ± 10ms (default: 10)
   --correlation value, -c value  delay correlation; in percents (default: 20)

Examples

# stop a random container once every 10 minutes
$ ./pumba --random --interval 10m kill --signal SIGSTOP
# every 15 minutes kill `mysql` container and every hour remove containers starting with "hp"
$ ./pumba --interval 15m kill --signal SIGTERM mysql &
$ ./pumba --interval 1h rm re2:^hp &
# every 30 seconds kill "worker1" and "worker2" containers and every 3 minutes stop "queue" container
$ ./pumba --interval 30s kill --signal SIGKILL worker1 worker2 &
$ ./pumba --interval 3m stop queue &
# Once every 5 minutes, Pumba will delay egress traffic for 2 seconds (2000ms) for one randomly chosen container
# named `result...` (matching the `^result` regexp), on the `eth2` network interface.
# Pumba will restore normal connectivity after 2 minutes. A debug trace is printed to STDOUT too.
$ ./pumba --debug --interval 5m --random netem --duration 2m --interface eth2 delay --amount 2000 re2:^result

Running Pumba in Docker Container

The second approach is to run Pumba in a Docker container.

In order to give Pumba access to the Docker daemon on the host machine, you will need to mount the /var/run/docker.sock unix socket.

# run latest stable Pumba docker image (from master repository)
$ docker run -d -v /var/run/docker.sock:/var/run/docker.sock gaiaadm/pumba:master pumba --interval 10s kill --signal SIGTERM re2:^hp

Pumba will not kill its own container.

Note: For Mac OSX - before you run Pumba, you may want to do the following after downloading the pumba_darwin_amd64 binary:

chmod +x pumba_darwin_amd64
mv pumba_darwin_amd64 /usr/local/bin/pumba
pumba

Next

The Pumba project is available for you to try out. We will gladly accept ideas, pull requests, issues, and contributions to the project.

Pumba GitHub Repository

07 Mar 2016, 16:39

Testing Strategies for Docker Containers

Congratulations! You know how to build a Docker image and are able to compose multiple containers into a meaningful application. Hopefully, you’ve already created a Continuous Delivery pipeline and know how to push your newly created image into production or testing environment.

Now, the question is - How do we test our Docker containers?

There are multiple testing strategies we can apply. In this post, I’ll highlight them presenting benefits and drawbacks for each.

The “Naive” approach

This is the default approach for most people. It relies on a CI server to do the job. When taking this approach, the developer uses Docker as a package manager, a better option than the jar/rpm/deb approach. The CI server compiles the application code and executes tests (unit, service, functional, and others). The build artifacts are reused in a Docker build to produce a new image, which becomes the core deployment artifact. The produced image contains not only the application “binaries”, but also the required runtime, including all dependencies and the application configuration.

We gain application portability; however, we lose development and testing portability. We’re not able to reproduce exactly the same development and testing environment outside the CI. To create a new test environment, we’d need to set up the testing tools (correct versions and plugins), configure runtime and OS settings, and get the same versions of test scripts and, perhaps, the test data.

The Naive Testing Strategy

Trying to resolve these problems leads us to the next approach.

App & Test Container approach

Here, we try to create a single bundle with the application “binaries”, including required packages, testing tools (specific versions), test tool plugins, test scripts, and a test environment with all required packages.

The benefits of this approach:

  • We have a repeatable test environment - we can run exactly the same tests using the same testing tools - in our CI, development, staging, or production environment
  • We capture test scripts at a specific point in time so we can always reproduce them in any environment
  • We do not need to setup and configure our testing tools - they are part of our image

This approach has significant drawbacks:

  • Increases the image size - because it contains testing tools, required packages, test scripts, and perhaps even test data
  • Pollutes image runtime environment with test specific configuration and may even introduce an unneeded dependency (required by integration testing)
  • We also need to decide what to do with the test results and logs; how and where to export them

Here’s a simplified Dockerfile. It illustrates this approach.

FROM "<bases image>":"<version>"

WORKDIR "<path>"

# install packages required to run app and tests
RUN apt-get update && apt-get install -y \
    "<app runtime> and <dependencies>" \  # add app runtime and required packages
    "<test tools> and <dependencies>" \     # add testing tools and required packages
    && rm -rf /var/lib/apt/lists/*

# copy app files
COPY app app
COPY run.sh run.sh

# copy test scripts
COPY tests tests

# copy "main" test command
COPY test.sh test.sh

# ... EXPOSE, RUN, ADD ... for app and test environment

# main app command
CMD ["run.sh", "<app arguments>"]

# it's not possible to have multiple CMD commands, but this is the "main" test command
# CMD ["/test.sh", "<test arguments>"]

Dockerfile

App & Test Container

There has to be a better way for in-container testing and there is.

Test Aware Container Approach

Today, Docker’s promise is “Build -> Ship -> Run”: build the image, ship it to some registry, and run it anywhere. IMHO there’s a critical missing step: Test. The right and complete sequence should be: Build -> Test -> Ship -> Run.

Let’s look at a “test-friendly” Dockerfile syntax and extensions to Docker commands. This important step could be supported natively. It’s not a real syntax, but bear with me. I’ll define the “ideal” version and show how to implement something that’s very close.

ONTEST [INSTRUCTION]

Let’s define a special ONTEST instruction, similar to the existing ONBUILD instruction. The ONTEST instruction adds a trigger instruction to the image, to be executed at a later time when the image is tested. Any build instruction can be registered as a trigger.

The ONTEST instruction should be recognized by a new docker test command.

docker test [OPTIONS] IMAGE [COMMAND] [ARG...]

The docker test command syntax would be similar to the docker run command, with one significant difference: a new “testable” image would be automatically generated and tagged with a <image name>:<image tag>-test tag (a “test” postfix added to the original image tag). This “testable” image would be generated FROM the application image, executing all build instructions defined after the ONTEST command and executing ONTEST CMD (or ONTEST ENTRYPOINT). The docker test command should return a non-zero code if any tests fail. The test results should be written into an automatically generated VOLUME that points to the /var/tests/results folder.

Let’s look at a modified Dockerfile below - it includes the new proposed ONTEST instruction.

FROM "<base image>":"<version>"

WORKDIR "<path>"

# install packages required to run app
RUN apt-get update && apt-get install -y \
    "<app runtime> and <dependencies>" \  # add app runtime and required packages
    && rm -rf /var/lib/apt/lists/*

# install packages required to run tests   
ONTEST RUN apt-get update && apt-get install -y \
           "<test tools> and <dependencies>"    \     # add testing tools and required packages
           && rm -rf /var/lib/apt/lists/*

# copy app files
COPY app app
COPY run.sh run.sh

# copy test scripts
ONTEST COPY tests tests

# copy "main" test command
ONTEST COPY test.sh test.sh

# auto-generated volume for test results
# ONTEST VOLUME "/var/tests/results"

# ... EXPOSE, RUN, ADD ... for app and test environment

# main app command
CMD ["run.sh", "<app arguments>"]

# main test command
ONTEST CMD ["/test.sh", "<test arguments>"]

Dockerfile

Test Aware Container

Making “Test Aware Container” Real

We believe Docker should make docker-test part of the container management lifecycle. Still, there is a need for a simple working solution today, and I’ll describe one that’s very close to the ideal state.

As mentioned before, Docker has a very useful ONBUILD instruction. This instruction allows us to trigger another build instruction on succeeding builds. The basic idea is to use the ONBUILD instruction when running the docker-test command.

The flow executed by docker-test command:

  1. docker-test will search for ONBUILD instructions in the application Dockerfile and will:
  2. generate a temporary Dockerfile.test from the original Dockerfile
  3. execute docker build -f Dockerfile.test [OPTIONS] PATH, with one additional option on top of those supported by the docker build command: a -test suffix that is automatically appended to the tag option
  4. if the build is successful, execute docker run -v ./tests/results:/var/tests/results [OPTIONS] IMAGE:TAG-test [COMMAND] [ARG...]
  5. remove the Dockerfile.test file

Why not create a new Dockerfile.test without requiring the ONBUILD instruction?

Because in order to test the right image (and tag), we would need to keep the FROM instruction always updated to the image:tag that we want to test. This is not trivial.

There is a limitation in the described approach: it’s not suitable for “onbuild” images (images used to automatically build your app), like maven:onbuild.

Let’s look at a simple implementation of the docker-test command. It highlights the concept: the docker-test command should be able to handle build and run command options and handle errors properly.

#!/bin/bash
image="app"
tag="latest"

# generate a temporary Dockerfile.test that builds FROM the application image,
# build a "testable" image tagged with the '-test' postfix,
# run it with a host folder mounted for the test results,
# and remove the temporary Dockerfile once everything succeeds
echo "FROM ${image}:${tag}" > Dockerfile.test &&
docker build -t "${image}:${tag}-test" -f Dockerfile.test . &&
docker run -it --rm -v "$(pwd)/tests/results:/var/tests/results" "${image}:${tag}-test" &&
rm Dockerfile.test
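To make the trick concrete, here is a minimal sketch of the ONBUILD triggers that would live in the application Dockerfile; they do nothing during the regular build and fire only when another image is built FROM this one, which is exactly what the generated Dockerfile.test does (paths and placeholders are illustrative):

# application Dockerfile (excerpt)
ONBUILD COPY tests /var/tests
ONBUILD RUN apt-get update && apt-get install -y "<test tools>" && rm -rf /var/lib/apt/lists/*
ONBUILD CMD ["/var/tests/test.sh"]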

Let’s focus on the most interesting and relevant part.

Integration Test Container

Let’s say we have an application built from tens or hundreds of microservices. Let’s say we have an automated CI/CD pipeline, where each microservice is built and tested by our CI and deployed into some environment (testing, staging, or production) after the build and tests pass. Pretty cool, eh? Our CI tests are capable of testing each microservice in isolation: running unit and service tests (or API contract tests). Maybe even micro-integration tests: tests run on a subsystem created in an ad-hoc manner (for example, with the help of Docker Compose).

This leads to some issues that we need to address:

  • What about real integration tests or long running tests (like performance and stress)?
  • What about resilience tests (“chaos monkey” like tests)?
  • Security scans?
  • What about test and scan activities that take time and should be run on a fully operational system?

There should be a better way than just dropping a new microservice version into production and tightly monitoring it for a while.

There should be a special Integration Test Container. These containers contain only testing tools and test artifacts: test scripts, test data, test environment configuration, etc. To simplify orchestration and automation of such containers, we should define and follow some conventions and use metadata labels (the Dockerfile LABEL instruction).

Integration Test Labels

  • test.type - test type; default integration; can be one of: integration, performance, security, chaos or any text; presence of this label states that this is an Integration Test Container
  • test.results - VOLUME for test results; default /var/tests/results
  • test.XXX - any other test related metadata; just use test. prefix for label name

Integration Test Container

The Integration Test Container is just a regular Docker container. It does not contain any application logic or code. Its sole purpose is to create repeatable and portable testing. Recommended content of the Integration Test Container (a sample Dockerfile sketch follows the list):

  • The Testing Tool - Phantom.js, Selenium, Chakram, Gatling, …
  • Testing Tool Runtime - Node.js, JVM, Python, Ruby, …
  • Test Environment Configuration - environment variables, config files, bootstrap scripts, …
  • Tests - as compiled packages or script files
  • Test Data - any kind of data files, used by tests: json, csv, txt, xml, …
  • Test Startup Script - some “main” startup script to run tests; just create test.sh and launch the testing tool from it.
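A minimal Dockerfile sketch of such a container, using the labels defined above (the tool choice, versions, and paths are illustrative):

FROM node:4-slim

# mark this image as an Integration Test Container
LABEL test.type="integration"
LABEL test.results="/var/tests/results"

# testing tool installed on top of its runtime
RUN npm install -g mocha chakram

# test scripts, test data, and the "main" startup script
COPY tests /var/tests
COPY test.sh /var/tests/test.sh
RUN chmod +x /var/tests/test.sh

# export test results
VOLUME /var/tests/results

CMD ["/var/tests/test.sh"]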

Integration Test Containers should run in an operational environment where all microservices are deployed: testing, staging, or production. These containers can be deployed exactly like all other services. They use the same network layer and thus can access multiple services, using the selected service discovery method (usually DNS). Accessing multiple services is required for real integration testing: we need to simulate and validate how our system works across multiple places. Keeping integration tests inside some application service container not only increases the container footprint but also creates unneeded dependencies between multiple services. We keep all these dependencies at the level of the Integration Test Container. Once our tests (and testing tools) are packaged inside the container, we can always rerun the same tests in any environment, including the developer machine. And you can always go back in time and rerun a specific version of the Integration Test Container.

Integration Test Container

WDYT? Your feedback, particularly on standardizing the docker-test command, is greatly appreciated.

14 Jan 2016, 16:14

Docker Pattern: The Build Container

Let’s say that you’re developing a microservice in a compiled language, or in an interpreted language that requires some additional “build” steps to package and lint your application code. This is where the “build” container Docker pattern comes in useful.

In our project, we’re using Docker as our main deployment package: every microservice is delivered as a Docker image. Each microservice also has its own code repository (GitHub) and its own CI build job.

Some of our services are written in Java. Java code requires additional tooling and processes before you get working code. This tooling and all the associated packages are build-time dependencies: they are not required when the compiled program is running.

We have been trying to develop a repeatable process and uniform environment for deploying our services. We believe it’s good to package Java tools and packages into containers too. It allows us to build a Java-based microservice on any machine, including CI, without any specific environment requirements: JDK version, profiling and testing tools, OS, Maven, environment variables, etc.

For every service, we have two Dockerfiles: one for the service runtime and a second one packed with the tools required to build the service. We usually name these files Dockerfile and Dockerfile.build. We use the -f (--file) flag to specify the Dockerfile name for the docker build command.

Here is our Dockerfile.build file:

FROM maven:3.3.3-jdk-8

ENV GAIA_HOME=/usr/local/gaia/

RUN mkdir -p $GAIA_HOME
WORKDIR $GAIA_HOME

# speed up maven build, read https://keyholesoftware.com/2015/01/05/caching-for-maven-docker-builds/
# selectively add the POM file
ADD pom.xml $GAIA_HOME

# get all the downloads out of the way
RUN ["mvn","verify","clean","--fail-never"]

# add source
ADD . $GAIA_HOME

# run maven verify
RUN ["mvn","verify"]

As you can see, it’s a simple file with one little trick to speed up our Maven build.

Now, we have all the tools required to compile our service. We can run this Docker container on any machine, without requiring a JDK to be installed. We can run the same Docker container on a developer’s laptop and on our CI server.

This trick greatly simplifies our CI process - we no longer require our CI to support any specific compiler, version, or tool. All we need is the Docker engine. Everything else, we bring ourselves.

BYOT - Bring Your Own Tooling! :-)

In order to compile the service code, we need to build and run the builder container.

docker build -t build-img -f Dockerfile.build .
docker create --name build-cont build-img

Once we’ve built the image and created a new container from it, we have our service compiled inside the container. The only remaining task is to extract the build artifacts from the container. We could use Docker volumes; that is one possible option. But we actually like that the image we’ve created contains all the tools and build artifacts inside it, since it allows us to get any build artifact from this image at any time, just by creating a new container from it.

To extract our build artifacts, we are using docker cp command. This command copies files from the container to the local file system.

Here is how we are using this command:

docker cp build-cont:/usr/local/gaia/target/mgs.war ./target/mgs.war

As a result, we have the compiled service code packaged into a single WAR file. We get exactly the same WAR file on any machine, just by running our builder container, or by rebuilding the builder container against the same code commit (using a Git tag or a specific commit ID) on any machine.

We can now create a Docker image with our service and required runtime, which is usually some version of JRE and servlet container.

Here is our Dockerfile for the service. It’s an image with Jetty, JRE8, and our service WAR file.

FROM jetty:9.3.0-jre8

RUN mkdir -p /home/jetty && chown jetty:jetty /home/jetty

COPY ./target/*.war $JETTY_BASE/webapps/

By running docker build ., we have a new image with our newly “built” service.

The Recipe (a combined example script follows the list):

  • Have one Dockerfile with all tools and packages required to build your service. Name it Dockerfile.build or give it a name you like.
  • Have another Dockerfile with all packages required to run your service.
  • Keep both files with your service code.
  • Build a new builder image, create a container from it and extract build artifacts, using volumes or docker cp command.
  • Build the service image.
  • That’s all folks!
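Putting the recipe together as one illustrative shell script (image names and the artifact path are from our project; adapt them to yours):

#!/bin/sh
# 1. build the builder image; compilation and tests run during this build
docker build -t build-img -f Dockerfile.build .
# 2. create a container from the builder image to access its filesystem
docker create --name build-cont build-img
# 3. extract the build artifact to the host
docker cp build-cont:/usr/local/gaia/target/mgs.war ./target/mgs.war
# 4. build the service image with the extracted artifact
docker build -t service-img .
# 5. clean up the builder container
docker rm build-cont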

Summary

In our case of a Java-based service, the difference between the builder container and the service container is huge. The Java JDK is a much bigger package than the JRE; it also requires all X Window packages to be installed inside your container. For the runtime, you can have a slim image with your service code, a JRE, and some basic Linux packages, or even start from scratch.

30 Dec 2015, 11:09

Docker Pattern: Deploy and update dockerized application on cluster

Docker is a great technology that simplifies development and deployment of distributed applications.

While Docker serves as a core technology, many issues remain to be solved. We find ourselves struggling with some of these issues. For example:

  • How to create an elastic Docker cluster?
  • How to deploy and connect multiple containers together?
  • How to build CI/CD process?
  • How to register and discover system services and more?

For most, there are open source projects, or services available from the community as well as commercially, including from Docker, Inc.

In this post, I would like to address one such problem:

How to automate Continuous Delivery for a Docker cluster?

In our application, every service is packaged as a Docker image, and a Docker image is our basic deployment unit. In a runtime environment (production, testing, and others), we may have multiple containers created from this image. For each service, we have a separate Git repository and one or more Dockerfile(s). We use this setup for building, testing, and packaging; I will explain our project structure in the next blog post.

We’ve set up an automated Continuous Integration (CI) process; we use CircleCI. For every push to some branch of a service’s Git repository, the CI service triggers a new Docker build process and creates a new Docker image for the service. As part of the build process, if everything compiles and all unit and component tests pass, we push the newly created Docker image to our DockerHub repository.

At the end of CI process, we have a newly tested and ‘kosher’ Docker image in our DockerHub repository. Very nice indeed! However, we are left with several questions:

  • How to perform a rolling update of a modified service?
  • What is the target environment (i.e. some Docker cluster)?
  • How to find the IP of a dynamically created CoreOS host (we use AWS auto-scaling groups)?
  • How to connect to it in a secure way without the need to expose the SSH keys or cloud credentials (we use AWS for infrastructure)?

Our application runs on a CoreOS cluster. We have automation scripts that create the CoreOS cluster in multiple environments: a developer machine, AWS, or some VM infrastructure. When we have a new service version (i.e. a new Docker image), we need a way to deploy this service to the cluster. CoreOS uses fleet for cluster orchestration. fleetctl is a command line client that talks to the fleet backend and allows you to manage the CoreOS cluster. However, it only works in a local environment (machine or network). For some commands fleetctl uses an HTTP API, and for others an SSH connection. The HTTP API is not protected, so it makes no sense to expose it from your cloud environment. The SSH connection does not work when your architecture assumes there is no direct SSH connection to the cluster. Connecting through an SSH bastion machine, requiring the use of multiple SSH keys, and creating SSH configuration files are not supported by the fleetctl program. And I have no desire to store SSH keys or my cloud credentials for my production environment on any CI server, due to security concerns.

So, what do we do?

First, we want to deploy a new service or update some service when there is a new Docker image or image version available. We also want to be selective and pick images created from code in a specific branch or even specific build.

The basic idea is to create a Docker image that captures the system configuration. This image stores the configuration of all services at a specific point in time and from specific branches. A container created from this image should be run from within the target cluster. Besides the captured configuration, the image also packs the deployment tool (fleetctl in our case), plus some code that detects which services need to be updated, deleted, or installed as new.

This idea leads to another question:

How do you define and capture system configuration?

In CoreOS, every service can be described by a systemd unit file. This is a plain text file that describes how and when to launch your service. I’m not going to explain how to write such files; there is a lot of info online. What’s important in our case is that the service systemd unit file contains a docker run command with parameters and the image name that needs to be downloaded and executed. We keep the service systemd unit file in the same repository as the service code.

The image name is usually defined as a repository/image_name:tag string, and the tag is the most important thing in our solution. As explained above, our CI server automatically builds a new Docker image on every push of the service to the GitHub repository. The CI job also tags the newly created image with two tags:

  1. branch tag: taken from the GitHub branch that triggered the build (master, develop, or feature-* branches in our case)
  2. build_num-branch tag: the same branch name, prefixed with a running build number

As a result, in DockerHub we have an image for the latest build in any branch, and for every image we can identify the build job number and the branch it was created from.
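For illustration, the tagging step of the CI job looks roughly like this (BRANCH and BUILD_NUM stand for the CI service's environment variables; the exact names differ per CI provider):

# build the image, tag it twice, and push both tags
docker build -t myrepo/myservice:${BRANCH} .
docker tag myrepo/myservice:${BRANCH} myrepo/myservice:${BUILD_NUM}-${BRANCH}
docker push myrepo/myservice:${BRANCH}
docker push myrepo/myservice:${BUILD_NUM}-${BRANCH}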

As I said before, we keep the service systemd unit file in the same repository as the code. This file does not contain the image tag, only the repository and image name. See the example below:

ExecStart=/usr/bin/docker run --name myservice -p 3000:3000 myrepo/myservice

Our CI service build job generates a new service systemd unit file for every successful build, replacing the above service invocation command with one that also contains the new tag, following the build_num-branch pattern (the develop branch in the example below). We use two simple utilities for this job: cat and sed, though it’s possible to use a more advanced template engine.

ExecStart=/usr/bin/docker run --name myservice -p 3000:3000 myrepo/myservice:58-develop
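A minimal sketch of that substitution (a real job would take the tag values from CI environment variables, here BUILD_NUM and BRANCH):

# pin the image in the unit file to the current build's tag
sed "s|myrepo/myservice|myrepo/myservice:${BUILD_NUM}-${BRANCH}|" \
    myservice.service > myservice.service.out
mv myservice.service.out myservice.service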

As a last step, the CI build job “deploys” this new unit file to the system configuration Git repository.

git commit -am "Updating myservice tag to '58-develop'" myservice.service
git push origin develop

Another CI job that monitors changes in the system configuration repository will trigger a build and create a new Docker image that captures updated system configuration.

All we need to do now is to execute this Docker image on the Docker cluster. Something like this:

docker run -it --rm=true myrepo/mysysconfig:develop

Our system configuration Dockerfile:

FROM alpine:3.2
MAINTAINER Alexei Ledenev <alexei.led@gmail.com>

ENV FLEET_VERSION 0.11.5

ENV gaia /home/gaia
RUN mkdir -p ${gaia}

WORKDIR ${gaia}
COPY *.service ./
COPY deploy.sh ./
RUN chmod +x deploy.sh

# install packages and fleetctl client
RUN apk update && \
 apk add curl openssl bash && \
 rm -rf /var/cache/apk/* && \
 curl -L "https://github.com/coreos/fleet/releases/download/v${FLEET_VERSION}/fleet-v${FLEET_VERSION}-linux-amd64.tar.gz" | tar xz && \
 mv fleet-v${FLEET_VERSION}-linux-amd64/fleetctl /usr/local/bin/ && \
 rm -rf fleet-v${FLEET_VERSION}-linux-amd64

CMD ["/bin/bash", "deploy.sh"]

deploy.sh is a shell script that checks every service and, if it needs to be updated, created, or deleted, executes the corresponding fleetctl command.
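A heavily simplified sketch of what deploy.sh does (the real script also compares unit file contents and handles services that should be removed):

#!/bin/bash
# for every captured unit file: replace the running unit or start a new one
for unit in *.service; do
  if fleetctl list-unit-files | grep -q "$unit"; then
    fleetctl destroy "$unit"    # remove the old version first
  fi
  fleetctl start "$unit"        # submit and start the (updated) unit
done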

The final step:

How do you run this “system configuration” container?

Currently, in our environment, we do this manually (from an SSH shell) for development clusters, and use systemd timers for CoreOS clusters on AWS. A systemd timer allows us to define a cron-like job at the CoreOS cluster level; a sketch follows below.
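A rough sketch of such a timer pair (the unit names and schedule are illustrative):

# mysysconfig.timer: trigger the deployment job every 10 minutes
[Timer]
OnUnitActiveSec=10min

# mysysconfig.service: the job the timer runs
[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm=true myrepo/mysysconfig:develop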

We have plans to define a WebHook endpoint that will allow us to trigger deployment/update based on a WebHook event from the CI service.

Hope you find this post useful. I look forward to your comments and any questions you have.


about

Alexei Ledenev

Software Developer