Thoughts on Docker

Docker has been causing a lot of ripples in all sorts of ponds in recent years. I first started playing with it nearly a year ago now, after hearing about it from someone else at work. At first I didn't really understand what problems it was trying to solve. The more I played with it, however, the more interesting it became.

Gripes About Docker

There were plenty of things that I didn't care for about Docker. The most prominent strike against it was how slow it was to start, stop, and destroy containers. I soon learned that storing my Docker data on a btrfs partition made things much faster. And it was great! Things that used to take 10 minutes started taking 2 or 3. A very significant improvement.
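
If you want to try the same thing, here's a rough sketch of the setup (the device and mount point are just examples, and this assumes a Docker version from that era, where the daemon was started with docker -d):

mkfs.btrfs /dev/sdb1              # format a spare partition as btrfs
mount /dev/sdb1 /var/lib/docker   # put Docker's data directory on it
docker -d -s btrfs                # start the daemon with the btrfs storage driver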

But then it was still slow to actually build any containers that were less than trivial. For example, we've been using Docker for one of my side projects since April 2014 (coming from Vagrant). Installing all of the correct packages and whatnot inside our base Docker image took several minutes. Much longer than it does on bare metal or even in virtual machines. It was just slow. Any time we had to update dependencies, we'd invalidate the image cache and spend a large chunk of time just waiting for an image to build. It was/is painful.
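
To make the cache problem concrete, here's a minimal Dockerfile sketch (the base image, packages, and paths are all made up for illustration). Docker caches each instruction as a layer, and changing a file that gets ADDed invalidates that layer plus every layer after it:

FROM ubuntu:14.04
RUN apt-get update && apt-get install -y python python-pip

# Changing requirements.txt invalidates this layer...
ADD requirements.txt /app/requirements.txt

# ...so this slow step gets rebuilt every time a dependency changes.
RUN pip install -r /app/requirements.txt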

On top of that, pushing and pulling from the public registry is much slower than a lot of us would like it to be. We set up a private registry for that side project, but even that was slower than it should have been.

Many of you reading this article have probably read most or all of those gripes from other Docker critics. They're fairly common complaints.

Lately, one of the things about using Docker for development that's become increasingly frustrating is communication between containers on different hosts. Docker uses environment variables to tell one container how to reach services on another container running on the same host. Using environment variables is a great way to avoid hardcoding IPs and ports in your applications. I love it. However, when your development environment consists of 8+ distinct containers, the behavior around those environment variables is annoying (in my opinion).
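
For the curious, this is roughly what those links look like (the container names and the postgres image are stand-ins). Linking injects variables describing the linked container's address and ports into the linking container's environment:

docker run -d --name db postgres
docker run --rm --link db:db busybox env | grep DB_

# Output will look something like:
# DB_PORT=tcp://172.17.0.5:5432
# DB_PORT_5432_TCP_ADDR=172.17.0.5
# DB_PORT_5432_TCP_PORT=5432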

Looking For Alternatives

I don't really feel like going into more detail on that right now. Let's just say it was frustrating enough for me to look at alternatives (more out of curiosity than really wanting to switch away from Docker). This search led me to straight Linux containers (LXC), upon which Docker was originally built.

I remembered trying to use LXC for a little while back in 2012, and it wasn't a very successful endeavor--probably because I didn't understand containers very well at the time. I also distinctly remember being very fond of Docker when I first tried it because it made LXC easy to use. That's actually how I pitched it to folks.

Long story short, I have been playing with LXC for the past while now. I'm quite happy with it this time around. It seems to better fit the bill for most of the things we have been doing with Docker. In my limited experience with LXC so far, it's generally faster, more flexible, and more mature than Docker.

What proof do I have that it's faster? I have no hard numbers right now, but building one of our Docker images could take anywhere from 10 to 20 minutes. And that was building on top of an already existing base image. The base image took a few minutes to build too, but it was built much less regularly than this other image. So 10-20 minutes just to install the application-specific packages. Not the core packages. Not configure things. Just install additional packages.

Building an entire LXC container from scratch, installing all dependencies, and configuring basically an all-in-one version of the 8 different containers (along with a significant number of other things for monitoring and such) has consistently taken less than 3 minutes on my 2010 laptop. The speed difference is phenomenal, and I don't even need btrfs. Launching the full container is basically as fast as launching a single-purpose Docker container.
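
For reference, here's roughly the workflow I've been using with a recent LXC (the container name, distro, and release here are just examples):

lxc-create -t download -n dev -- -d ubuntu -r trusty -a amd64
lxc-start -n dev -d     # boot the container in the background
lxc-attach -n dev       # get a shell inside it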

What proof do I have that LXC is more flexible than Docker? Have you tried running systemd inside of a Docker container? Yeah, it's not the most intuitive thing in the world (or at least it wasn't the last time I bothered to try it). LXC will let you use systemd without any fuss (that I've noticed, anyway). This probably isn't the greatest example of flexibility in the world of containers, but it certainly works for me.

You also get some pretty interesting networking options, from what I've read. Not all of your containers need to be NAT'ed. Some can be NAT'ed and some can be bridged to appear on the same network as the host. I'm still exploring all of these goodies, so don't ask for details about them from me just yet ;)
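
The difference boils down to a couple of lines in the container's config file (the bridge names below depend entirely on how your host is set up):

# NAT'ed through LXC's default bridge:
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up

# Or bridged onto the host's own network (assuming a br0 bridge exists):
# lxc.network.link = br0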

What proof do I have that LXC is more mature than Docker? Prior to Docker version 0.9, its default execution environment was LXC. Version 0.9 introduced libcontainer, which eliminated Docker's need for LXC. The LXC project has been around since August 2008; Docker has been around since March 2013. That's nearly five years that LXC had to mature before Docker was even a thing.

What Now?

Does all of this mean I'll never use Docker again? That I'll use LXC for everything that Docker used to handle for me? No. I will still continue to use Docker for the foreseeable future. I'll just be more particular about when I use it vs when I use LXC.

I still find Docker to be incredibly useful and valuable. I just don't think it's as well suited to long-running development environments, or to replacing a fair amount of what folks have been using Vagrant for. It can certainly handle that stuff, but LXC seems better suited to the task, at least in my experience.

Why do I think Docker is still useful and valuable? Well, let me share an example from work. We occasionally use a program with rather silly Java requirements. It requires a specific revision, and it must be 32-bit. It's really dumb. Installing and using this program on Ubuntu is quite easy. Using the program on CentOS, however, is... quite an adventure. But not an adventure you really want to take. You just want to use that program.

All I had to do was compose a Dockerfile based on Ubuntu, toss a couple apt-get lines in there, build an image, and push it to our registry. Now any of our systems with Docker installed can happily use that program without having to deal with any of the particularities of that one program. The only real requirement now is an operational installation of Docker.
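
The whole thing is only a few lines. A sketch of the idea (the package and registry names are placeholders, not what we actually used):

FROM ubuntu:14.04

# Placeholder standing in for the program and its 32-bit Java dependency.
RUN apt-get update && \
    apt-get install -y silly-program

Then build and push it somewhere every host can reach:

docker build -t registry.example.com/silly-program .
docker push registry.example.com/silly-program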

Doing something like that is certainly doable with LXC, but it's not quite as cut and dried. In addition to having LXC installed, you also have to make sure that the container configuration file is suitable for each system where the program will run. This means making sure there's a bridged network adapter on the host, that the configuration file uses the correct interface name, that the configuration file doesn't try to use an IP address that's already claimed, etc etc.

Also, Docker gives you port forwarding, bind mounts, and other good stuff with some simple command line parameters. Again, port forwarding and bind mounts are perfectly doable with straight LXC, but it's more complicated than just passing some additional command line parameters.
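
For example, forwarding a host port and bind mounting a directory is a single command line (the image name and paths are made up):

# -p forwards host port 8080 to container port 80;
# -v bind mounts /srv/data from the host at /data in the container.
docker run -d -p 8080:80 -v /srv/data:/data example/image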

Anyway. I just wanted to get that out there. LXC will likely replace most of my Linux-based virtual machines for the next while, but Docker still has a place in my toolbox.

uWSGI FastRouter and nginx

Lately I've been spending a lot of time playing with Docker, particularly with Web UIs and "clustering" APIs. I've been using nginx and uWSGI for most of my sites for quite some time now. My normal go-to for distributing load is nginx's upstream directive.

This directive can be used to specify the address/socket of backend services that should handle the same kinds of requests. You can configure the load balancing pretty nicely right out of the box. However, when using Docker containers, you don't always know the exact IP for the container(s) powering your backend.
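
As a quick refresher, a minimal upstream setup looks something like this (the addresses are placeholders):

upstream app_backend {
    server 10.0.0.10:8000;
    server 10.0.0.11:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
    }
}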

I played around with some fun ways to automatically update the nginx configuration and reload nginx each time a backend container appeared or disappeared. This was really, really cool to see in action (since I'd never attempted it before). But it seemed like there had to be a better way.
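
The general shape of that hack, in case you're curious (generate-upstreams.sh is a hypothetical helper that would rewrite the upstream block based on docker ps output):

./generate-upstreams.sh > /etc/nginx/conf.d/upstreams.conf
nginx -s reload    # re-read the config without dropping connections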

Mongrel2 came to mind. I've played with it in the past, and it seemed to handle my use cases quite nicely until I tried using it with VirtualBox's shared folders. At the time, it wasn't quite as flexible as nginx when it came to working with those shared folders (might still be the case). Anyway, the idea of having a single frontend that could seamlessly pass work along to any number of workers without being reconfigured and/or restarted seemed like the ideal solution.

As I was researching other Mongrel2-like solutions, I stumbled upon yet another mind-blowing feature tucked away in uWSGI: The uWSGI FastRouter.

This little gem makes it super easy to get the same sort of functionality that Mongrel2 offers. Basically, you create a single uWSGI app that will route requests to the appropriate workers based on the domain being requested. Workers can "subscribe" to that app to be added to the round-robin pool of available backends. Any given worker app can actually serve requests for more than one domain if you so desire.

On the nginx side of things, all you need to do is use something like uwsgi_pass with the router app's socket. That's it. You can then spawn thousands of worker apps without ever restarting nginx or the router app. Whoa.

So let's dig into an example. First, some prerequisites. I'm currently using:

  • nginx 1.6.0
  • uwsgi 2.0.4
  • bottle 0.12.7
  • Python 3.4.1
  • Arch Linux

The first thing we want is that router app. Here's a uWSGI configuration file I'm using:

uwsgi-fastrouter/router.ini

[uwsgi]
plugins = fastrouter
master = true
shared-socket = 127.0.0.1:3031
fastrouter-subscription-server = 0.0.0.0:2626
fastrouter = =0
fastrouter-cheap = true
vacuum = true

# vim:ft=dosini et ts=2 sw=2 ai:

So, quick explanation of the interesting parts:

  • shared-socket: we're setting up a shared socket on 127.0.0.1:3031. This is the socket that we'll use with nginx's uwsgi_pass directive, and it's also used for our fastrouter socket (=0 implies that we're using socket 0).
  • fastrouter-subscription-server: this is how we make it possible for our worker apps to become candidates to serve requests.
  • fastrouter-cheap: this disables the fastrouter when we have no subscribed workers. Supposedly, you can get the actual fastrouter app to also be a subscriber automatically, but I was unable to get this working properly.

Now let's look at a sample worker app configuration:

uwsgi-fastrouter/worker.ini

[uwsgi]
plugins = python
master = true
processes = 2
threads = 4
heartbeat = 10
socket = 192.*:0
subscribe2 = server=127.0.0.1:2626,key=foo.com
wsgi = app
vacuum = true
harakiri = 10
max-requests = 100
logformat = %(addr) - %(user) [%(ltime)] "%(method) %(uri) %(proto)" %(status) %(size) "%(referer)" "%(uagent)"

# vim:ft=dosini et ts=2 sw=2 ai:

Again, a quick explanation of the interesting parts:

  • socket: we're automatically allocating a socket on our NIC with an IP address that looks like 192.x.x.x. This whole syntax was a new discovery for me as part of this project! Neat stuff!!
  • subscribe2: this is one of the ways that we can subscribe to our fastrouter. Based on the server=127.0.0.1:2626 bit, we're working on the assumption that the fastrouter and workers are all going to be running on the same host. The key=foo.com is how our router app knows which domain a worker will serve requests for.
  • wsgi: our simple Bottle application.

Now let's look at our minimal Bottle application:

uwsgi-fastrouter/app.py

from bottle import route, default_app


application = default_app()
application.catchall = False


@route('/')
def index():
    return 'Hello World!'

All very simple. The main thing to point out here is that we've imported the default_app function from bottle and used it to create an application instance that uWSGI's wsgi option will pick up automatically.

Finally, our nginx configuration:

uwsgi-fastrouter/nginx.conf

daemon                  off;
master_process          on;
worker_processes        1;
pid                     nginx.pid;

events {
    worker_connections  1024;
}


http {
    include             /etc/nginx/mime.types;

    access_log          ./access.log;
    error_log           ./error.log;

    default_type        application/octet-stream;
    gzip                on;
    sendfile            on;
    keepalive_timeout   65;

    server {
        listen 80 default;
        server_name localhost foo.com;

        location / {
            include     /etc/nginx/uwsgi_params;
            uwsgi_pass  127.0.0.1:3031;
        }

        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root /usr/share/nginx/html;
        }
    }
}

# vim:filetype=nginx:

Nothing too special about this configuration. The only thing to really point out is the uwsgi_pass directive, which uses the same address we provided to our router's shared-socket option. Also note that nginx will bind to port 80 by default, so you'll need root access to start it.

Now let's run it all! In different terminal windows, run each of the following commands:

sudo nginx -c nginx.conf -p $(pwd)
uwsgi --ini router.ini
uwsgi --ini worker.ini

If all goes well, you should see no output from the nginx command. The router app should have some output that looks something like this:

spawned uWSGI master process (pid: 4367)
spawned uWSGI fastrouter 1 (pid: 4368)
[uwsgi-subscription for pid 4368] new pool: foo.com (hash key: 11571)
[uwsgi-subscription for pid 4368] foo.com => new node: :58743
[uWSGI fastrouter pid 4368] leaving cheap mode...

And your worker app should have output containing:

subscribing to server=127.0.0.1:2626,key=foo.com

For the purpose of this project, I quickly edited my /etc/hosts file to include foo.com as an alias for 127.0.0.1. The relevant line is just an ordinary hosts entry:
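
127.0.0.1    foo.com

Once you have something like that in place, you should be able to hit the nginx site and see requests logged in your worker app's terminal: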

curl foo.com

The really cool part is when you spin up another worker (same command as before, since the port is automatically assigned). Again, there's no need to restart nginx or the router app--the new worker will be detected automatically! Once it subscribes, requests will be spread across all of the subscribed workers.

Here's a quick video of all of this in action, complete with multiple worker apps subscribing to one router app. Pay close attention to the timestamps in the worker windows.

While this is all fine and dandy, there are a couple of things that seem like they should have better options. Namely, I'd like to get the single FastRouter+worker configuration working. I think it would also be nice to be able to use host names or DNS entries for the workers to know how to connect to the FastRouter instance. Any insight anyone can offer would be greatly appreciated! I know I'm just scratching the surface of this feature!