Dockerising WordPress

Overview

Container software has really picked up in popularity, in part thanks to Docker, and the concept is great. So much so that I wanted to try it on my blog.

Topology

[Figure: Current Topology]
My setup consisted of a single instance on DigitalOcean running the Linux, Apache and PHP parts of the LAMP stack. The ‘M’, MySQL, was running on a VPS of its own, again on DigitalOcean. I understand the idea of containers is to keep applications in self-contained images, which may have linked or dependent containers. I preferred to keep my database separate because of the way persistence is handled in Docker, and I don’t think data volumes are mature enough yet.

The end state should look like this:
[Figure: End State Topology]

The servers would be fronted by a proxy, preferably one with load balancing capabilities, since quick and easy deployments make it practical to scale horizontally.

Topology

[Figure: WordPress Topology]
I have two servers running WordPress in a primary and failover setup rather than a load-balanced one.

To secure communications I use HTTPS for non-public pages such as admin-related content (although this may change in the future). For the database connection I use SSH tunnels, because they do not require the application to support MySQL SSL connections.
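As a rough illustration of the SSH tunnel idea, the Python sketch below builds the command line for a local port forward to the database server. The `-N` (no remote command) and `-L` (local forward) options are standard OpenSSH client options; the function name, the account name and the host name are hypothetical and do not come from the original setup.

```python
import shlex


def tunnel_command(ssh_target, local_port=3306, remote_host="127.0.0.1", remote_port=3306):
    """Build an ssh command that forwards local_port to the remote MySQL port.

    ssh_target is a hypothetical "user@host" string for the database VPS.
    """
    forward = f"{local_port}:{remote_host}:{remote_port}"
    # -N: do not run a remote command, only forward ports.
    # -L: forward a local port to remote_host:remote_port as seen from the server.
    return ["ssh", "-N", "-L", forward, ssh_target]


# Placeholder account and host name, for illustration only.
command = tunnel_command("tunnel@db.example.internal")
print(shlex.join(command))
```

With a tunnel like this running, WordPress only needs to connect to the forwarded port on 127.0.0.1, so the application itself never has to know about encryption.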

Runtime Configuration

The challenges faced when designing the container setups included configuration at run time and shared public files.

I started off using sed commands in the startup.sh script so that, when the container was started, you could use environment variables to set things like the database username, password and host address. I believe this was the proper use of ENV attributes in Docker, but I found it hard to get the correct sed command to find the key and replace its value in the wp-config.php file. This worked for mandatory config values, where ENV defaults can be set in the Dockerfile; optional environment variables were a whole new headache. The script needed to check at run time whether each value was set and replace it only if it was. This wasn’t too hard to set up, but it was hit and miss whether the correct path was set in the shell when the startup.sh script ran. For example, it would find the MySQL username passed as a parameter but would not pick up and set the MySQL password.
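The substitution step can be sketched as follows. This is a Python stand-in for the sed approach, not the original startup.sh: the environment variable names (`WP_DB_USER` and so on) and the function names are assumptions made for the example, while `DB_USER`, `DB_PASSWORD` and `DB_HOST` are the usual wp-config.php constants. Optional variables that are not supplied are skipped so the default in the file survives.

```python
import re

# Hypothetical mapping from environment variable names to wp-config.php constants.
ENV_TO_CONSTANT = {
    "WP_DB_USER": "DB_USER",
    "WP_DB_PASSWORD": "DB_PASSWORD",
    "WP_DB_HOST": "DB_HOST",
}


def php_quote(value):
    """Escape a value for use inside a single-quoted PHP string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def apply_env(config_text, env):
    """Return config_text with define() values replaced from env.

    Variables that are missing or empty in env are skipped, so the
    default already present in the file is left untouched.
    """
    for env_name, constant in ENV_TO_CONSTANT.items():
        value = env.get(env_name)
        if not value:
            continue  # optional variable not supplied: keep the default
        pattern = re.compile(
            r"(define\(\s*'" + re.escape(constant) + r"'\s*,\s*)'(?:[^'\\]|\\.)*'"
        )
        replacement = "'" + php_quote(value) + "'"
        # A function replacement stops re.sub from interpreting backslashes
        # or group references that may appear in a password.
        config_text = pattern.sub(lambda m: m.group(1) + replacement, config_text)
    return config_text


sample = "define('DB_USER', 'username_here');\ndefine('DB_PASSWORD', 'password_here');\n"
print(apply_env(sample, {"WP_DB_USER": "blog"}))
```

Passing the environment in as a plain dictionary (for example `dict(os.environ)`) keeps the function easy to test without starting a container.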

I couldn’t figure out why this was happening so I chose to go down a different route: mounting directories on the host as volumes in the container. This made the bash script very easy to write and meant there were fewer lines of code. The configuration files would be put in a specific folder structure and the root of that directory would be the mount point in the container. The script would check that the config files existed before copying them and setting the necessary permissions, and only then start the services requiring the files. The downside to this method was that version control contained only a template of the configuration files with defaults; the actual values were not backed up as part of that mechanism. If no values are supplied, the WordPress setup page loads as if it’s a new install.
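The check-then-copy step might look something like the sketch below. It is written in Python rather than bash, and the directory layout, file names and permission mode are invented for the example. Ownership changes are left out because they need root.

```python
import os
import shutil
import tempfile


def install_configs(mount_root, targets, mode=0o640):
    """Copy each config file found under mount_root to its destination.

    targets maps a path relative to mount_root to an absolute destination.
    Returns the relative paths that were missing from the mount, so the
    caller can decide whether to start the dependent services.
    """
    missing = []
    for relative, destination in targets.items():
        source = os.path.join(mount_root, relative)
        if not os.path.isfile(source):
            missing.append(relative)
            continue
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        shutil.copyfile(source, destination)
        os.chmod(destination, mode)
    return missing


# Demonstration with temporary directories standing in for the mount
# point and the container file system (all paths are hypothetical).
with tempfile.TemporaryDirectory() as mount, tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(mount, "wordpress"))
    with open(os.path.join(mount, "wordpress", "wp-config.php"), "w") as handle:
        handle.write("<?php // real values live outside version control\n")
    targets = {
        "wordpress/wp-config.php": os.path.join(root, "var/www/wp-config.php"),
        "sync/sync.conf": os.path.join(root, "etc/sync.conf"),
    }
    print("missing:", install_configs(mount, targets))
```

Returning the list of missing files, rather than failing outright, mirrors the behaviour described above, where an absent configuration simply leaves WordPress on its setup page.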

Mounting a host volume does mean the data is tied to the host but, as I stated above, I didn’t like the state of data volumes.

Persistence

I was not using Docker Hub to store my container images, nor had I set up a private Docker registry, which ruled out shared volumes between running containers. The other solution is to mount a host volume into the container at run time, which increases the coupling of containers to the host. I used the latter approach to get around the limitation.

For consistency the files need to be synchronised across all servers running the WordPress container. There are various strategies for doing this, from a storage server such as NFS to distributed file systems like GlusterFS.

I decided to use BitTorrent Sync (pre version 2.0). I had used it since it was in beta and it worked well. The technology allows for truly decentralised file replication, whereas NFS is a client-server setup and GlusterFS requires knowledge of each server, i.e. there is no self-discovery mechanism. This meant the primary server had the static files on the host machine. A host volume would be mounted and its contents copied to the correct directory in the container.

I put BT Sync inside the container because I was going for a primary and failover topology. The primary host has the user content in a directory, which is mounted when the docker run command is executed. The startup script copies all the files from the host to the correct exposed directory and performs any file ownership changes needed.

Because the files are copied to the correct location, any new files uploaded are not copied back to the host server, so when the container is destroyed and rebuilt or rerun it will only have the old files. To resolve this, the host also has a BT Sync daemon running to sync all the changes inside the container back to the host.

On a secondary server the startup time is not as important. As long as the primary starts up quickly with all the user-created content available, the secondary will run as a backup and traffic will only be routed to it for load balancing or in a failover situation.

With this in mind, the subsequent hosts running a copy of the WordPress Docker container, such as the failover server, will not have any of the user content. Instead, the container will start up with no content, including themes, so the site will load as a blank white page (like an error). It will use BT Sync to bring all that content into the container from either the primary’s host or any other container running BT Sync. Until at least the site theme files are synced across, the site is not usable; it then gradually (actually very quickly) becomes available. This saves space and avoids maintaining two copies of user data.

Docker

The Dockerfile is a fairly standard install. All configuration is stored outside of the build file; at run time the container mounts a config volume from the host and copies the files.

The container will have the following services:

I used supervisord to manage long-running processes and a bash script to do any pre-setup before invoking supervisord to start the main application services.

To cater for this primary and secondary server setup, I created two bash scripts to start the containers: one starts the container with the content volumes mounted, whilst the other only mounts the configuration files needed at run time.
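The difference between the two start scripts can be sketched as a single function that builds the `docker run` arguments. This is a Python illustration rather than the original bash scripts; `-d` and `-v host:container[:ro]` are standard `docker run` options, while the image name, host paths and container mount points are hypothetical.

```python
def docker_run_command(image, config_dir, content_dir=None):
    """Build docker run arguments for a primary or secondary container.

    config_dir is always mounted (read-only) for the run-time configuration.
    content_dir is only given on the primary, which holds the user content;
    a secondary leaves it out and waits for the sync to fill the content in.
    """
    command = ["docker", "run", "-d", "-v", f"{config_dir}:/config:ro"]
    if content_dir is not None:
        command += ["-v", f"{content_dir}:/content"]
    command.append(image)
    return command


# Hypothetical image name and host paths.
primary = docker_run_command("example/wordpress", "/srv/wp/config", "/srv/wp/content")
secondary = docker_run_command("example/wordpress", "/srv/wp/config")
print(" ".join(primary))
print(" ".join(secondary))
```

Keeping the two variants in one function makes it obvious that the content mount is the only difference between a primary and a secondary.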

BitTorrent Sync

BT Sync is fairly flexible and uses JSON configuration files. The project has a simple and a complex example configuration, with comments serving as documentation. Whilst this isn’t always the best means of describing configuration, it just about works.

The BitTorrent protocol requires a tracker so that it can pair nodes together; otherwise two BitTorrent clients don’t have a means of discovering where other clients are to share files. Fortunately, in BT Sync you can use BitTorrent’s own tracker for this, but this can lead to man-in-the-middle attacks or leave traces on a third-party server. BitTorrent has thought about this and allows you to specify other known clients with their details and either disable going to an external tracker (in which case, if a client changes address, the others won’t be able to find it) or look for clients at those known addresses before going to the tracker to look them up. I chose the former.

Each client requires a shared password for encryption so that no one else can get your files.
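A configuration along these lines could be generated as in the sketch below. The key names (`shared_folders`, `secret`, `use_tracker`, `known_hosts` and so on) are written as I recall them from the 1.x sample configuration and should be checked against the sample shipped with the client. The device name, paths, port, peer address and secret are placeholders, and switching off the relay server, DHT and LAN search is an assumption about how strictly to limit discovery, not something stated above.

```python
import json


def sync_config(device_name, storage_path, folder_dir, secret, known_hosts, port=44444):
    """Build a BT Sync 1.x style configuration with the external tracker disabled.

    known_hosts is a list of "address:port" strings for the other clients,
    which is the only way peers find each other once the tracker is off.
    """
    return {
        "device_name": device_name,
        "listening_port": port,
        "storage_path": storage_path,
        "use_upnp": False,
        "shared_folders": [
            {
                "secret": secret,            # the shared password for the folder
                "dir": folder_dir,
                "use_relay_server": False,   # assumption: no third-party relay
                "use_tracker": False,        # do not contact the external tracker
                "use_dht": False,            # assumption: no DHT discovery either
                "search_lan": False,         # assumption: rely on known_hosts only
                "use_sync_trash": True,
                "known_hosts": known_hosts,
            }
        ],
    }


# Placeholder values for illustration only.
config = sync_config(
    "wp-primary", "/var/lib/btsync", "/var/www/wp-content",
    "REPLACE_WITH_SECRET", ["10.0.0.2:44444"],
)
print(json.dumps(config, indent=2))
```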

Recent improvements to BitTorrent Sync have been aimed at the web frontend for managing and configuring clients. This is all well and good, but not everyone uses a web frontend to configure things all the time, and I worry that eventually all configuration will have to be done this way.

Synchronising this way over the DigitalOcean LAN between instances is pretty quick. I believe it takes minutes before the container has all the files (a non-scientific test).

Load Balancer

I had been waiting years for HAProxy 1.5 to come out as a public release and then to be included in my preferred Ubuntu distro. The reason for wanting 1.5 was that HAProxy before 1.5 couldn’t terminate (I believe that’s the right word) an SSL connection, so you’d have to run HAProxy and then redirect any SSL connections to another piece of software to handle them. Not anymore!

HAProxy is configured to perform health checks on the containers and automatically redirect traffic if one of them is down. Sticky sessions were also required to ensure that someone who is logged in stays on the server they logged into, as long as it’s up. Example configuration files can be found on my GitHub page.
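The routing behaviour described here, failover plus sticky sessions, can be modelled in a few lines. The sketch below is a simplified Python model of the decision, not HAProxy’s implementation or configuration; the server names and the function are hypothetical.

```python
def choose_server(servers, healthy, sticky=None):
    """Pick the server that should receive a request.

    servers: server names in priority order, primary first.
    healthy: set of names currently passing their health checks.
    sticky:  the server recorded for this visitor's session, if any.
    """
    # A logged-in visitor stays on their server for as long as it is up.
    if sticky in servers and sticky in healthy:
        return sticky
    # Otherwise use the first healthy server, so the secondary only
    # receives traffic when the primary is down.
    for name in servers:
        if name in healthy:
            return name
    return None  # nothing is healthy


servers = ["wp-primary", "wp-secondary"]
print(choose_server(servers, {"wp-primary", "wp-secondary"}))
print(choose_server(servers, {"wp-secondary"}, sticky="wp-primary"))
```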

Summary

It’s far from a perfect setup but it made it very easy for the server to scale out. My primary use case was to allow maximum uptime during maintenance and upgrades, and it worked until BitTorrent Sync went to version 2.0. That was when it rolled out a paid plan and also started to shift a lot of the development towards the web UI, making a server setup hard to impossible. There are obvious improvements I’d make, such as service discovery so that each container can tell the load balancer it’s up and running and should be added to the pool of resources. Other things I did not go into were logging and monitoring.

About Danny

I.T. software professional, always studying and applying the knowledge gained; one way of doing this is to blog. Danny also participates in a part-time project called Energy@Home [http://code.google.com/p/energyathome/] for monitoring energy usage on a premises. Dedicated to I.T. since studying pure Information Technology from the age of 16, Danny Tsang is working in the field that he has aimed for since leaving school.
