Cleaning Up After Docker


At Newsweaver, we use Docker for a number of things, from building RPMs to building images for our microservices and running tests. Although we are moving to an internal registry, we still depend on building images locally from the Dockerfiles in our source code.
While we have enjoyed the benefits of using Docker, we have also come up against a few pain points. The one I will cover in this post is the lack of any native garbage collection in Docker, and how our lack of understanding meant we had to take immediate action to free space on our Docker server so that jobs needed for an upcoming release could run. These things always happen at the wrong time!

Where we went wrong

So, as mentioned, we had a large number of jobs that would build images from the Dockerfiles in the source code. While this in itself is not a major risk, we often built these images using some versioning in the image name, e.g. testlab[version].
What this resulted in was a whole range of images that were no longer used, some over a year old! Also, as images were updated, the layers used to create them were replaced, leaving behind dangling images.
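
For illustration, a build step along these lines (the image name and version here are hypothetical, not our actual jobs) mints a brand new image for every version while the old ones stick around:

# Hypothetical versioned build: each new version leaves the previous image behind  
docker build -t testlab1.4.2 .  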

So now let’s look at the impact these images had on our storage. Using the following command you can see a range of information on your Docker installation.

docker info  

While this command outputs a range of information, what we are interested in is the Data Space values.

Example:

Data Space Used: 45.68 GB  
Data Space Total: 107.4 GB  
Data Space Available: 61.69 GB  

Here you can see the amount of storage being used by your Docker installation. As the example above shows, we had a reasonable total of 107.4GB. Because we never did any garbage collection, the Data Space Available shrank and shrank until we hit zero and could not build any further images. This caused all of those jobs to fail, just as we were preparing for a release.
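
If you only want those values, you can filter the output, for example:

docker info | grep 'Data Space'  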

The Clean Up

So, once it became clear that storage was our issue, we set about investigating ways of cleaning up the images that were no longer used. In our naivety we presumed we could simply delete the images listed by:

docker images  

So we ran a command like:

docker rmi $(docker images -q)  

We sat back and expected that once this completed, which took quite some time, we would have a clean slate and the free storage we needed. To our surprise we freed up only about 30GB. Not a huge amount considering our total space was 107.4GB. So where was the rest of the storage being used?

Containers, ah the forgotten children. While we had removed all our images, the next port of call was the containers. Lo and behold, we had a huge number of 'Dead' and 'Exited' containers:

docker ps -a  

We realised that when we ran some of our containers we did not specify that they should be automatically removed once they exited. Example:

docker run --rm ...  
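
As a fuller, hypothetical example (the image name and test script are made up for illustration), the container below is removed automatically when the test run exits, instead of lingering as 'Exited':

docker run --rm testlab1.4.2 ./run-tests.sh  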

Use the command below for a full list of options and their defaults for the docker run command:

docker run --help  

So not only did we have a large number of obsolete images, we also had a huge number of unused containers, even more than images. So again we set about removing these, and since we were in a rush to free up the storage needed to get our jobs running again, we did a blanket removal of all containers:

docker rm $(docker ps -a -q)  
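
Note that a blanket docker rm like this will also attempt (and fail) to remove running containers. A more targeted variant uses docker ps status filters to collect only the stopped ones:

docker rm $(docker ps -a -q -f status=exited -f status=dead)  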

Yay!! We had pretty much all of the storage back. Wait, what do you mean 'pretty much'? Well, as we dug a bit deeper we realised there was an issue with the version of Docker we were running:

docker -v  
Docker version 1.8.2...  

Without going into too much detail, here is the issue we came across: https://github.com/docker/docker/issues/12487. We could not delete some images, even using the force flag. So we had to upgrade Docker to the latest version, clear out Docker's lib directory and restart Docker itself. These steps are outlined in the linked issue above. Our current Docker version is 1.10.3.
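
Roughly, the reset amounted to the steps below. Treat this as a sketch rather than a recipe: the service commands and paths vary by distribution, and wiping Docker's lib directory destroys every image, container and volume on the host.

sudo service docker stop  
# upgrade the Docker package via your distribution's package manager  
sudo rm -rf /var/lib/docker  
sudo service docker start  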

Prevention

So, now that we know the cause, what are we doing to prevent a recurrence?

Well, first we updated all container run commands to use the removal flag --rm. This ensures that once a container finishes, it is removed automatically.
Second, we have a storage monitor job, just a simple bash script, that checks the Data Space Available on the Docker server. The job runs hourly, is configured in Bamboo, and fails if the available storage dips below a certain threshold.

DOCKER_AVAILABLE_STORAGE=$(docker info | grep 'Data Space Available' | awk '{print $4}')  
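
A minimal sketch of the monitor itself (the threshold value is illustrative, and we assume docker info reports the figure in GB as in the example earlier):

#!/usr/bin/env bash  
# Fail the Bamboo job when available Docker storage drops below a threshold  
THRESHOLD_GB=20  
DOCKER_AVAILABLE_STORAGE=$(docker info | grep 'Data Space Available' | awk '{print $4}')  
# bash cannot compare floats, so delegate the comparison to awk  
if awk -v avail="$DOCKER_AVAILABLE_STORAGE" -v min="$THRESHOLD_GB" 'BEGIN { exit !(avail < min) }'; then  
  echo "Only ${DOCKER_AVAILABLE_STORAGE}GB available (threshold ${THRESHOLD_GB}GB)" >&2  
  exit 1  
fi  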

If the monitor job above fails, we then kick off a clean-up job that removes the following:
Dangling images:

DANGLING_IMAGES=($(docker images -f "dangling=true" -q))  

Dead and Exited containers, because setting the --rm flag is still a manual step and someone might forget:

DEAD_CONTAINERS=($(docker ps -a | grep '\(Dead\|Exited\)' | awk '{print $1}'))  

Old images: any image over two weeks old, with the option to exclude images that you don't want to delete:

OLD_IMAGES=($(docker images -a | grep -vi 'an_image_we_dont_want_to_remove' | grep '\([^2] weeks\|months\)' | awk '{print $3}' | sort -u))  

We then remove all of these collections using the force flag, containers first so the images they reference can be deleted (see the combined sketch below):

docker rm -f "${DEAD_CONTAINERS[@]}"  
docker rmi -f "${DANGLING_IMAGES[@]}" "${OLD_IMAGES[@]}"  
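
Putting it all together, the clean-up job looks roughly like the sketch below. One detail the one-liners above gloss over: docker rm and docker rmi error out when handed an empty argument list, so we guard against empty collections.

#!/usr/bin/env bash  
# Sketch of the clean-up job; the exclusion pattern is illustrative  
DEAD_CONTAINERS=($(docker ps -a | grep '\(Dead\|Exited\)' | awk '{print $1}'))  
DANGLING_IMAGES=($(docker images -f "dangling=true" -q))  
OLD_IMAGES=($(docker images -a | grep -vi 'an_image_we_dont_want_to_remove' | grep '\([^2] weeks\|months\)' | awk '{print $3}' | sort -u))  
# Remove containers first so the images they reference can be deleted  
[ ${#DEAD_CONTAINERS[@]} -gt 0 ] && docker rm -f "${DEAD_CONTAINERS[@]}"  
[ ${#DANGLING_IMAGES[@]} -gt 0 ] && docker rmi -f "${DANGLING_IMAGES[@]}"  
[ ${#OLD_IMAGES[@]} -gt 0 ] && docker rmi -f "${OLD_IMAGES[@]}"  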

The clean-up job is also scheduled to run once a week.

These jobs have ensured that the available storage has never dipped below our threshold and stayed there. We haven't had any storage issues since, and can now concentrate on moving from local Dockerfile builds to our internal registry, using the registry's garbage collection to remove old images.

As a side note, most people may not need to go to the lengths we did to manage their Docker storage, as there are pre-made solutions out there. To address Docker's lack of garbage collection, the folks over at Spotify created the docker-gc project on GitHub. It contains a script that removes all containers that exited more than an hour ago, together with their respective images. In fact, we may move to some sort of hybrid between Spotify's solution and our own.
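
If you want to try it, docker-gc can also be run as a container with the Docker socket mounted; at the time of writing the project's README suggests an invocation along these lines (check the README for the current form):

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock spotify/docker-gc  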

Gavin Kelly
