Docker has become the de facto solution for teams hosting their web applications. Turning a modern web application into a Docker image, a.k.a. dockerization, is simple and straightforward. But over time, like us, you will find yourself with longer build times and an ever-increasing Docker image size.
These issues cause team productivity to suffer. You waste precious time waiting for the docker image to build and deploy.
This article documents the steps we took to reduce a Docker image's size 17-fold, from 2.17GB to 127MB. If you are like me, you probably don’t have the patience to read the full article. You can find the final Dockerfile here. In short, multi-stage builds saved the day!
Size Does Matter
We have a very straightforward single-page React application for companies to view candidates’ information and track the status of their applications. The app uses yarn to manage package dependencies.
Less than three months into development, we found ourselves with a 2.17 GB large docker image.
Why does it matter for a small shop like ours, you may wonder?
Let’s look at some numbers from development to production.
- Assuming ten developers, each building three times a day at 15 minutes per build, that amounts to 10 * 3 * 15 = 450 minutes per day.
- Consider an average of three deployments a day, each with a 15-minute build and 10 minutes to deploy. That gives us 3 * (15 + 10) = 75 minutes.
That is a total of 525 minutes per day. How about a month (20 working days)? Well, that is 175 hours per month you could have spent on other creative tasks. You may also wonder about storage costs. That is undoubtedly a factor; however, it wasn’t significant enough for us to prioritize as an issue in our case.
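The back-of-the-envelope math above can be scripted as a quick sanity check (a sketch; the numbers are exactly the assumptions stated in the bullets):

```shell
devs=10; builds_per_dev=3; build_min=15   # developer builds
deploys=3; deploy_min=10                  # daily deployments

dev_total=$(( devs * builds_per_dev * build_min ))      # 10 * 3 * 15 = 450
deploy_total=$(( deploys * (build_min + deploy_min) ))  # 3 * (15 + 10) = 75
daily=$(( dev_total + deploy_total ))                   # 525 minutes per day
monthly_hours=$(( daily * 20 / 60 ))                    # 175 hours per month

echo "${daily} minutes per day, ${monthly_hours} hours per month"
```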
From 2.17GB to 127MB
First, we set out to gather data to understand the overall problem and devise a high-level plan of attack. Second, we identified that the caching folder created during the build was the main culprit and used a brute-force method to reduce the image size from 2.17GB to 700MB. Finally, we discovered the holy grail: multi-stage builds, which further shrank the image to 127MB.
- Initial Dockerfile
```dockerfile
# base image
FROM node:13.10.1-alpine

# set working directory
WORKDIR /app
COPY . .
ENV PATH /app/node_modules/.bin:$PATH

# install
COPY package.json /app/package.json
RUN apk add yarn
RUN yarn install --production
RUN yarn build
RUN yarn global add serve

# start app
CMD serve -s build -l 3000
```
Let’s confirm the size with the following command:
```shell
docker image ls | grep admin-prod
```
- List the sizes by layer. We found `yarn install --production` alone accounts for 91% of the image size. It was not a surprise, but it did confirm our initial suspicion. Logically, that became the first thing we ought to tackle.
```shell
docker history --human admin-prod
```
Note about choosing a small base image: we did not have this particular problem as we had used the node alpine base image from the beginning. We have found alpine to be an excellent base image for Node, so it has been our standard image.
We started by removing unused node packages but stumbled upon a hidden caching folder used during yarn build.
The first probable cause for us was unused node packages. Like any project, we add new packages all the time, but we rarely remove them. There were 92 packages in our project. We tried to reduce that number as follows:
- `npm prune --production` removed 17 packages.
- **depcheck** detected more “unused” packages. Unfortunately, removing them caused our application to fail. Besides, the overall image size reduction was insignificant: so small that the total still stood at 2.17GB. There are obvious benefits to removing unused packages, but when it comes to reducing image size, it is the icing on the cake rather than the main course. That said, we will likely revisit them in a future blog post.
We still hadn’t solved the misery of the 2.0GB `yarn install --production` layer. The next logical step was to dig deeper into that layer and find out what was taking up space. Dive is just the right tool for the task. It took us only a few minutes to find the problem: /usr/local/share/.cache/yarn alone was taking up 1.1GB. In hindsight, this should have been obvious. The cache folder is used only during the build stage, so it is a complete waste when included in the final Docker image.
Now that we had found a root cause, the next step was to remove the cache from the final image. There are a few ways to clean up the cache folder: we can use the `rm` command to remove it, or we can use yarn’s built-in command, `yarn cache clean`. That solves the image size issue, except it doesn’t help with build time. There is a reason why yarn uses a cache in the first place; by cleaning it up after every build, we effectively disable the caching feature.

We added `yarn cache clean` to the build file. No effect. We tried adding `rm -rf /usr/local/share/.cache/yarn`, still to no avail. (Each `RUN` instruction creates a new layer, so files deleted in a later step still occupy space in the earlier layer.) As we debugged, we found a better option: Docker’s shared memory. The `--shm-size` option lets the Docker guest write to a temporary in-memory folder, /dev/shm, during the build, so we don’t have to worry about removing the cache manually. The solution involved adding `--shm-size` to the docker build command and a minor change to point the yarn cache to /dev/shm/yarn_cache in the Dockerfile.
```shell
docker build . -t admin-prod:v2.1 --shm-size 2048M -f Dockerfile-prod
```
- Dockerfile diff
```dockerfile
# base image
FROM node:13.10.1-alpine

# set working directory
WORKDIR /app
COPY . .
# add `/app/node_modules/.bin` to $PATH
ENV PATH /app/node_modules/.bin:$PATH

# install
COPY package.json /app/package.json
RUN apk add yarn
RUN YARN_CACHE_FOLDER=/dev/shm/yarn_cache yarn install --production
RUN yarn build
RUN yarn global add serve

# start app
CMD serve -s build -l 3000
```
The resulting image was 700MB, roughly a 3x reduction from 2.17GB. Not bad for a half-line change.
But an ideal solution would reduce image size while also improving build time. Wouldn’t it be nice if we could persist the cache folder across Docker builds? Like mounting volumes during docker run, if we could create a host folder that the guest accesses during build time, the cache folder could be reused across builds. Sadly, we weren’t able to find an out-of-the-box solution. docker build doesn’t support mounting volumes by design; that is, until the introduction of multi-stage builds.
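(As an aside for readers on newer Docker releases: BuildKit’s cache mounts now cover exactly this use case. A minimal sketch, assuming BuildKit is enabled via `DOCKER_BUILDKIT=1`:)

```dockerfile
# syntax=docker/dockerfile:1
FROM node:13.10.1-alpine
WORKDIR /app
COPY package.json /app/package.json
# the cache directory persists across builds on the build host
# and never becomes part of the image itself
RUN --mount=type=cache,target=/usr/local/share/.cache/yarn \
    yarn install --production
```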
Multi-stage builds always seemed daunting. We had known about them for a while, yet we kept avoiding them, unconsciously. This time, however, the need to reuse the cache folder across builds drew us back in.
The following is what we came up with:
```dockerfile
# module install
FROM node:13.10.1-alpine as module-install-stage
# set working directory
WORKDIR /app
# add `/app/node_modules/.bin` to $PATH
ENV PATH /app/node_modules/.bin:$PATH
COPY package.json /app/package.json
RUN apk add yarn
RUN yarn install --production

# build
FROM node:13.10.1-alpine as build-stage
COPY --from=module-install-stage /app/node_modules/ /app/node_modules
WORKDIR /app
COPY . .
RUN yarn build

# serve
FROM node:13.10.1-alpine
COPY --from=build-stage /app/build/ /app/build
RUN npm install -g serve
# start app
CMD serve -s /app/build -l 3000
```
The result is impressive:
- The new image size is down to 127MB from 2.17GB.
- Node packages are no longer downloaded every time. They are downloaded only when the package.json file changes, which shaves 50% to 70% off the build time on average.
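A quick sanity check on the headline number (sizes in MB, integer shell arithmetic):

```shell
old_mb=2170
new_mb=127
# size ratio between the original and final image
ratio=$(( old_mb / new_mb ))
# percentage of the original image eliminated
saved_pct=$(( (old_mb - new_mb) * 100 / old_mb ))
echo "${ratio}x smaller, ${saved_pct}% of the image gone"
```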
We are pretty happy with where we stand right now, but further improvement can be made by removing unnecessary node packages and keeping the dependency list clean.
We started with a 2.17GB Docker image that accumulated in less than three months in our hands. The image caused build slowdowns and many hours of wasted time. We took a three-step process: gather data, remove the unused cache folder, and finally arrive at multi-stage builds.
- On average, we reduced build time by 60%, deployment time by 90%, and image size by 17x, which translates into roughly 75 hours a month in time saved.
- The final Dockerfile, included again for your reference:
```dockerfile
# module install
FROM node:13.10.1-alpine as module-install-stage
# set working directory
WORKDIR /app
# add `/app/node_modules/.bin` to $PATH
ENV PATH /app/node_modules/.bin:$PATH
COPY package.json /app/package.json
RUN apk add yarn
RUN yarn install --production

# build
FROM node:13.10.1-alpine as build-stage
COPY --from=module-install-stage /app/node_modules/ /app/node_modules
WORKDIR /app
COPY . .
RUN yarn build

# serve
FROM node:13.10.1-alpine
COPY --from=build-stage /app/build/ /app/build
RUN npm install -g serve
# start app
CMD serve -s /app/build -l 3000
```