After our recent studies relating to Docker managing dependencies and ensuring consistent development environments, I was interested in learning more about how to use it because I thought something like this could have saved me many hours of troubleshooting while completing a recent research project. This article, written by Medium user Dispanshu, highlights the capabilities of Docker and how to efficiently use the service in a cloud environment.
The article focuses on optimizing Docker images to achieve high-performance, low-cost deployment. The problem some developers run into is having very large images which slow the build processes, waste storage, and reduce application speeds. I learned from this work that large images result from including unnecessary tools/dependencies/files, inefficient layer caching, and including other full images (like Python in this case). Dispanshu focuses on achieving the solution in 5 parts:
- Multi-stage builds
- Layer optimizations
- Minimal base images (including Scratch)
- Advanced techniques like distroless images
- Security best practices
Using these techniques, the image receives a size reduction from 1.2GB to 8MB! The most impactful change being multi-stage builds to which the writer accredits over 90% of this size reduction. I have never used these techniques before, but my interest definitely peaked when I saw the massive size reduction that resulted from these changes.
The multi-stage builds technique references the build stage and the production stage. By using this technique build-time dependencies are separated from the actual runtime environment which avoids the inclusion of any unnecessary files or tools in the resulting image. Another technique recommends minimal base images using the slim or alpine version (for Python) over the full version for the build stage and for production stage it is recommended to use the scratch base image (no OS, no dependencies, no data or apps). Using a scratch image has pros and cons, but when we are considering image sizes and optimization this is an ideal route.
Another interesting piece of this article is the information relating to advanced techniques like distroless images, using Docker Buildkit, using .dockerignore file, and eliminating any excess files. The way that distroless images are explained by the writer makes the concept and the use case very clear. The high-level differences between the Full Distribution Image, the Scratch Image, and the Distroless Image are described as the different ways we can pack for a trip:
- Pack your entire wardrobe (Full Distribution Image)
- Pack nothing and buy everything at your destination (Scratch Image)
- Pack only what you’ll actually need (Distroless Image)
The analogy makes understanding the relationship between these three image options seemingly obvious, but I can imagine applying any of these techniques described would require some perseverance. This article describes an architecture that juggles simplicity, performance, cost, and security with very impactful results. The results of this article are proof of the value these techniques can provide, and I will be seeking to apply them in my future work.
From the blog CS@Worcester by cameronbaron and used with permission of the author. All other rights reserved by the author.