At EverQuote, we are passionate about our DevOps toolchain and improving our lead time. Teams are encouraged to share best practices and curate patterns that can be replicated across the company. Here is a quick overview of what we've collectively learned so far about running Python applications in production.
For this article, we assume the reader has a basic to intermediate familiarity with:
- Docker: containers and images
- Kubernetes: basic objects such as Pods, Services, Volumes, and Namespaces
- Python: package management tools like pip
Getting started with your Dockerfile
The official Docker documentation provides a great overview of best practices for writing Dockerfiles. This article builds on these practices and augments them with Python-specific recommendations.
1. Use alpine-based images
The official base `python` Docker image is, at time of writing, 341 MB compressed for the latest `3.8.2` tag. 😱 Considering most of that space is unnecessary Debian OS boilerplate, we can do much better. `3.8.2-slim` is a much more manageable 58.98 MB. Better, but `3.8.2-alpine` wins out at only 34.3 MB. There are diminishing returns to stripping down your container image size, so definitely run your own benchmarks. At EverQuote, we've found that Alpine and Slim images pull from our container registry onto the node in comparable time.
Wait, so if the image pull times are similar, why not just use Slim?
Because security! 🔒 In March 2020, we ran a little in-house experiment to test Anchore, an open source vulnerability scanning tool. We compared a barebones `3.8-slim` image against a `3.8-alpine` image, and the results were very surprising: the Slim image had 47 vulnerabilities (granted, almost all were "Negligible" severity), while the Alpine image had zero.
All 47 CVEs are being tracked by Debian and pending upstream fixes, so this isn't really Docker's or Debian's fault. The difference is probably just a numbers game: Alpine has fewer distribution-specific CVEs tracked than Debian does in the first place. Still, you can't hack software that isn't there! At EverQuote, we're also starting to explore Google Distroless to further strip away unnecessary software running inside our production containers.
I tried Alpine, and my `pip install` took forever.
That's because the manylinux project, which provides pre-compiled wheels for Linux, uses glibc as its base C library, while Alpine uses musl. This means any packages with C-based dependencies must be compiled at build time, which can slow your build down from less than one minute to over ten (in our experience). Several people have inquired about adding musl support to manylinux, but no proposal has been accepted.
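To see which side of the glibc/musl divide your interpreter sits on, the standard library offers a quick check. A minimal sketch — note that `platform.libc_ver()` only reliably detects glibc, so an empty result on Alpine is our (hedged) hint that you're on musl:

```python
import platform

# Inspect which C library this CPython build links against.
# On a glibc distro (e.g. Debian) this returns ("glibc", "<version>");
# on Alpine, libc_ver() cannot identify musl and returns ("", "").
lib, version = platform.libc_ver()
print(lib or "not glibc (possibly musl)", version)
```

If this reports glibc, manylinux wheels will install without compilation; if not, expect pip to build C extensions from source.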
We don't use Alpine for all of our Python apps. Where build times are impractically long (looking at you, `pandas`), switching to `-slim` for those manylinux wheels is a good compromise. We also have an internal package repo for pre-built wheels that work on Alpine, which has sped up many of our builds by a factor of ten.
2. Clean up unnecessary files
To really slim down our images, we run `pip install` with `--no-cache-dir`, and install all of our Alpine C build dependencies with `apk --no-cache` into a `--virtual build-deps` virtual package. This ensures that downloaded packages and source-code build dependencies are removed from the production image.
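The two cleanup techniques can be sketched as a Dockerfile fragment (package names here are illustrative):

```dockerfile
# Install the C toolchain into a named virtual package so it can be
# removed in one step later.
RUN apk --no-cache add --virtual build-deps build-base

# --no-cache-dir keeps pip's download cache out of the image layer.
RUN pip install --no-cache-dir -r requirements.txt

# Delete the entire build toolchain; compiled libraries the app needs
# at runtime must be (re)installed as regular packages.
RUN apk del build-deps
```

The `--no-cache` flag on `apk` similarly avoids persisting the package index in the layer.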
3. Avoid running as root
Our default cluster PodSecurityPolicy enforces this:

```yaml
runAsUser:
  # Require the container to run without root privileges.
  rule: 'MustRunAsNonRoot'
```
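On the image side, satisfying a non-root policy means creating an unprivileged user and switching to it (the UID and username here are illustrative; `adduser -S` is Alpine's flag for a system account):

```dockerfile
# Create an unprivileged system user with an explicit UID, then switch
# to it so the container process never runs as root.
RUN adduser -u 1050 -S my_app
USER 1050
```

Using a numeric `USER` (rather than a name) lets Kubernetes verify the UID against the policy without inspecting /etc/passwd.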
4. Leverage Docker layer build caching
Because Python is an interpreted language, dependencies are packaged and installed as source code or wheels. This means that if you place your `COPY ./requirements.txt` instruction in its own layer just before the `pip install`, Docker's layer cache will reuse the install layer until requirements.txt changes, so you avoid waiting for `pip install` on every `docker-compose build` in local dev. See the aforementioned best practices doc for more details.
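The caching split above can be sketched as (paths are illustrative):

```dockerfile
# Copy only the requirements file first: this layer, and the pip
# install after it, are rebuilt only when requirements.txt changes.
COPY ./requirements.txt /my_app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Application source changes on nearly every build, so copy it last
# to keep the expensive install layer cached.
COPY . /my_app
```

If `COPY . /my_app` came first, any source edit would invalidate the cache and force a full reinstall.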
5. Separate your abstract dependencies from locked requirements
Even after professionally developing software in Python for over five years, I still get confused about setup.py vs. requirements.txt and often refer back to the PyPA documentation. Fancy new tools like Pipenv aim to standardize the solution, but it's still important to understand the difference and apply it in practice regardless of your chosen toolset. In practice, we've found that using Pipenv in our CI/CD pipeline slowed down builds enough that we'd rather stick to the convention of "abstract deps in setup.py, locked deps in requirements.txt." To ensure requirements.txt is truly a lock file, you can verify it with `pip install --no-deps -r requirements.txt && pip check`.
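The split can be sketched with a hypothetical app (all names and versions here are illustrative, not from our codebase):

```python
# setup.py — abstract dependencies: version ranges stating what the
# app is compatible with, not the exact versions to deploy.
from setuptools import setup

setup(
    name="my_app",
    install_requires=["requests>=2.20,<3"],
)

# The matching requirements.txt is the lock file: every transitive
# dependency pinned to an exact version, e.g.:
#   certifi==2020.4.5.1
#   requests==2.23.0
#   urllib3==1.25.9
```

With `--no-deps`, pip installs only what the lock file lists, and `pip check` then fails loudly if any pinned set leaves a dependency unsatisfied.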
Can I just see your `Dockerfile` and copy it?
Sure! Here's an actual (slightly adulterated for the public) Dockerfile from one of our mission critical applications, used both for local development (Docker Compose) and production (Kubernetes).
```dockerfile
FROM python:3.7-alpine

# Install C dependencies
RUN apk --no-cache add --virtual build-deps \
    build-base \
    mariadb-connector-c-dev

WORKDIR /my_app

# Access internal packages and wheels
ARG EQ_PYTHON_PACKAGE_INDEX

# Install requirements
COPY ./requirements.txt /my_app/requirements.txt
RUN pip install --no-cache-dir --no-deps \
    -i $EQ_PYTHON_PACKAGE_INDEX \
    -r requirements.txt && \
    pip check

COPY ./test_requirements.txt /my_app/test_requirements.txt
RUN pip install --no-cache-dir -i $EQ_PYTHON_PACKAGE_INDEX -r test_requirements.txt && \
    pip check

# Remove source code packages, but keep the libraries
RUN apk del build-deps && \
    apk --no-cache add mariadb-connector-c

# Install app
COPY . /my_app
RUN pip install -e .

# Run as non-root user per default cluster PodSecurityPolicy
RUN adduser -u 1050 -S my_app
USER 1050

CMD ["python"]
```
If you are similarly passionate about your tools and want to learn more, please reach out via EverQuote's careers site at careers.everquote.com! Not looking for a job? No problem! Join our Talent Community on the same site. ✨