Python Kubernetes Best Practices

At EverQuote, we are passionate about our DevOps toolchain and improving our lead time. Teams are encouraged to share best practices and curate patterns that can be replicated across the company. Here is a quick overview of what we've collectively learned so far about running Python applications in production.

For this article, we assume the reader has a basic to intermediate familiarity with:

  • Docker: containers and images
  • Kubernetes: basic objects such as Pods, Services, Volumes, and Namespaces
  • Python: package management tools like pip

Getting started with your Dockerfile

The official Docker documentation provides a great overview of best practices for writing Dockerfiles. This article builds on these practices and augments them with Python-specific recommendations.

1. Use Alpine-based images

At the time of writing, the official python base Docker image is 341 MB compressed for the latest 3.8.2 tag. 😱 Considering most of that space is unnecessary Debian OS boilerplate, we can do much better.

What about -slim?

3.8.2-slim is a much more manageable 58.98 MB. Better, but 3.8.2-alpine wins out at only 34.3 MB. There are diminishing returns to stripping down your container image size, so definitely run your own benchmarks. At EverQuote, we've found that Alpine and Slim images pull from our container registry onto the node in comparable time.
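
One easy way to run your own comparison is to pull the tags and inspect them locally. A minimal sketch; note that docker images reports uncompressed sizes, so the numbers will be larger than the compressed sizes shown on Docker Hub:

  docker pull python:3.8.2
  docker pull python:3.8.2-slim
  docker pull python:3.8.2-alpine
  # Uncompressed on-disk size for each tag of the python repository
  docker images python --format "table {{.Tag}}\t{{.Size}}"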

Wait, so if the image pull times are similar, why not just use Slim?

Because security! 🔒 In March 2020, we ran a little in-house experiment to test Anchore, an open source vulnerability scanning tool. We compared a barebones 3.8-slim image against a 3.8-alpine image, and the results were very surprising:

The Slim image had 47 vulnerabilities (granted, almost all were "Negligible" severity). The Alpine image had zero.

All 47 CVEs are being tracked by Debian and pending upstream fixes, so this isn't really Docker's or Debian's fault. The difference is probably just a numbers game: Alpine has fewer distribution-specific CVEs tracked than Debian does in the first place. Still, you can't hack software that isn't there! At EverQuote, we're also starting to explore Google Distroless to further strip away unnecessary software running inside our production containers.
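
If you'd like to reproduce a scan like this, here's a rough sketch using anchore-cli; it assumes you already have an Anchore Engine instance running and the ANCHORE_CLI_URL, ANCHORE_CLI_USER, and ANCHORE_CLI_PASS environment variables pointing at it:

  # Rough sketch, assuming a running Anchore Engine reachable via
  # ANCHORE_CLI_URL / ANCHORE_CLI_USER / ANCHORE_CLI_PASS
  anchore-cli image add docker.io/library/python:3.8-slim
  anchore-cli image add docker.io/library/python:3.8-alpine
  # Wait for analysis to finish, then list OS-level vulnerabilities
  anchore-cli image wait docker.io/library/python:3.8-slim
  anchore-cli image wait docker.io/library/python:3.8-alpine
  anchore-cli image vuln docker.io/library/python:3.8-slim os
  anchore-cli image vuln docker.io/library/python:3.8-alpine os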

I tried Alpine, and my pip install took forever.

That's because the manylinux project, which provides pre-compiled wheels for Linux, targets glibc as its C standard library, while Alpine uses musl. This means any libraries with C-based dependencies must be compiled at build time, which can slow your build down from less than one minute to over ten (in our experience). Several people have inquired about adding musl support to manylinux, but none of those proposals have been accepted.

We don't use Alpine for all of our Python apps. Where build times are impractically long (looking at you, pandas), switching to -slim for those manylinux wheels is a good compromise. We also have an internal package repo for pre-built wheels that work on Alpine, which has sped up many of our builds by a factor of ten.
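
The repository itself is internal, but the pip side is just an extra index. Roughly like this, where the index URL below is a placeholder for whatever your own repository exposes:

  # Sketch only: the URL is a placeholder for an internal repository
  # hosting Alpine-compatible (musl) wheels
  pip install --no-cache-dir \
      --extra-index-url https://pypi.internal.example.com/simple \
      -r requirements.txt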

2. Clean up unnecessary files

To really slim down our images, we run pip install with --no-cache-dir and install our C build dependencies with apk --no-cache add --virtual build-deps, then delete that virtual package once the install is done. This ensures that downloaded package caches and build-only dependencies are removed from the production image.
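
In a Dockerfile, that pattern looks roughly like this condensed sketch; the full example at the end of this article spreads the same steps across a few layers:

  # Condensed sketch of the cleanup pattern
  RUN apk --no-cache add --virtual build-deps build-base && \
      pip install --no-cache-dir -r requirements.txt && \
      apk del build-deps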

3. Avoid running as root

There is plenty of literature on why container processes should not run as root in production. At EverQuote, we configure the default PodSecurityPolicy with:

  runAsUser:
    # Require the container to run without root privileges.
    rule: 'MustRunAsNonRoot'
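
You can state the same intent on the workload side through the pod's securityContext. A minimal sketch; the UID is just an example that matches the Dockerfile later in this article:

  # Pod-level securityContext (minimal sketch; the UID is an example)
  securityContext:
    runAsNonRoot: true
    runAsUser: 1050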

4. Leverage Docker layer build caching

Because Python is an interpreted language, dependencies are packaged and installed as source code. This means that if you put your COPY ./requirements.txt statement in its own layer just before the pip install, Docker's build cache lets you skip the pip install on every docker-compose build in local dev, as long as requirements.txt hasn't changed. See the aforementioned best practices doc for more details.
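
The key is the ordering: copy only requirements.txt, install, and then copy the rest of your source. A stripped-down sketch (the full Dockerfile below does the same thing with a few extra flags):

  COPY ./requirements.txt /my_app/requirements.txt
  RUN pip install --no-cache-dir -r /my_app/requirements.txt
  # Source code changes only invalidate the layers from here on
  COPY . /my_app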

5. Separate your abstract dependencies from locked requirements

Even after developing software professionally in Python for over five years, I still get confused about setup.py vs. requirements.txt and often refer back to the PyPA documentation. Fancy new tools like Pipenv aim to standardize the solution, but it's still important to understand the difference and apply it in practice regardless of your chosen toolset. We found that using Pipenv in our CI/CD pipeline slowed down builds enough that we'd rather stick to the convention of "abstract dependencies go in setup.py, and requirements.txt is a lock file."

You can verify that your requirements.txt truly is a lock file with pip install --no-deps -r requirements.txt && pip check.
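
Concretely, the abstract side is just a loosely pinned install_requires; a minimal sketch with illustrative package names:

  # setup.py: abstract dependencies, loosely pinned (illustrative names)
  from setuptools import setup, find_packages

  setup(
      name="my_app",
      packages=find_packages(),
      install_requires=[
          "requests",
          "sqlalchemy>=1.3",
      ],
  )

The matching requirements.txt then pins every transitive dependency to an exact version, for example generated with pip freeze from a known-good environment (or pip-compile from pip-tools).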

Example Dockerfile

Can I just see your Dockerfile and copy it?

Sure! Here's an actual (slightly adulterated for the public) Dockerfile from one of our mission-critical applications, used both for local development (Docker Compose) and production (Kubernetes).

FROM python:3.7-alpine

# Install C dependencies
RUN apk --no-cache add --virtual build-deps \
    build-base \
    mariadb-connector-c-dev

WORKDIR /my_app

# Access internal packages and wheels
ARG EQ_PYTHON_PACKAGE_INDEX

# Install requirements
COPY ./requirements.txt /my_app/requirements.txt
RUN pip install --no-cache-dir --no-deps \
    -i $EQ_PYTHON_PACKAGE_INDEX \
    -r requirements.txt && \
    pip check

COPY ./test_requirements.txt /my_app/test_requirements.txt
RUN pip install --no-cache-dir -i $EQ_PYTHON_PACKAGE_INDEX -r test_requirements.txt && \
    pip check

# Remove build-only dependencies, but keep the runtime library
RUN apk del build-deps && \
    apk --no-cache add mariadb-connector-c

# Install app
COPY . /my_app
RUN pip install -e .

# Run as non-root user per default cluster PodSecurityPolicy
RUN adduser -u 1050 -S my_app
USER 1050

CMD ["python"]

Join us

If you are similarly passionate about your tools and want to learn more, please reach out via EverQuote's careers site at careers.everquote.com! Not looking for a job? No problem! Join our Talent Community on the same site. ✨