
  2. 25

    This issue is not specific to Alpine, Wheels just suck with any less-common ABI/platform combination. Try it with Debian on ARM and you will run into the same issue.

    1. 12

      Yea, it’s not a fair comparison because the OP is using a package with some base extensions and therefore has to pull in gcc and compile those CPython extensions. A better comparison would be to compare something that had the actual python package available in Alpine.

      Or, use a program you wrote whose dependencies are all pure-python without needing to link to outside C resources.

      1. 15

        It might not be fair, but it is relevant, at least in my view. This article mirrors my experience, and I came to the same conclusion. It’s quite rare that I don’t have any library that uses C in my projects, and rarer still that no wheels are available for it on a ‘standard’ platform. I never had use for any of the Python packages included in Alpine (nor for the ones that come with Debian, for that matter; I usually try to pin my dependencies to the exact versions I use in development).

        I’ve experienced a lot of pain trying to use Alpine with Python, and I don’t think it’s worth it for anything other than small, simple projects. Even for building Python-based containers for functions in OpenFaaS (where you’d really like to have small images), I ended up having a (relatively heavy) base image with most of my common requirements included, and used that with all my function containers. Which, in the end, is an acceptable solution, since you only have to pull the base image every now and then, when it updates. The base image weighs around 350 MB, but every image derived from it was only a few kB (or MB, in some cases).

        Anyway, if you can fault the article for anything, it might be for not making it clearer when its conclusions are relevant. But I think most people will understand that, and won’t expect a ‘fair’, clean-room comparison. As I said, it might not be fair, but it’s good practical advice nonetheless, in a lot of cases.

        1. 2

          Exactly my feeling. The original post and liwakura’s comment are both fascinating when I don’t have this problem, but I’d wager if someone is searching for “Alpine Docker Python Slow” it’s not going to be for the other edge cases.

      2. 12

        The difference is that no one recommends ARM as a way to make smaller, faster builds. It’s understood you’ll have more work because it’s more obscure. But people do recommend Alpine on amd64 for faster, smaller builds, and that wastes lots and lots of time for people who don’t know any better.

      3. 18

        it’s worth noting that the ways in which musl is “broken” compared to glibc exist because glibc has a bunch of nonstandard extensions and features/bugs that don’t match the spec. Of course musl can have bugs, but the ways in which it’s different are because it conforms to the spec and glibc doesn’t.

        1. 9

          It’s easy to become frustrated at musl when working with Alpine Linux (or any distro that uses musl by default), but you bring up a good point that must be remembered in these times of frustration: if the app you want to build doesn’t build with musl because it’s using some glibc-isms not in the spec, it’s not musl’s fault that the app developer uses extensions/features of a libc implementation that does more than implement libc. Unfortunately, glibc is quite prolific these days, so it’s easy to run into these portability issues :(

        2. 8

          The build time and research arguments are valid and definitely good to keep in mind. However, you can make smaller Alpine images by following similar patterns. The Dockerfiles in this article don’t use --no-cache with apk, and they leave the development files around after the build is done.

          The following image should build something similar, though, as mentioned, it takes a while to build. I’ll update this post with the final image size when it’s done.

          FROM python:3.8-alpine
          
          ENV PYTHONUNBUFFERED=1 \
            PYTHONDONTWRITEBYTECODE=1 \
            PYTHONHASHSEED=random \
            PIP_NO_CACHE_DIR=off \
            PIP_DISABLE_PIP_VERSION_CHECK=on
          
          RUN apk add --no-cache freetype libpng openblas
          
          RUN apk add --no-cache --virtual .build-deps gcc build-base freetype-dev libpng-dev openblas-dev \
              && pip install matplotlib pandas \
              && apk del .build-deps
          

          EDIT: strangely, the build also wasn’t slowed much by untarring matplotlib-3.1.2.tar.gz, which is a 40MB file with lots of small files. That’s not to say the build was fast, but it’s worth noting.

          In any case, the final build size as reported by docker image ls was 469MB.

          1. 3

            This does result in a smaller image, but it means every time you change your Python or APK dependencies you need to reinstall both, without relying on Docker layer caching.

            The image size honestly isn’t a big deal, but the build time is brutal.
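
            For what it’s worth, the caching problem can be softened a bit. The sketch below is a hypothetical variant of the Alpine Dockerfile from the parent comment (it assumes your Python dependencies live in a requirements.txt, which the original didn’t use): the runtime libraries get their own cached layer, and the expensive compile layer is only invalidated when requirements.txt changes.

            ```dockerfile
            # Sketch only: a more layer-cache-friendly variant of the Alpine
            # image above. Assumes a requirements.txt; package names mirror
            # the example in the parent comment.
            FROM python:3.8-alpine

            # Runtime libraries in their own layer: changes to Python
            # dependencies don't invalidate it.
            RUN apk add --no-cache freetype libpng openblas

            # Copy only the requirements file, so the next layer is re-used
            # from cache until the dependency list actually changes.
            COPY requirements.txt .

            # Build deps still have to be added and removed in the same RUN
            # to keep the layer small, so a requirements.txt change re-does
            # the whole compile -- that part is unavoidable on Alpine.
            RUN apk add --no-cache --virtual .build-deps gcc build-base freetype-dev libpng-dev openblas-dev \
                && pip install -r requirements.txt \
                && apk del .build-deps
            ```

            This doesn’t make the compile any faster, it just makes you pay for it less often.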

            1. 3

              So, I finally got back the final sizes and it actually surprised me. 469MB for the alpine version I posted. Much better than your 851MB, but also larger than python:3.8-slim. Maybe it’s leaving around the source files somewhere? Have you made sure that your python:3.8-slim version can actually run code using matplotlib or pandas? I’d assume that they’re missing the actual libraries needed to run the code (essentially the non-dev versions of what you had to install with alpine).

              At this point, I’m not really willing to take more time to investigate this - you’ve sold me. I’m planning on moving all my images to the slim variant if I can.

              1. 3

                I’ve wasted a lot of time on this as well, and I can only recommend you do that. So much less time spent in Dockerfiles, and more in dot-py-ones…

                I ended up with a Debian-based base image with the build dependencies installed and often-used dependencies in a pre-created virtualenv, an Ansible-based toolchain to quickly install any additional system and virtualenv Python package dependencies (which will most likely be overkill for most), and a script that is called as the last step in every child Dockerfile and cleans up most of the junk that is only used at build time. You could probably also do a lot of that with a multi-stage Docker build.
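
                A multi-stage version of that cleanup idea might look something like the sketch below (all names here — requirements.txt, /opt/venv — are illustrative assumptions, and matplotlib may need extra -dev packages in the build stage):

                ```dockerfile
                # Sketch: compile in a throwaway stage, copy only the built
                # virtualenv into the final image, so compilers and headers
                # never reach the image you ship.
                FROM python:3.8-slim AS build
                RUN apt-get update \
                    && apt-get install -y --no-install-recommends build-essential \
                    && rm -rf /var/lib/apt/lists/*
                RUN python -m venv /opt/venv
                COPY requirements.txt .
                RUN /opt/venv/bin/pip install -r requirements.txt

                FROM python:3.8-slim
                # Only the pre-built virtualenv is carried over.
                COPY --from=build /opt/venv /opt/venv
                ENV PATH="/opt/venv/bin:$PATH"
                ```

                Compared to the cleanup-script approach, this needs no post-build step at all: anything not explicitly copied out of the build stage is simply discarded.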

                Anyway, that makes for fairly quick build times and quite small images. The base image might not be that small, but since it doesn’t change that often, and all the other images depend on it and can re-use the cached layer, I only have to pay that cost once per deployed project, and again when it updates…

          2. 7

            How is “building something from source takes longer than using a binary” surprising to anyone? The only fault on Alpine’s side was not having a py3-matplotlib package (which appears to exist, but only in their ‘edge’/unstable repository right now).

            I don’t even particularly like running Alpine on systems and I don’t think this article is at all fair. This isn’t even comparing performance of the resultant image - the build is 50× slower, sure, but how is throughput and performance of the output? It’s entirely possible that if you plan on using this software for many months, the initial build time may be recouped from other system optimisations. It’s equally possible it won’t, but this data is much more relevant a comparison than a one-time build step.

            1. 3

              One of those articles that’s objectively right (everything they list is true), but subjectively wrong (why do it like that? why focus on some things?).

            2. 7

              This is a great time to recognize the work that the folks on PyPA have been doing. Thanks to their efforts, portable wheels like manylinux1 have become popular enough that doing without them is a chore. This is a huge improvement over the packaging ecosystem of just five years ago.
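
              For anyone curious how wheel compatibility is decided: the platform half of a wheel’s filename is derived from interpreter values like the one below. This is only a sketch — the actual mapping of glibc-based ‘linux’ platforms to manylinux tags happens inside pip (the full list of tags pip accepts can be printed with pip debug --verbose), and musl-based Alpine was excluded from that mapping, which is why it falls back to source distributions.

              ```python
              # Sketch: the platform component of a wheel tag is normalized
              # from values like this one. pip then maps glibc-based 'linux'
              # platforms onto manylinux tags; musl-based systems got no such
              # mapping at the time, so no binary wheels matched.
              import sysconfig

              platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
              print(platform_tag)
              ```
              
              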

              1. 5

                Oh come on, while I get the overall point, why would you even measure the image with all the dev-dependencies still in the alpine image? How big is it with FROM scratch? I do concede the point that it’s more complicated and will take longer to compile.

                1. 4

                  If you’re using Alpine Linux you need to compile all the C code in every Python package that you use.

                  Is this misleading or am I missing something here? There is an Alpine Linux package for pandas, and I’d guess for most popular Python packages. Is there a reason one would prefer to use pip nonetheless?

                  1. 14

                    In my experience of production Python usage in many organizations over the past 15+ years, very few teams use system packages. Upstream packages from PyPI (or Conda) are much more common because they’re much more frequently updated and much more complete.

                    Where do you see an Alpine package for matplotlib or pandas, BTW? I can’t find them. I can find NumPy, and it’s 1.17.4 in Alpine vs. 1.18.1 on PyPI.

                    1. 4

                      Their package search isn’t super discoverable. I happened to get a pop-up suggesting the use of * as a wildcard, and managed to find them this way:

                      https://pkgs.alpinelinux.org/packages?name=*pandas*&branch=edge

                      https://pkgs.alpinelinux.org/packages?name=*matplotlib*&branch=edge

                      It looks like they’re newer and only available in edge at the moment, though.

                    2. 5

                      Perhaps they need to peg their dependencies to a specific version, either for reproducible builds or for ease of maintenance.

                    3. 1

                      Why? Most Linux distributions use the GNU version (glibc) of the standard C library that is required by pretty much every C program, including Python. But Alpine Linux uses musl; those binary wheels are compiled against glibc, and therefore Alpine Linux disabled Linux wheel support.

                      As others have cited, there are mitigating factors here, but a point I’d like to make on this score is: Testing!

                      This strikes me as a perfect use case for something like Test Kitchen, although I mostly used that back when I was hip-deep in the Chef community, and I suspect there are newer/better tools that do the same thing.

                      Spin up your container in CI, make sure it does what you need it to do. It’s extra work but that’s how you build confidence in your code, right?