1. 2

    I’d like to add that, out of the three carriers that will grant the Spanish Institute of Statistics aggregated location data, (Vodafone, Movistar and Orange), Vodafone and Orange let you opt-out via email or their app.

    Movistar hasn’t enabled any opt-out mechanisms yet.

    1. 3

      In this example, the assignment expression helps avoid calling len() twice:

      if (n := len(a)) > 10:
         print(f"List is too long ({n} elements, expected <= 10)")
      

      Um, no it doesn’t?:

      a1 = [10, 20, 30]
      n1 = len(a1)
      if n1 > 2:
         print(f'{n1} is greater than two')
      

      https://docs.python.org/3/whatsnew/3.8.html

      1. 2

        But it helps you avoid calling len() twice with one less line! Readability ∝ 1 / source.count('\n').

        With this change we’re one step closer to finally reaching Perl’s level.

        1. 2

          I guess it’s about the scope of the variable, as in Go:

          if n := len(S); n > 0 {
            fmt.Println(n, "characters long")  // n defined inside if statement
          }
          // n undefined here
          
          1. 2

            Just for others reading this: it’s sadly not. n would still be defined after the conditional block. I assume it doesn’t work like that as Python doesn’t really have that level of scoping elsewhere (for loops leak too, etc).

            1. 1

              Wow, I just tested it and you’re right.

              Then I guess it’s only about convenience, but isn’t “sparse better than dense” and “simple better than complex”?

          2. 1

            It’s a shame they didn’t include the code it was being compared to:

            if len(a) > 10:
               print(f"List is too long ({len(a)} elements, expected <= 10)")
            

            Both are 2 lines; your 4-line alternative also leaves extra variables lying around.

            1. 0

              Both are 2 lines; your 4-line alternative also leaves extra variables lying around.

              So? Do you really think fewer lines are always more readable? How about now?

              v1 = ((((n1 + n2) - n3) * n4) / n5)
              

              I would rather have something like this, even though it’s more lines and more variables:

              v1 = n1 + n2
              v2 = v1 - n3
              v3 = v2 * n4
              v4 = v3 / n5
              

              Sometimes readability is more important than lines or variables. Not everything is code golf.

              1. 4

                Okay, but you’ve introduced 3 additional variables here. I can imagine it’s possible to confuse v3 and n3 because the names are so similar. Also, you don’t need that many parentheses.

                v1 = (n1 + n2 - n3) * n4 / n5
                

                because + and - have the same operator precedence, they’re evaluated in the order of their use. Later, * and / also have the same precedence, so another pair of parentheses can be removed. I think this one line is clearer than four.

                But I understand your point: fewer lines aren’t always better than more lines.
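
                Just to sanity-check the precedence claim with Python itself (values made up for illustration):

                ```python
                # + and - share a precedence level and group left to right, as do * and /,
                # so the fully parenthesized form and the minimal form are equivalent.
                n1, n2, n3, n4, n5 = 10, 20, 5, 2, 4

                dense = ((((n1 + n2) - n3) * n4) / n5)
                minimal = (n1 + n2 - n3) * n4 / n5

                assert dense == minimal  # both evaluate to 12.5
                ```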

                1. 2

                  Do you really think fewer lines are always more readable?

                  No, I’m just comparing like for like. The intent of the quoted

                  the assignment expression helps avoid calling len() twice

                  was, in my eyes, to make a comparison with the code snippet that I posted, and not with anything else.

                  1. 0

                    So?

                    So scope matters, a lot. In a toy example it’s easy to dismiss this stuff, but scoping your variables as tightly as possible simplifies all future modifications and refactorings in real-world code, because you’ve minimized the “world” that can potentially be impacted by any change to the variable or its assignment.

                    Scoping n to just the block it’s intended to be used in is strictly preferable to having it visible (and available for misuse) in all code in the outer scope after that block.

                    1. 8

                      The version with := leaks n too.

                      >>> a = [1,2,3,4]
                      >>> if (n := len(a)) > 10:
                      ...     print(f"List is too long ({n} elements, expected <= 10)")
                      ... 
                      >>> n
                      4
                      
                      1. 5

                        Wow, that’s a disappointing choice

                        1. 3

                          Variables in for loops are scoped similarly:

                          >>> for i in range(43):
                          ...     pass
                          >>> i
                          42
                          
              1. 6

                My early 2015 MacBook Pro has been in an Apple-authorized repair shop since Tuesday, after the upgrade to Catalina rendered it unbootable (recovery doesn’t work, Internet Recovery doesn’t detect networks, and it doesn’t even let me get into Startup Manager by pressing Option during boot to reinstall macOS from a thumb drive).

                Another friend’s 2017 MacBook Pro had to have its OS reinstalled.

                My dad’s 2014 MacBook Air upgraded without issues, but has been unstable ever since it finished upgrading.

                People are having problems with Mail.app, with the filesystem, with the 64-bit BS; they couldn’t use Reminders for a week because iOS was released a week earlier with breaking changes, etc.

                This is not the Apple I signed up for. And I’m writing this from a quick and dirty Ubuntu thumb drive, and I have to say I’m surprised with how stable it is, after leaving desktop Linux half a decade ago.

                1. 4

                  This is not the Apple I signed up for. And I’m writing this from a quick and dirty Ubuntu thumb drive, and I have to say I’m surprised with how stable it is, after leaving desktop Linux half a decade ago.

                  Ubuntu’s pretty good. It gets a lot of fit-and-finish work, in my experience, and Mint is a smaller version of the same thing; a lot of people prefer Mint’s Cinnamon GUI over Ubuntu’s default GUI, as well.

                  1. 4

                    Never used Mint.

                    Chose Ubuntu out of inertia. It’s what I used from when I was 9 up until I got the MacBook, and it’s what I use on servers (Ubuntu Server or CentOS, depending on how I feel in the morning) because It Just Works. If I could go back to GNOME 2 / MATE without having to deal with it myself I would (I obviously can, but tweaking DEs is very annoying and honestly, Unity or whatever it is now is not as bad as it was when it was introduced).

                    Thought about going back to Gentoo (which is where I left off), but I don’t have the time right now.

                    1. 3

                      I was on Mint for 5 or 6 years before moving back to Ubuntu. Fewer rough edges and better stability with the latter.

                  1. 6

                    I just use Things. I have no plan to move away from the Apple jail ecosystem in the foreseeable future, so…

                    1. 3

                      I also use Things, just on my laptop though (I keep my phone off of email, calendar, etc.).

                      Last Monday the macOS Catalina update rendered my MacBook unbootable (sent to Apple repair yesterday). In the meantime I’m running a live Ubuntu bootable thumb drive.

                      While Things is not available off-Apple, it’s nice that they store everything you do in a single SQLite database file. Until I have my MacBook back I’ll be running Things with a SQL editor.
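
                      A sketch of what I mean, with nothing but Python’s built-in sqlite3 (the file path is hypothetical, and Things’ actual schema will differ):

                      ```python
                      import sqlite3

                      def list_tables(db_path):
                          """Return the table names inside a SQLite file, such as the
                          single database file Things keeps all of its data in."""
                          with sqlite3.connect(db_path) as con:
                              rows = con.execute(
                                  "SELECT name FROM sqlite_master "
                                  "WHERE type = 'table' ORDER BY name")
                              return [name for (name,) in rows]
                      ```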

                      1. 1

                        Last Monday the macOS Catalina update rendered my MacBook unbootable (sent to Apple repair yesterday).

                        Same. Booted in safe mode, turned out it was a bad kext. Updated it and chugging along happily-er now.

                        1. 1

                          Mine doesn’t even respond to the boot time keystrokes in order to boot in safe mode, or verbose, or boot from a thumb drive…

                          I tried everything, but there’s nothing I could do without tearing it down.

                          May I know your model? Because a friend of mine also had his install broken. Also, is the bad kext related to Little Snitch? Thx.

                          1. 1

                            MacBook Pro (15-inch, 2017) – the bad kext was a corporate MDM thing (“Carbon Black”). But yikes, yours sounds muuuuch worse. I could access safe mode. Recovery was working but even once booted into recovery the dialogs were lagging for 5+ minutes.

                      2. 2

                        Things

                        This comment made me check it out, and damn. I’ve been using Todoist for a couple years and this blows it out of the water. Thanks!

                        1. 1

                          +1 for Things. I have a soft spot for the idea of a bullet journal but Things is just so good.

                          1. 3

                            Things is the only software I’ve ever missed after leaving Apple.

                            1. 1

                              I have a mac laptop, but an android phone, so I would be hesitant to use Things.

                        1. 1

                          Not a cloud provider, but if you are a DIY person, your hardware requirements aren’t huge and you have the money to invest in the hardware, you could do something similar to what https://solar.lowtechmagazine.com/about/ does.

                          1. 6

                            Cloudflare CEO’s response to this issue (from 5 months ago): https://news.ycombinator.com/item?id=19828317

                            TL;DR: it’s Archive.is’ authoritative nameservers that return bad results (something in the 127.0.0.0/8 block) when 1.1.1.1 queries them. Cloudflare discussed internally whether to band-aid this with some workaround, but decided it “would violate the integrity of DNS”.

                            1. 4

                              It’s certainly an interesting response from Cloudflare’s CEO:


                              https://news.ycombinator.com/item?id=19828702


                              the integrity of DNS and the privacy and security promises

                              the integrity of DNS and the privacy and security promises

                              (yes, the above phrase is actually mentioned twice in separate sentences)

                              This information leaks information about a requester’s IP and, in turn, sacrifices the privacy of users.

                              motivation for the privacy and security policies of 1.1.1.1

                              geolocation targeting without risking user privacy and security

                              First, they mention privacy as a reason a lot: a total of 5 times in a 5-paragraph snippet, and “security” 4 times. Yet their DoH (which they deem even more private and more secure) has since been shown to be less private, not more: https://lobste.rs/s/sno4wu/centralised_doh_is_bad_for_privacy_2019

                              The whole thing doesn’t pass the most basic litmus test: after resolving a name, you still have to make HTTP/HTTPS requests, revealing not just the /24 subnet but the full 32 bits of the IPv4 address.

                              nationstate actors have monitored EDNS subnet information to track individuals

                              Yeah, this is just FUD, without even an attempt at a double-blind study. Which nation-state actors? Whom did they monitor? How instrumental was ECS, and how did it even come into the picture? It’s not like local regional providers have any reason to employ ECS. Did someone in China switch to an ECS-compliant third-party provider, without just going for a full VPN? Why? (Doesn’t that prove it’s the third-party providers, like Cloudflare DNS, that facilitate this monitoring?) The whole thing just doesn’t make sense. You can always just monitor the HTTP/HTTPS traffic instead.

                              If you need real security, you gotta use a real VPN, not a fake DNS bandaid.

                              Lack of ECS on a global anycast resolver is not a security and privacy feature; it’s just poor form when running a global public internet service.


                              We publish the geolocation information of the IPs that we query from.

                              Where? Not in DNS. Not in whois (there’s no rwhois referral, either).

                              Every other provider that doesn’t provide ECS at least has very easy to understand rDNS on the source IP of their resolver; but not Cloudflare:

                              % dig @a.resolvers.level3.net. o-o.myaddr.l.google.com -t txt +short | cut -f2 -d\" | xargs host
                              4.14.0.8.in-addr.arpa domain name pointer dns-8-0-14-4.dallas1.level3.net.
                              % dig @ordns.he.net. o-o.myaddr.l.google.com -t txt +short | cut -f2 -d\" | xargs host
                              2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.7.0.0.0.0.0.0.0.7.4.0.1.0.0.2.ip6.arpa domain name pointer tserv1.dal1.he.net.
                              % dig @one.one.one.one o-o.myaddr.l.google.com -t txt +short | cut -f2 -d\" | xargs host
                              Host 33.220.162.108.in-addr.arpa not found: 2(SERVFAIL)
                              %
                              

                              We are working with the small number of networks with a higher network/ISP density than Cloudflare (e.g., Netflix, Facebook, Google/YouTube) to come up with an EDNS IP Subnet alternative that gets them the information they need for geolocation targeting without risking user privacy and security. Those conversations have been productive and are ongoing. If archive.is has suggestions along these lines, we’d be happy to consider them.

                              Here, contrary to the popular belief that big providers use anycast and don’t need ECS, he’s basically admitting that other providers actually do use ECS, and do benefit from resolvers that send it. By not doing ECS, Cloudflare DNS makes all competing CDNs slower than Cloudflare’s own 3.5-billion-USD CDN. This is in case anyone has any doubts that ECS is actually used by the large providers who have the capacity to do anycast.

                            1. 0

                              Am I the only one running IPsec/L2TP?

                              I do so for three reasons: server software comes preinstalled on my gateway (Mikrotik RouterOS), client software is included with iOS/macOS/Android/Windows, and AFAIK it’s secure (please let me know if not).

                              I’ve looked into Wireguard and I want to try it, but I don’t like my VPN server running on a host inside the network itself, which is much more likely to go offline and lock me out of the network, as opposed to running on the very gateway to the network.

                              Any thoughts? I don’t have strong opinions regarding VPNs. Keep in mind I use them both for traffic encryption and for access to my network’s internal services.

                              1. 4

                                Your setup may or may not be secure. No one can really say without looking at it in detail, because IPsec configuration is pretty complex. Worse, the protocol’s complexity induces complex client/server software, which is prone to hard-to-spot implementation mistakes.

                                This is one of the main reasons I try to push people to Wireguard, where there are no security-relevant config options and the code base is very small. IIRC, Wireguard is about 4,000 lines of code vs. ~400,000 for an IPsec implementation.

                                As a quick example, CVE-2017-6297 was a bug in MikroTik’s L2TP client where IPsec encryption was disabled after a reboot. In general, I am quite sceptical of the security of dedicated devices like routers. They have fewer ‘good’ eyes on them due to the relative difficulty of pulling apart their hardware/firmware/closed-source software, and yet their uniformity makes them an attractive target for well-resourced attackers.

                                1. 3

                                  L2TP/IPsec can be problematic with hotel wifi and other braindead networks. Not even NAT-T and IKEv2 always help. OpenVPN will cheerfully work even with double (or quadruple) NAT. Nothing against Wireguard, but I didn’t find it nearly as easy to manage and unproblematic as OpenVPN, especially when performance is not a big concern.

                                  I wonder if the future is self-hosted VDI rather than VPN. It’s convenient for use on the road (just reconnect to a session), and much harder to ban, regulate, or persecute people for in countries with censorship.

                                1. 1

                                  The overkill (and much more secure) way of doing this is referencing images by digest:

                                  image: quay.io/ricardbejarano/nginx@sha256:{SHA256_DIGEST}

                                  I believe this is standard for all registries, and at least I’ve made it work on both Docker and Kubernetes (CRI: containerd).

                                  1. 1

                                    I have heard (but not verified) that Docker Hub at least doesn’t consistently keep untagged images permanently available, so old versions can disappear.

                                  1. 4

                                    Hm. Seems overkill. I just tag my docker images with the git hash. Done. Don’t deploy latest, deploy the tag.

                                    1. 1

                                      I have a trigger for the master branch that tags images as :master, and another trigger on all Git tags that tags them as :latest, so that I can sort of guarantee that :latest is the latest stable tag and :master is master’s HEAD.

                                      This is on the Docker Hub, Quay.io does this by default if you leave the default build trigger on.

                                      (I also have a third trigger that tags images with the name of the tag itself, too.)

                                    1. 11

                                      If you want to know whether your browser+OS combo would support this: prefers-color-scheme.bejarano.io

                                      What a coincidence, I wrote it this Wednesday!

                                      1. 7

                                        I very recently discussed this with a colleague.

                                        The pedantic, sometimes supremacist and cult-like tone of some advocacy by the FSF (and especially RMS) throws me off their boat.

                                        1. 1

                                          This is a great solution if you are already running prometheus, or if you are interested in doing so. I do like the simplicity of hchk.io for cases where I don’t want to run prometheus (and related services/tooling like grafana, and push-gateway).

                                          Great idea and writeup though! Next time I have to run prometheus at a job, I’ll definitely keep this in mind for tracking the errant cron jobs that always seem to sneak in there somewhere.

                                          1. 1

                                            As I mentioned in https://blog.bejarano.io/alertmanager-alerts-with-amazon-ses/#sup1, I do not run Grafana or any dashboarding because I consider it worthless and time-consuming to set up.

                                            Thanks for the feedback!

                                            1. 1

                                              At a small scale the expression browser is sufficient (I use it for most of my work), but once you get beyond that something like Grafana is essential.

                                          1. 4

                                            Nice quick write-up, thanks! One small thing: you can self-host healthchecks (https://github.com/healthchecks/healthchecks), so the cost argument is a bit unfair. Prometheus does have more features, and I reckon one should install it anyway, which is why the solution you outline is still probably better in a lot of circumstances.

                                            1. 1

                                              Oh wow! I didn’t know that. Thanks for the feedback!

                                            1. 10

                                              With the built-in container support in systemd you don’t even need new tools:

                                              https://blog.selectel.com/systemd-containers-introduction-systemd-nspawn/

                                              …and with good security if you build your own containers with debootstrap instead of pulling stuff made by random strangers on docker hub.

                                              1. 8

                                                The conflict between the Docker and systemd developers is very interesting to me. Since all the Linux machines I administer already have systemd I tend to side with the Red Hat folks. If I had never really used systemd in earnest before maybe it wouldn’t be such a big deal.

                                                1. 5

                                                  …and with good security if you build your own containers with debootstrap instead of pulling stuff made by random strangers on docker hub.

                                                  I was glad to see this comment.

                                                  I have fun playing with Docker at home but I honestly don’t understand how anyone could use Docker Hub images in production and simultaneously claim to take security even quasi-seriously. It’s like using random npm modules on your cryptocurrency website, but with even more opacity. Then I see people arguing over the relative security of whether or not the container runs as root, but no discussion of far more important security issues like using Watchtower to automatically pull new images.

                                                  I’m no security expert but the entire conversation around Docker and security seems absolutely insane.

                                                  1. 4

                                                    That’s the road we picked as well, after evaluating Docker for a while. We still use Docker to build and test our containers, but run them using systemd-nspawn.

                                                    To download and extract the containers into folders from the registry, we wrote a little go tool: https://github.com/seantis/roots

                                                    1. 2

                                                      From your link:

                                                      Inside these spaces, we can launch Linux-based operating systems.

                                                      This keeps confusing me. When I first saw containers, I saw them described as lightweight VMs. Then I saw people clarifying that they are really just sandboxed Linux processes. If they are just processes, then why do containers ship with different distros like Alpine or Debian? (I assume it’s to communicate with the process in the sandbox.) Can you just run a container with a standalone executable? Is that desirable?

                                                      EDIT

                                                      Does anyone know of any deep dives into different container systems? Not just Docker, but a survey of various types of containers and how they differ?

                                                      1. 4

                                                        Containers are usually Linux processes with their own filesystem. Sandboxing can be good or very poor.

                                                        Can you just run a container with a standalone executable? Is that desirable?

                                                        Not desirable. An advantage of containers over VMs is in how easily the host can inspect and modify the guest filesystem.

                                                        1. 5

                                                          Not desirable.

                                                          Minimally built containers reduce attack surface, bring down image size, serve as proof that your application builds in a sterile environment and act as a list with all runtime dependencies, which is always nice to have.

                                                          May I ask why isn’t it desirable?

                                                          1. 1

                                                            You can attach to a containerized process just fine from the host, if the container init code doesn’t go out of its way to prevent it.

                                                            gdb away.

                                                          2. 3

                                                            I’m not sure if it’s as deep as you’d like, but https://www.ianlewis.org/en/tag/container-runtime-series might be part of what you’re looking for.

                                                            1. 1

                                                              This looks great! Thank you for posting it.

                                                            2. 3

                                                               I saw them described as lightweight VMs.

                                                              This statement is false, indeed.

                                                              Then I saw people clarifying that they are really just sandboxed Linux processes.

                                                               This statement is kinda true (my experience is limited to Docker containers). Keep in mind more than one process can run in a container, as containers have their own PID namespace.

                                                              If they are just processes, then why do containers ship with different distros like Alpine or Debian?

                                                              Because containers are spun up based on a container image, which is essentially a tarball that gets extracted to the container process’ root filesystem.

                                                              Said filesystem contains stuff (tools, libraries, defaults) that represents a distribution, with one exception: the kernel itself, which is provided by the host machine (or a VM running on the host machine, à la Docker for Mac).
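
                                                               As a rough sketch of that “extracted tarball” idea (file names hypothetical; real runtimes also stack multiple layers and handle whiteout files):

                                                               ```python
                                                               import tarfile

                                                               def extract_layer(layer_tar, rootfs_dir):
                                                                   """Unpack a single image layer (a plain tar archive) into the
                                                                   directory the containerized process will later see as its root."""
                                                                   with tarfile.open(layer_tar) as tar:
                                                                       tar.extractall(rootfs_dir)
                                                               ```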

                                                              Can you just run a container with a standalone executable? Is that desirable?

                                                               Yes, see my prometheus image’s filesystem: it contains strictly the prometheus binary and a configuration file.

                                                              In my experience, minimising a container image’s contents is a good thing, but for some cases you may not want to. Applications written in interpreted languages (e.g. Python) are very hard to reduce down to a few files in the image, too.

                                                              I’ve had most success writing minimal container images (check out my GitHub profile) with packages that are either written in Go, or that have been around for a very long time and there’s some user group keeping the static building experience sane enough.

                                                              1. 3

                                                                I find the easier something is to put into a docker container, the less point there is. Go packages are the ideal example of this: building a binary requires 1 call to a toolchain which is easy to install, and the result has no library dependencies.

                                                              2. 2

                                                                They’re not just processes: they are isolated process trees.

                                                                Why Alpine: because the images are much smaller than others.

                                                                Why Debian: perhaps because reliable containers for a certain application happen to be available based on it?

                                                                1. 1

                                                                  AFAIK: yes, you can, and yes, it would be desirable. I think dynamically linked libraries are the reason why people started to use full distributions in containers. For a Python environment you would probably have to collect quite a few different libraries from your OS to copy into the container so that Python can run.

                                                                  If that’s true, then in the Go world you should see containers holding only the compiled binary? (I personally installed all my Go projects without containers, because it’s so simple to just copy the binary around.)

                                                                  1. 3

                                                                    If you build a pure Go project, this is true. If you use cgo, you’ll have to include the extra libraries you link to.

                                                                    In practice, for a Go project you might want a container with a few other bits: ca-certificates for TLS, /etc/passwd and /etc/group with the root user (for “os/user”), tzdata for timezone support, and /tmp. gcr.io/distroless/static packages this up pretty well.

                                                                    1. 1

                                                                      You can have very minimal containers. E.g. Nix’s buildLayeredImage builds layered Docker images from a package closure. I use it to distribute some NLP software; the container only contains glibc, libstdc++, libtensorflow, and the program binaries.

                                                                1. 2

                                                                  The “only one RUN block” thing really irks me, every time I see it recommended. Trying to minimize layers this way is an uphill battle, and if you are concerned about it you will be tempted to make bad decisions like omitting LABELs or ENVs etc that make your image much more useful at the expense of layers. Use multiple RUN commands; benefit from individual RUN command caching and more legible Dockerfiles; use a tool to squash your layers post-build. Consider using something like cekit as your image source instead of Dockerfiles.

                                                                  “Build your image FROM scratch” is bad advice (and a misleading heading, because the section talks about not doing this, but using multi-stage builds. Which is a good idea.)

                                                                  I particularly like the “Keep your images in two registries simultaneously” advice.

                                                                  1. 2

                                                                    “Build your image FROM scratch” is bad advice (and a misleading heading, because the section talks about not doing this, but using multi-stage builds. Which is a good idea.)

                                                                    Could you elaborate on why using FROM scratch as the last stage is bad advice? I sincerely wouldn’t want something to be wrong in the post. Thanks!

                                                                    1. 2

                                                                      What you are actually suggesting in the article is fine advice, IMHO, but the heading “Build your image FROM scratch” is a little misleading: you aren’t building FROM scratch; your build stage starts from (in this case) Debian, which is fine.
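                                                                      In other words, something like this (paths and build step hypothetical), where only the final stage is FROM scratch:

                                                                      ```dockerfile
                                                                      # The build stage is FROM a full distro...
                                                                      FROM debian AS build
                                                                      COPY . /src
                                                                      RUN make -C /src            # hypothetical build step

                                                                      # ...and only the last stage starts FROM scratch
                                                                      FROM scratch
                                                                      COPY --from=build /src/haproxy /haproxy
                                                                      ENTRYPOINT ["/haproxy"]
                                                                      ```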

                                                                      1. 2

                                                                        Oh, alright!

                                                                        I’ll try to change it up for something more precise. Thanks!

                                                                  1. 1

                                                                    Hey @ricardbejarano – just wanted to flag that your code portions are unreadable. The background is grey with white text on top.

                                                                    1. 1

                                                                      Hey, thanks for pointing it out!

                                                                      I’m working on it.

                                                                      Edit: should be ok by now.

                                                                    1. 1

                                                                      I’m loving the “log stdout” part; everything else can basically be ignored.

                                                                      1. 2

                                                                        That’s definitely an improvement over the syslog situation, at least for our deployments. The native Prometheus export is neat as well; saves having to build an adapter to run alongside for metrics.

                                                                        1. 2

                                                                          I was partly joking. Not being able to log to stderr or stdout has caused so many problems, because it’s basically impossible to debug HAProxy without syslog being present (and HAProxy has the annoying tendency to stop logging if syslog hangs up, as happens when the network has a hiccup in an rsyslog situation).

                                                                          1. 1

                                                                            That exporter is one of the oldest: https://github.com/prometheus/haproxy_exporter

                                                                            1. 2

                                                                              Nope, this is a new, exporter-less endpoint, built into HAProxy itself: https://www.haproxy.com/blog/haproxy-exposes-a-prometheus-metrics-endpoint/

                                                                        1. 7

                                                                          but most Docker images that use it are badly configured.

                                                                          Man, this has just been the trend lately hasn’t it? Official Java images using “mystery meat” builds, root users improperly configured in Alpine for years.

                                                                          This isn’t meant to be an off-topic rant, but I think it simply backs up the central point of the article. Containers are subtly different from a VM, but on the other hand, they’re also not “just a process” as some would like to believe. The old rules of system administration still apply, and pretending that containers will fix all your problems puts you in situations like this.

                                                                          I’m a huge advocate of containers, but they’re definitely easy to learn and difficult to master. If you want to run containers in production reliably, you need to have a solid understanding of all the components which support them.

                                                                          1. 6

                                                                            I’m a huge advocate of containers, but they’re definitely easy to learn and difficult to master. If you want to run containers in production reliably, you need to have a solid understanding of all the components which support them.

                                                                            Yes, it’s a massive amount of details. I’m working on a prepackaged template for Dockerizing Python applications (https://pythonspeed.com/products/pythoncontainer/), and to build good images you need to understand:

                                                                            1. Dockerfile format, including wacky stuff about “use syntax A not syntax B if you want signal handling to work”. (See https://hynek.me/articles/docker-signals/ for that and all the other tiny details involved.)
                                                                            2. Docker layer-image format and its impact on image size
                                                                            3. Docker’s caching model, as it interacts with 1 and 2.
                                                                            4. The way CI will break caching (wrote about this bit in earlier post: https://pythonspeed.com/articles/faster-multi-stage-builds/)
                                                                            5. The details of the particular base image you’re choosing, and why. E.g. people choose Alpine when it’s full of jagged broken edges for Python (and probably other languages)—https://pythonspeed.com/articles/base-image-python-docker-images/
                                                                            6. Operational processes you need to keep system packages and Python packages up-to-date.
                                                                            7. Enough Python packaging to ensure you get reproducible builds instead of random updates every time.
                                                                            8. Random stuff like the gunicorn config in this post (or if you’re using uWSGI, the 7 knobs you have to turn off to make uWSGI not broken-out-of-the-box).
                                                                            9. Be wary enough of bash to either avoid it, or know about bash strict mode and shellcheck and when to switch to Python.
                                                                            10. Not to run container as root.
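                                                                            On item 9: “bash strict mode” is only a few lines; a minimal sketch of what it buys you:

                                                                            ```shell
                                                                            #!/usr/bin/env bash
                                                                            # Unofficial bash strict mode: abort on errors, on unset
                                                                            # variables, and on failures anywhere in a pipeline.
                                                                            set -euo pipefail

                                                                            # Without pipefail, `false | true` reports success because only
                                                                            # the last stage's exit status counts; with it, the pipeline fails.
                                                                            if false | true; then
                                                                              echo "pipeline ok"
                                                                            else
                                                                              echo "pipeline failed"
                                                                            fi
                                                                            ```

                                                                            (This script prints “pipeline failed”.) shellcheck will catch most of the remaining footguns.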

                                                                            And then there’s nice to haves like enabling faulthandler so when you segfault you can actually debug it.
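                                                                            faulthandler is in the standard library, and it can even be enabled without code changes, via an environment variable:

                                                                            ```shell
                                                                            # With PYTHONFAULTHANDLER set, CPython installs handlers that dump
                                                                            # a Python traceback on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
                                                                            PYTHONFAULTHANDLER=1 python3 -c 'import faulthandler; print(faulthandler.is_enabled())'
                                                                            ```

                                                                            which prints True when the handler is active.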

                                                                            And then there’s continuous attention to detail even if you do know all the above, like “oh wait, why is caching suddenly not happening… Oh! I’m adding a timestamp as metadata at the top of the Dockerfile, and that invalidates the whole cache because it changes every time.”

                                                                            Some of this is Dockerfiles being no good, but the majority is just the nature of ops work.

                                                                            1. 2

                                                                              At the risk of going even further off-topic: do you have any recommendations for properly applying the old rules of system administration to containers? For example, frequent updating of docker containers could be a cron job that stops the container, then rebuilds with the dockerfile (unless there’s a better way to do live-updating?), but how do you handle things like setting a strong root password or enabling ssh key auth only (with pre-configured accepted keys) when the container configuration is under public source control?

                                                                              1. 2

                                                                                Typically you don’t run ssh in a container, so that’s less of an issue. For rebuilds: the Dockerfile is more of an input, so the update process is usually:

                                                                                1. Build new image from Dockerfile.
                                                                                2. When that’s done, kill old container.
                                                                                3. Start new container.

                                                                                And you really want to rebuild the image from scratch (completely, no caching) once a week at least to get system package updates.
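                                                                                As a sketch, with a hypothetical myapp image/container name:

                                                                                ```shell
                                                                                # 1. Rebuild (add --no-cache for the weekly full rebuild that
                                                                                #    picks up system package updates)
                                                                                docker build -t myapp:latest .
                                                                                # 2. Kill the old container
                                                                                docker stop myapp && docker rm myapp
                                                                                # 3. Start a new one from the fresh image
                                                                                docker run -d --name myapp myapp:latest
                                                                                ```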

                                                                                1. 1

                                                                                  There are legitimate cases where running ssh in a container is desired (e.g. securely transferring data).

                                                                                  Anyway, what about the root password bit of my question?

                                                                                  1. 2

                                                                                    First thing that comes to mind: you can copy in a sshd config that allows public key auth only, and pass in the actual key with a build arg (https://docs.docker.com/engine/reference/commandline/build/#set-build-time-variables---build-arg)
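                                                                                    A fragment along those lines (file names hypothetical; a public key isn’t secret, which matters because build args are visible in the image history):

                                                                                    ```dockerfile
                                                                                    FROM debian
                                                                                    RUN apt-get update && apt-get install -y --no-install-recommends openssh-server
                                                                                    # sshd_config sets PasswordAuthentication no / PubkeyAuthentication yes
                                                                                    COPY sshd_config /etc/ssh/sshd_config
                                                                                    ARG SSH_PUBKEY
                                                                                    RUN mkdir -p /root/.ssh && echo "$SSH_PUBKEY" > /root/.ssh/authorized_keys
                                                                                    ```

                                                                                    built with something like docker build --build-arg SSH_PUBKEY="$(cat ~/.ssh/id_rsa.pub)" .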

                                                                              2. 2

                                                                                Containers are subtly different from a VM, but on the other hand, they’re also not “just a process” as some would like to believe.

                                                                                I might be one of the “some” that “would like to believe”, as I said in a recent blog post that containers are isolated processes.

                                                                                I still think containers are processes. Let me explain and please correct me if I’m wrong:

                                                                                A container might not end up as a single process, but it definitely begins as one, execing into the container’s entrypoint and maybe forking to create children. Therefore they might not be a single process but a tree, with the one that execs into the image’s entrypoint as the root of the tree.

                                                                                And while, under my logic of container = process, you could call everything a process (even the operating system, since it all begins with PID 1), it’s not all that wrong; I said so in my post so that people would realise containers are more like processes than VMs.

                                                                                Therefore, is it right then, to define “container” as “a heavily isolated process tree”?

                                                                                1. 1

                                                                                  That definition makes sense if you use the terms “Docker container” and “container” interchangeably, but that is not the case (as you point out in your article!). Containers are a collection of namespaces and cgroups; I emphasize that because a container literally consists of these underlying Linux-provided components.

                                                                                  Seeing that you can create all of these items that make up a “container” without ever having a process running within it, I think that’s a good reason the “heavily isolated process tree” definition is not accurate.

                                                                                  1. 1

                                                                                    Ooh… Alright, thanks!

                                                                              1. 1

                                                                                The best base image you can use is FROM scratch + Python + your source.

                                                                                1. 7

                                                                                  I’m surprised that musl is bigger than glibc. Aren’t size and simplicity like the whole point of musl?

                                                                                  1. 4

                                                                                    Yup, I was surprised at it too!

                                                                                    I’ll try to get the musl image to build statically, as that’s what musl was designed for, and in containers there’s no point in dynamic linking. So that might reduce size.

                                                                                    But yeah, weird.

                                                                                    Taking a glance with dive, I see the musl-based binary takes 14MB, while the glibc image has a 3.5MB HAProxy binary. The /lib folder takes 7.9MB in glibc and 5.3MB in musl.

                                                                                    Weird. I’ll look into it over the weekend. Thanks!

                                                                                    1. 6
                                                                                      $ ls -lh /lib/libc-2.29.so /lib/musl/lib/libc.so /lib/libc.a /lib/musl/lib/libc.a
                                                                                      -rwxr-xr-x 1 root root 2.1M Apr 17 21:11 /lib/libc-2.29.so*
                                                                                      -rw-r--r-- 1 root root 5.2M Apr 17 21:11 /lib/libc.a
                                                                                      -rw-r--r-- 1 root root 2.5M Apr 16 13:49 /lib/musl/lib/libc.a
                                                                                      -rwxr-xr-x 1 root root 595K Apr 16 13:49 /lib/musl/lib/libc.so*
                                                                                      

                                                                                      Sounds like an issue with compile flags or whatnot.

                                                                                      1. 1

                                                                                        The author doesn’t need the .a, do they? I thought .a files were the static libraries. I don’t think I’ve seen them since I did a Linux From Scratch build over 10 years ago.

                                                                                        1. 2

                                                                                          I’m not sure how cc -static works, to be honest; I just included it to demonstrate that musl is smaller on my system both as a dynamic library and a static one.

                                                                                          1. 1

                                                                                            cc -static should look into the .a file (which is an archive of .o files) and pick out only the parts actually needed to build the static binary; after that, you don’t need the .a file anymore.
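                                                                                            That’s easy to see for yourself, since an .a is just an ar archive (file and symbol names here are made up):

                                                                                            ```shell
                                                                                            # Two objects; a real static link would pull in only the one it needs
                                                                                            echo 'int answer(void){return 42;}' > answer.c
                                                                                            echo 'int unused(void){return 0;}'  > unused.c
                                                                                            cc -c answer.c unused.c
                                                                                            # rcs: replace members, create if missing, write a symbol index
                                                                                            ar rcs libdemo.a answer.o unused.o
                                                                                            ar t libdemo.a   # lists the archived .o members
                                                                                            ```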

                                                                                    2. 1

                                                                                      I’ve been looking into it during my free time. I’ve managed to get rid of all shared objects other than libc, whose removal would cause a segfault. Adding CFLAGS="-static" and LDFLAGS="-static" to the make step doesn’t help.

                                                                                      It does not reduce binary size, though: right now the image is down to 18.6MB and the binary to 17.2MB (with the other objects statically linked, of course).

                                                                                      See the changes in this branch.