1. 4

    https://www.robustperception.io/blog covers the Prometheus monitoring system, how to use it and why it is the way it is.

    1. 2

      Could one store the IP address of the initial request that causes you to generate a JWT in the token itself? Then you can validate that the current request comes from the same IP. If they’re different, then force them to log in again from their current IP.

      The user would need to log in again if they turn on a VPN or change locations, but that’s a small price to pay if it reduces the possibility of certain types of attacks. I’m definitely not a security expert, but I’m working on a fairly sensitive app where a breach would be bad for users. The fact that I haven’t seen this suggested next to more complex safeguards makes me think there’s a fundamental flaw in it that I’m just not thinking of.

      1. 5

        IPs aren’t a great factor to base stuff like this on, although it’s a good idea.

        I think what’s better is something like token binding (https://datatracker.ietf.org/wg/tokbind/documents/) which is a way to pin a certain token to a specific TLS session. This way you have some basic guarantees. But in the real world things are sorta messy =p

        1. 2

          Most home users would have to log in again every day. Services that tie my login to an IP address piss me off so much because they are constantly logging me out.

          1. 2

            The fact that I haven’t seen this suggested next to more complex safeguards makes me think there’s a fundamental flaw in it that I’m just not thinking of.

            It’s not a safe presumption that a user’s requests will always come from the same IP - even from request to request. Their internet access could be load-balanced or otherwise change due to factors like roaming.

            1. 1

              Yeah that is also a common technique for cookies. If the remote IP changes you can invalidate the cookie.

            1. 4

              Nice article.

              Beware that InstrumentHandler is deprecated, and the functions in https://godoc.org/github.com/prometheus/client_golang/prometheus/promhttp are the recommended replacement.

              Splitting out latency with a success/failure label is also not recommended as a) if you have only successes or only failures, your queries break and b) users tend to create graphs of only success latency and miss all those slow failing requests. Separate success and failure metrics are better, and also easier to work with in PromQL.
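
              A toy illustration of the recommendation (plain Python stand-ins for the metric types; real code would use a Prometheus client library, and the metric names are made up):

```python
from collections import defaultdict

observations = defaultdict(list)  # metric name -> recorded values

def observe(name, value):
    observations[name].append(value)

def record_request(latency_seconds, ok):
    # Separate metrics per outcome: a query over either series keeps
    # working even when the other outcome has never occurred yet, and
    # slow failing requests can't hide inside a success-only graph.
    if ok:
        observe("http_request_duration_seconds", latency_seconds)
    else:
        observe("http_request_failure_duration_seconds", latency_seconds)
```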

              1. 3

                Thanks for the suggestions Brian! The promhttp package contains even more nice things, like tracking in-flight requests. Maybe we should explicitly say in the docs that InstrumentHandler is deprecated in favor of the promhttp types? I don’t mind making a PR against the docs.

                1. 1
              1. -1

                [Title] /proc/<pid>/stat is broken

                This sounds serious! Is the content of the pseudo-file associating incorrect PIDs or parent PIDs to processes?

                Let’s continue…

                Documentation (as in, man proc) tells us to parse this file using the scanf family, even providing the proper escape codes - which are subtly wrong.

                So it’s a documentation issue…

                When including a space character in the executable name, the %s escape will not read all of the executable name, breaking all subsequent reads

                I have literally never encountered an executable with a space in the name, although it’s perfectly legal from a file name perspective. (I’ve been a Linux user since 1998).

                The only reasonable way to do this with the current layout of the stats file would be to read all of the file and scan it from the end […]

                So… let’s do this instead?

                The proper fix (aside from introducing the above function) however should probably be to either sanitize the executable name before exposing it to /proc/<pid>/stat […]

                Sounds reasonable to me.

                […], or move it to be the last parameter in the file.

                Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?

                This problem could potentially be used to feed process-controlled data to all tools relying on reading /proc/<pid>/stat

                I can’t really parse this. Do you mean “affect” instead of “used”?

                In conclusion: I can’t see any evidence of the functionality of this proc pseudo-file being “broken”. You have encountered an edge case (an executable name with a whitespace character in it). You’ve even suggested a workaround (scan from the end). If you had formulated this post as “here’s a workaround for this edge case” I believe you would have made a stronger case.

                1. 5

                  I have literally never encountered an executable with a space in the name

                  Well, tmux does this, for example. But my primary concern is not has it ever happened to me but, if it happens, what will my code do? As this is a silent failure (as in, the recommended method fails in a non-obvious way without indicating failure), most implementations take no action to guard against it. That, in my mind, counts as broken, and the least that can be done is to fix the documentation. Or expose single parameters in files instead of a huge conglomeration with parsing issues. Or… see above.

                  So… let’s do this instead?

                  I do, but only after I got sceptical while reading the documentation, ran some tests and had my hunch confirmed. Then I checked to see others making that very mistake.

                  Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?

                  No, I don’t think so - except for introducing single-value files (and leaving /proc/<pid>/stat as it is).

                  This problem could potentially be used to feed process-controlled data to all tools relying on reading /proc/<pid>/stat

                  I can’t really parse this. Do you mean “affect” instead of “used”?

                  Admittedly, English is not my first language; I do, however, think that sentence parses just fine. The discussed problem (which is present in several implementations based on the documentation) can potentially be used to inject data (controlled by the process, instead of the kernel) into third-party software.

                  In conclusion: I can’t see any evidence of the functionality of this proc pseudo-file being “broken”.

                  That depends on your view of broken - if erroneous documentation affecting close to all software relying on it with a silent failure does not sound broken to you, I guess it is not.

                  You have encountered an edge case (an executable name with a whitespace character in it).

                  I actually did not encounter it per se, I just noticed the possibility for it. But it is an undocumented edge case.

                  You’ve even suggested a workaround (scan from the end).

                  I believe that is good form.

                  If you had formulated this post as “here’s a workaround for this edge case” I believe you would have made a stronger case.

                  Maybe, but as we can see by the examples of recent vulnerabilities, you’ll need a catchy name and a logo to really get attention, so in my book I’m OK.

                  1. 1

                    Thanks for taking the time to answer the questions I have raised.

                    The discussed problem (which is present in several implementations based on the documentation), can potentially be used to inject data (controlled by the process, instead of the kernel) into third-party software.

                    Much clearer, thanks.

                    On the use of “broken”

                    I’m maybe extra sensitive to this as I work in supporting a commercial software application. For both legal and SLA[1] we require our customers to be precise in their communication about the issues they face.

                    [1] Service level agreement

                    1. 1

                      Followup: can you give a specific example of how tmux does this? I checked the running instances of that application on my machine and only found the single word tmux in the output of stat files of the PIDs returned by pgrep.

                      1. 2

                        On my Debian 9 machine, when starting a tmux host session, the corresponding /proc/<pid>/stat file contains:

                        2972 (tmux: client) S 2964 2972 2964 […]
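
                        A sketch of the scan-from-the-end parsing discussed upthread (Python; only the first few fields after comm are named):

```python
def parse_stat(line: str):
    # The comm field is parenthesised and may itself contain spaces
    # (e.g. "tmux: client"), so split on the *last* ')' rather than
    # trusting scanf-style whitespace splitting.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()
    return pid, comm, rest  # rest[0] is the state, rest[1] the ppid, ...
```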

                  2. 3

                    “Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?”

                    I will never get back the 100ms it took to read this sentence…

                    1. 1

                      I dunno, maybe just duplicate the information at the end of the current format, in the author’s preferred format, and delimited by some character not otherwise part of the spec.

                      It’s not trivial, though.

                      That was my point.

                    2. 1

                      This was clearly overlooked when the API was designed; nobody is parsing that file from the end, and nobody is supposed to.

                      1. -1

                        What was overlooked? That executables can have whitespace in their names?

                        I can agree that this section of the manpage can be wrong (http://man7.org/linux/man-pages/man5/proc.5.html, search for stat):

                        (2) comm  %s
                            The filename of the executable, in parentheses.
                            This is visible whether or not the executable is
                            swapped out.
                        

                        From the manpage of scanf:

                        s: Matches a sequence of non-white-space characters; the next
                            pointer must be a pointer to the initial element of a
                            character array that is long enough to hold the input sequence
                            and the terminating null byte ('\0'), which is added
                            automatically.  The input string stops at white space or at
                            the maximum field width, whichever occurs first.
                        

                        So it’s clear no provision was made for executables having whitespace in them.

                        This issue can be simply avoided by not allowing whitespace in executable names, and by reporting such occurrences as a bug.

                        1. 8

                          This issue can be simply avoided by not allowing whitespace in executable names, and by reporting such occurrences as a bug

                          Ahhh, the Systemd approach to input validation!

                          Seriously, if the system allows running executables with whitespace in their names, and your program is meant to work with such a system, then it needs to work with executables with whitespace in their names.

                          I agree somewhat with the OP - the interface is badly thought out. But it’s a general problem: trying to pass structured data between kernel and userspace in plain-text format is, IMO, a bad idea. (I’d rather a binary format. You have the length of the string encoded in 4 bytes, then the string itself. Simple, easy to deal with. No weird corner cases).
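
                          A sketch of that length-prefixed encoding (Python’s struct as a stand-in; a real kernel interface would define its own framing):

```python
import struct

def encode_str(s: str) -> bytes:
    # 4-byte little-endian length, then the raw bytes: no delimiter
    # that could collide with spaces or newlines in the payload.
    data = s.encode("utf-8")
    return struct.pack("<I", len(data)) + data

def decode_str(buf: bytes):
    # Returns the decoded string plus whatever bytes follow it.
    (n,) = struct.unpack_from("<I", buf)
    return buf[4:4 + n].decode("utf-8"), buf[4 + n:]
```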

                          1. 1

                            I agree it’s a bug.

                            However, there’s a strong convention that executables do not have whitespace in them, at least in Linux/Unix.[1]

                            If you don’t adhere to this convention, and you stumble across a consequence of this, does that mean that a format that’s been around as long as the proc system is literally broken? That’s where I reacted.

                            As far as I know, nothing crashes when you start an executable with whitespace in it. The proc filesystem isn’t corrupted.

                            One part of it is slightly harder to parse using C.

                            That’s my take, I’m happy to be enlightened further.

                            I also agree that exposing these kind of structures as plain text is arguably … optimistic, and prone to edge cases. (By the way, isn’t one of the criticisms of systemd that it has an internal binary format?).

                            [1] note I’m just going from personal observation here, it’s possible there’s a subset of Linux applications that are perfectly fine with whitespace in the executable name.

                            1. 3

                              I agree with most of what you just said, but I myself didn’t take “broken” to mean anything beyond “has a problem due to lack of forethought”. Maybe I’m just getting used to people exaggerating complaints (heck I’m surely guilty of it myself from time to time).

                              It’s true that we basically never see executables with a space (or various other characters) in their names, but it can be pretty frustrating when tools stop working or don’t work properly when something slightly unusual happens. I could easily see a new-to-linux person creating just such an executable because they “didn’t know better” and suffering as a result because other programs on their system don’t correctly handle it. In the worst case, this sort of problem (though not necessarily this exact problem) can lead to security issues.

                              Yes, it’s possible to correctly handle /proc/xxx/stat in the presence of executables with spaces in the name, but it’s almost certain that some programs are going to come into existence which don’t do so correctly. The format actually lends itself to this mistake - and that’s what’s “broken” about it. That’s my take, anyway.

                              1. 2

                                Thanks for this thoughtful response. I believe you and I are in agreement.

                                Looking at this from a slightly more usual perspective, how does the Linux system handle executables with (non-whitespace) Unicode characters?

                                1. 3

                                  Well, I’m no expert on unicode, but I believe for the most part Linux (the kernel) treats filenames as strings of bytes, not strings of characters. The difference is subtle - unless you happen to be writing text in a language that uses characters not found in the ASCII range. However, UTF-8 encoding will (I think) never cause any bytes in the ASCII range (0-127) to appear as part of a multi-byte encoded character, so you can’t get spurious spaces or newlines or other control characters even if you treat UTF-8 encoded text as ASCII. For that reason, it poses less of a problem for things like /proc/xxx/stat and the like.

                                  Of course, filenames being byte sequences comes with its own set of problems, including that it’s hard to know which encoding should be used to display filenames (I believe many command line tools use the locale’s default encoding, and that’s nearly always UTF-8 these days) and that a filename may contain an invalid encoding. Then of course there’s the fact that Unicode has multiple ways of encoding the exact same text, so in theory you could get two “identical” filenames in one directory (different byte sequences, same character sequence, or at least the same visible representation). Unicode seems like a big mess to me, but I guess the problem it’s trying to solve is not an easy one.

                                  (minor edit: UTF-8 doesn’t allow 0-127 as part of a multi-byte encoded character. Of course they can appear as regular characters, equivalent to the ASCII).
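
                                  That property is easy to check (Python):

```python
# Every byte of a multi-byte UTF-8 sequence has the high bit set,
# so ASCII bytes such as ' ' or '\n' can never appear inside one.
for ch in ["ą", "€", "😀"]:
    encoded = ch.encode("utf-8")
    assert len(encoded) > 1
    assert all(b >= 0x80 for b in encoded)
```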

                                  1. 1
                                    ~ ❯ cd .local/bin
                                    ~/.l/bin ❯ cat > ą << EOF
                                    > #!/usr/bin/env sh
                                    > echo ą
                                    > EOF
                                    ~/.l/bin ❯ chmod +x ą 
                                    ~/.l/bin ❯ ./ą
                                    ą
                                    
                                2. 2

                                  If you don’t adhere to this convention, and you stumble across a consequence to this, does this mean that a format that’s been around as long as the proc system is literally broken?

                                  Yes; the proc system’s format has been broken (well, misleadingly-documented) the whole time.

                                  As you note, using pure text to represent this is a problem. I don’t recommend an internal, poorly-documented binary format either: canonical S-expressions have a textual representation but can still contain binary data:

                                  (this is a canonical s-expression)
                                  (so "is this")
                                  (and so |aXMgdGhpcw==|)
                                  

                                  An example stat might be:

                                  (stat
                                    (pid 123456)
                                    (command "evil\nls")
                                    (state running)
                                    (ppid 123455)
                                    (pgrp 6)
                                    (session 1)
                                    (tty 2 3)
                                    (flags 4567)
                                    (min-fault 16)
                                    …)
                                  

                                  Or, if you really cared about concision:

                                  (12345 "evil\nls" R 123455 6 1 16361 4567 16 …)
                                  
                              2. 3

                                nobody is parsing that file from the end

                                As an example the Python Prometheus client library uses this file, and allows for this.

                          1. 3

                            Nope. It’s 2017, it’s time to stop parsing strings with regular expressions. Use structured logging.

                            No thanks! I’ll stick to strings.

                            1. 2

                              Could you please explain why? Your comment, as it is, is not bringing any value.

                              1. 1

                                Not the OP but here’s why I don’t like structured logging

                                • logs will ultimately be read by humans and extra syntax gets in the way.
                                • structured logging tends to bulk the log with too much useless information.
                                • most of the use cases of structured logging could be better handled via instrumentation/metrics.
                                • string based logs can be emitted by any language without dependencies so every system you manage could have compatible logging.

                                Arguably a space separated line is a fixed-schema structured log with the least extraneous syntax possible.

                                1. 6

                                  To me (in the same order):

                                  • logs are ultimately read by humans once correctly parsed/sorted, which means they should be machine-readable first so that they can easily be processed into a readable message.
                                  • Too much information is rarely a problem with logging, but not enough context often is.
                                  • Probably, but structured logging still offers some simpler ways to do this.
                                  • You just push the formatting problem from the sender (which can use a simple format) to the receiver (which has to parse different formats according to what the devs fancy)

                                  To me the best recap on why I like structured logging is: https://kartar.net/2015/12/structured-logging/
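
                                  A small illustration of the first point (a hypothetical request event; the JSON record is machine-readable first, and a human-readable line can still be rendered from it):

```python
import json

event = {"method": "GET", "path": "/api/users", "status": 200, "ms": 12}
line = json.dumps(event, sort_keys=True)  # what gets written to the log

parsed = json.loads(line)  # what the log pipeline works with
rendered = f'{parsed["method"]} {parsed["path"]} {parsed["status"]} {parsed["ms"]}ms'
assert rendered == "GET /api/users 200 12ms"
```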

                                  1. 2

                                    most of the use cases of structured logging could be better handled via instrumentation/metrics.

                                    Speaking as a developer of Prometheus, you need both. Metrics are great for an overall view of the system and all its subsystems, but can’t tell you about every individual user request. Logs tell you about every request, but are limited in terms of understanding the broader system.

                                    I’ve written a longer article that touches on this at https://thenewstack.io/classes-container-monitoring/

                              1. 4

                                The public Prometheus/Grafana dashboard for the streaming: https://dashboard.congress.ccc.de/?refresh=5m&orgId=1

                                1. 3

                                  I’d recommend reading My Philosophy on Alerting. Systems like Prometheus are designed to allow more sophisticated alerting, such as predicting when a disk will fill.

                                  1. 1

                                     I had to spend time justifying the presence of that ‘0’ many times over the years as other developers questioned its purpose.

                                    I don’t get why he wouldn’t just put a code comment, or a regression test that checks memory usage. Then he doesn’t need to say anything.

                                    1. 10

                                      Per the article, he did both of those.

                                      1. 3

                                        That’s what I get for skimming and commenting. I deserve to look dumb there.

                                    1. 9

                                      tl;dr sudo parses /proc, fucks up

                                      1. 18

                                        I think that sells it a little short. There’s something to be said about a system design where parsing strings in /proc is a thing, how A leads to B leads to root, etc.

                                        1. 4

                                          It also illustrates why procfs in and of itself is bad for security.

                                          1. 4

                                            I’d take it more as: hard-to-parse formats, due to a poor choice of how to handle field separators that appear in your data, lead to bugs. This particular parsing issue is one I’ve run into myself.

                                            1. 4

                                              Really it illustrates that plain text (byte sequences) as popularized by unix is a poor interface format. Unfortunately it continues to be popular because C’s retrograde type system and poor literal support discourage those who still write C from using anything better.

                                              1. 3

                                                This really has nothing to do with C, and the language blaming is unwarranted. There are perfectly good (C) APIs that do not involve parsing text, and such an API could have been used here. But some people think parsing text is still the way to go.

                                                1. 2

                                                  For most things it’s very difficult to express a good API without sum types. In C you can’t even fake them with polymorphism and the visitor trick.

                                              2. 2

                                                Human readable formats vs machine readable formats, really…

                                            2. 2

                                              What does OpenBSD do there?

                                              1. 2

                                                In general, sysctl.

                                          1. 5

                                            Something I’ve noticed is that reference docs have a tendency over time to expand into user guides as well.

                                            I’ve never seen it work out, as the two use cases are very different.

                                            One requires quite technical and specific information. If you mix in guides, the reader is left to carefully read the guide to see if there’s a subtlety explained therein that is relevant.

                                            The other is more along the lines of a tutorial. Mentioning all the fine print only confuses a new user, as such details aren’t relevant to them at this stage.

                                            1. 5

                                              Strong agree. Long ago, I gave a talk about this: https://air.mozilla.org/rust-meetup-december-2013/

                                              TL;DR, API docs, guides, and reference materials have three different audiences, and so need to be three different things.

                                              1. 1

                                                Sorry, I’m too lazy to watch the talk. I can understand the difference between API docs and guides. What makes reference materials different to API docs?

                                            1. 2

                                              Processing of FOSDEM videos is ongoing, about 80% are processed. 544 are currently available.

                                              1. 0

                                                See, if we wanted to make the world a better place, everyone would pick a month to submit patch requests to all open source projects to replace their config files with JSON and to add a converter for legacy configs.

                                                And yes, I also want a pony. And world peace.

                                                1. 30

                                                  Wait, making every configuration file JSON would make the world a better place? I think it would make it far worse! A lack of comments in my configuration file makes future me confused and frustrated at past me.

                                                  1. 5

                                                    When I’m stuck with JSON-for-config, I’ll sometimes duplicate keys in the file - putting the comment in the first one. Every JSON decoder I’ve used so far ignores the first value and takes the second one.

                                                    I’m well aware of how insane that sounds, but I’d rather have comments than sensible files.

                                                    1. 3

                                                      I do something similar but make it an _{{key}}-comment (e.g. password has a sibling _password-comment before it). This works great in config files as it can be ignored by the application. It does fall apart in something like package dependencies though =(

                                                      1. 3

                                                        I do something similar, except I duplicate a specific key, so order is not important:

                                                        {
                                                          "//": "Here's a comment",
                                                          …
                                                          "//": "Another comment"
                                                        }
                                                        

                                                        But yeah, I’d rather just avoid JSON for such things.
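
                                                        For what it’s worth, Python’s json module behaves the same way (last duplicate wins), though this is decoder-specific and not guaranteed by the JSON spec:

```python
import json

doc = '{"//": "first comment", "timeout": 30, "//": "second comment"}'
config = json.loads(doc)
assert config["timeout"] == 30
assert config["//"] == "second comment"  # earlier duplicates are dropped
```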

                                                        1. 1

                                                          Some of these problems are addressed by JSON5: http://json5.org/ There is obviously the whole https://xkcd.com/927/ problem, but IMO it’s not as bad as it is with something dramatically different like TOML.

                                                          But in general I think it’s much better for a whole host of reasons to use Lua for configuring end-user applications.

                                                          1. 1

                                                            JSON5

                                                            Kind of reminds me of UCL.

                                                        2. 2

                                                          One problem with this, and with @alva’s solution below, is that the comments won’t survive deserialization/serialization in a meaningful way. @twolfson’s solution resolves that, though there’s still no guarantee the comment key is serialized anywhere near the key it is meant to be commenting.

                                                        3. 1

                                                          Having all configuration files in the same format would make the world a better place.

                                                          JSON does have some technical issues though, such as inability to represent all floating point values.

                                                          1. -1

                                                            Is the world really better off having different formats for grub, xorg, Apache, nginx, rust, npm, and all the other myriad ways of storing the same flavor of data?

                                                            1. 7

                                                              No, but there are enough standards for this that don’t suck as much for configuration data as json.

                                                              1. 1

                                                                Yeah, all three of the systems suggested in the OP are better than JSON. I believe YAML has some parsing issues, but the other two look very clean and usable.

                                                                1. -1

                                                                  Other than the common “muh comments” complaint, why do you think JSON sucks for configuration data?

                                                                  1. 12

                                                                    Two reasons that come to mind for me are: you get a lot of syntactic ceremony that other formats don’t require (every file starting with ‘{’, string quoting, commas between list items, etc.) and the selection of types is odd. Your JSON parser is going to convert bare numbers to numbers but you have to quote your strings. What do you do if you have other types, like URLs or date/times or something particular to your application? You’ll have to quote them as strings and then do a second pass over your JSON configuration object to convert it to something else. What happens if your language has a more interesting suite of numeric types than JSON does? Do you have to tell people to quote their numbers so that you can parse them properly? The behavior is hidden from you by your JSON parser, so you aren’t likely to be able to detect when something like this has gone wrong.

                                                                    I like my code to require a certain amount of ceremony to catch problems before running, but I like my configuration files to be fairly lenient; if I can recover something from them, I can alert the user that the parse failed on these items or whatever and proceed. These options get narrowed when you conflate a programming language with a configuration language.

                                                                    In fact, even with my JSON parser, I want it to be strict when I’m dealing with user input or form submissions or whatever, but lenient when I’m reading a file like ~/.foorc. Does your JSON parser have options for that?
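
                                                                    For what it’s worth, Go’s encoding/json exposes exactly one knob in this direction, DisallowUnknownFields, which is roughly the strict-versus-lenient toggle asked about here. A small sketch (the `Config` type and input are invented):

                                                                    ```go
                                                                    package main

                                                                    import (
                                                                        "bytes"
                                                                        "encoding/json"
                                                                        "fmt"
                                                                    )

                                                                    type Config struct {
                                                                        Name string `json:"name"`
                                                                    }

                                                                    // decode parses cfg from data; when strict is true, unknown keys
                                                                    // (often typos in a hand-written config file) become errors
                                                                    // instead of being silently dropped.
                                                                    func decode(data []byte, strict bool) (Config, error) {
                                                                        dec := json.NewDecoder(bytes.NewReader(data))
                                                                        if strict {
                                                                            dec.DisallowUnknownFields()
                                                                        }
                                                                        var c Config
                                                                        err := dec.Decode(&c)
                                                                        return c, err
                                                                    }

                                                                    func main() {
                                                                        data := []byte(`{"name": "foo", "nmae": "typo"}`)

                                                                        if _, err := decode(data, true); err != nil {
                                                                            fmt.Println("strict mode:", err)
                                                                        }
                                                                        c, _ := decode(data, false)
                                                                        fmt.Println("lenient mode:", c.Name)
                                                                    }
                                                                    ```

                                                                    Note this only toggles unknown-field handling; it does nothing for the "recover what you can and warn about the rest" leniency described above.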

                                                                2. 5

                                                                  Just because there is no single standard doesn’t mean that JSON is a suitable one. In fact I would even prefer XML over JSON for this particular use case, just because it has comments.

                                                              2. 7

                                                                JSON config files?! Do you have a shrine to Satan in your house as well?

                                                                1. 1

                                                                  I might not mind so much if most JSON parsers didn’t suck so badly at error messages. Rather, I would prefer a good config parser which generates error messages and documentation WITH EXAMPLES.

                                                                1. 2

                                                                  I think what the author of this article doesn’t understand is that Java and Go are very different languages and as such the characteristics of a good GC for Java are quite different from a good GC for Go. Java creates an order of magnitude more garbage than Go, often creating large numbers of objects with very short lifetimes. This means the GC is very stressed all the time in aggressively cleaning up short lived garbage. Go on the other hand allocates most small objects on the stack so the GC never has to worry about them at all. This frees up the GC to deal with mostly larger, mostly longer lived objects which is a much easier task than the Java GC has to deal with.

                                                                  The other side of GC being an easier task for Go is that it’s much easier to optimise. When you think about it, it’s not surprising that Go’s GC performs significantly better than Java’s. It’s just a much easier task.
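
                                                                  The stack-versus-heap point above can be observed directly. In this small Go sketch (the `point` type and function names are invented), a result returned by value typically costs zero heap allocations, while a returned pointer escapes to the heap and the GC has to clean it up:

                                                                  ```go
                                                                  package main

                                                                  import (
                                                                      "fmt"
                                                                      "testing"
                                                                  )

                                                                  type point struct{ x, y int }

                                                                  // Returned by value: the result lives in the caller's frame,
                                                                  // so nothing escapes to the heap.
                                                                  func midpointByValue(a, b point) point {
                                                                      return point{(a.x + b.x) / 2, (a.y + b.y) / 2}
                                                                  }

                                                                  // A package-level sink forces the pointer to outlive the call,
                                                                  // so escape analysis must heap-allocate the point.
                                                                  var sink *point

                                                                  func midpointByPointer(a, b point) *point {
                                                                      return &point{(a.x + b.x) / 2, (a.y + b.y) / 2}
                                                                  }

                                                                  func main() {
                                                                      a, b := point{0, 0}, point{4, 6}
                                                                      byValue := testing.AllocsPerRun(1000, func() {
                                                                          _ = midpointByValue(a, b)
                                                                      })
                                                                      byPointer := testing.AllocsPerRun(1000, func() {
                                                                          sink = midpointByPointer(a, b)
                                                                      })
                                                                      fmt.Printf("heap allocs per call: by value %.0f, by pointer %.0f\n", byValue, byPointer)
                                                                  }
                                                                  ```

                                                                  You can also ask the compiler for its escape-analysis decisions with `go build -gcflags=-m`.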

                                                                  1. 13

                                                                    This means the GC is very stressed all the time in aggressively cleaning up short lived garbage. Go on the other hand allocates most small objects on the stack so the GC never has to worry about them at all.

                                                                    I develop a Go application which with only moderate load generates 100MB/s of very short lived, small objects on the heap. This results in many gigabytes of RSS overhead which a different GC design may not have.

                                                                    Which GC is appropriate is more about allocation patterns than languages.
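
                                                                    One common mitigation for exactly this allocation pattern, whatever the GC design, is recycling short-lived objects through sync.Pool. A hedged sketch (the `render` function and its buffer use are illustrative, not the commenter’s actual code):

                                                                    ```go
                                                                    package main

                                                                    import (
                                                                        "bytes"
                                                                        "fmt"
                                                                        "sync"
                                                                        "testing"
                                                                    )

                                                                    // A pool lets hot paths reuse short-lived buffers instead of
                                                                    // handing the GC a fresh allocation on every request.
                                                                    var bufPool = sync.Pool{
                                                                        New: func() interface{} { return new(bytes.Buffer) },
                                                                    }

                                                                    func render(name string) string {
                                                                        buf := bufPool.Get().(*bytes.Buffer)
                                                                        defer func() {
                                                                            buf.Reset() // required: pooled objects keep their contents
                                                                            bufPool.Put(buf)
                                                                        }()
                                                                        buf.WriteString("hello, ")
                                                                        buf.WriteString(name)
                                                                        return buf.String()
                                                                    }

                                                                    func main() {
                                                                        allocs := testing.AllocsPerRun(1000, func() { _ = render("world") })
                                                                        fmt.Printf("allocs per call with pooling: %.1f\n", allocs)
                                                                    }
                                                                    ```

                                                                    Pooling doesn’t eliminate allocation (the returned string is still a copy), but it takes the buffer churn off the GC’s plate.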

                                                                  1. 7

                                                                    I’ve always trusted my ASUS laptops and I’ve heard good things about their ZenBook. It’s aluminum, seems to have decent specs for development, and reportedly works fine booting Linux distros.

                                                                    1. 7

                                                                      Can vouch for the ZenBook. Currently on the Asus ZenBook UX305. It’s a great laptop, and runs Linux perfectly (I’ve even run OpenBSD on it with very little trouble, apart from lacking trackpad support). The only issue with the hardware is that the screen hinge is a little loose, so it flops around a bit (I’m sure if I bothered to pop it open I could tighten the screw), but apart from that it’s great.

                                                                      1. 4

                                                                        Thirding the UX305. I bought a Broadwell-based one earlier this year to play with OpenBSD based on a @tedu post. Outside of the trackpad, it’s been great for on-the-go and taking to meetups.

                                                                        1. 1

                                                                          Fourthing the UX305. Nice and light. The SSD could be bigger and the aluminium scratches a bit easily, but it’s a nice performance/price point.

                                                                        2. 1

                                                                          My friend has a Broadwell-based UX305 running Ubuntu, and he seems to really like it. It looks pretty nice!

                                                                          1. 1

                                                                            I have one for work and it’s pretty nice. The only caveat would be that if you live in Canada, avoid the one that has the French/English keyboard. It has a split shift key on the left side and it is… not optimal.

                                                                        1. 4

                                                                          I’d forgotten that Stephen Fry had mentioned gNewSense. We managed to cram 3 distinct puns into that name.

                                                                          1. 7

                                                                            I own a B2B business and take payments directly into the (Irish) company current account.

                                                                            One thing I’ve found is that most transfers from the US will have between €6 and €30 mysteriously missing from them due to bank fees. I have this covered in my standard contract, but it is a bit of a pain to deal with.

                                                                            It has worked fine from every other country I’ve invoiced, both EU and non-EU. The only thing to be aware of is that the SWIFT code and the BIC are the same thing; since I added that titbit to my invoices, the issue hasn’t come up again.

                                                                            1. 3

                                                                              International wire transfers usually have an associated charge (usually between $10 and $50), but it’s charged by the receiving bank. This means that the sender has to know about the relationship between their bank and your bank to add the appropriate amount, or they need to instruct their bank to charge all fees to the sender (usually a separate option in the user interface).

                                                                              I generally don’t have to worry so much about $50 missing on an invoice, and simply include it in the next invoice.

                                                                              1. 3

                                                                                My bank has no special charge for receiving transfers, as long as they’re in Euros.

                                                                                1. 2

                                                                                  I lost $80 this summer on a $400 invoice when money moved from a US bank to mine (Slovenia, EU). My bank assures me they didn’t skim anything.

                                                                                  In practice I don’t really care which bank in the chain did it (apparently $40 was taken by the sender’s bank and the other $40 somewhere along the way), but it did make PayPal’s rates look comparatively wonderful.

                                                                              1. 7

                                                                                Excellent writeup, as always.

                                                                                But, why? Why would I want 50 containers on the same machine?

                                                                                I’m always put in mind of this image whenever reading about the trials and tribulations of Docker et al.

                                                                                1. 4

                                                                                  But, why? Why would I want 50 containers on the same machine?

                                                                                  The most efficient way to do things is to have massive machines with many, many jobs on them, as this reduces resource overhead (think of all the management daemons that run per machine), makes scheduling easier, and allows for over-provisioning.
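
                                                                                  The "makes scheduling easier" part is essentially bin packing. A toy first-fit sketch (all job sizes and capacities are invented) shows how fewer, larger machines leave less stranded capacity:

                                                                                  ```go
                                                                                  package main

                                                                                  import "fmt"

                                                                                  // place assigns jobs (by CPU demand) to machines of a given
                                                                                  // capacity using first fit, and returns how many machines
                                                                                  // were needed.
                                                                                  func place(jobs []int, capacity int) int {
                                                                                      var free []int // remaining capacity per machine
                                                                                      for _, j := range jobs {
                                                                                          placed := false
                                                                                          for i := range free {
                                                                                              if free[i] >= j {
                                                                                                  free[i] -= j
                                                                                                  placed = true
                                                                                                  break
                                                                                              }
                                                                                          }
                                                                                          if !placed {
                                                                                              // open a new machine for this job
                                                                                              free = append(free, capacity-j)
                                                                                          }
                                                                                      }
                                                                                      return len(free)
                                                                                  }

                                                                                  func main() {
                                                                                      jobs := []int{6, 5, 5, 4, 3, 3, 2} // 28 cores of demand total
                                                                                      fmt.Println("machines of 8 cores:", place(jobs, 8))
                                                                                      fmt.Println("machines of 32 cores:", place(jobs, 32))
                                                                                  }
                                                                                  ```

                                                                                  With small machines the 28 cores of demand fragment across four boxes; one big machine absorbs everything, which is the over-provisioning argument in miniature.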

                                                                                  1. 6

                                                                                    This makes great sense if you’re the one renting out those massive machines–but again, I’m curious about the developers and products themselves.

                                                                                    Would we make these same decisions if there wasn’t a billion-dollar industry predicated on developer outreach and education/advertising/brainwashing pushing this as a solution?

                                                                                    I’m seeing technology like the Erlang runtime and wondering if maybe people should just write on less bloated platforms. Maybe that could solve the efficiency problem: 50 Ruby VMs is hardly more efficient than 50 C programs, for example.

                                                                                    1. 5

                                                                                      For us, 50 containers on the same machine are 50 different builds running 50 different test suites. We have a cluster that runs tests in parallel, in containers, across many machines. I can’t imagine getting anything done if I had to run all those tests locally.

                                                                                      People also use all sorts of Linux or macOS (with all sorts of Linux VMs). Now the build team doesn’t have to support all the different setups. They also don’t have to do flag days as much; they just update the build container and move on.

                                                                                      1. 3

                                                                                        I run a couple apps in containers on a DO box, each of which shells out to a few other programs written in a variety of languages (ruby, c, node, java). None of these have to care what OS they’re running on or if the packages for those programs are available, or where those binaries are going to be installed to. I just ship the container and I’m good.

                                                                                        Shipping each app with its own isolated environment seems beautiful to me, and I don’t see why you wouldn’t want such an abstraction. It’d be great if all the programs I depended on were written as libraries for the language I wrote my webapp in, but they weren’t.

                                                                                        1. 3

                                                                                          The current popular implementation of containers is strictly less useful than packages, as it is OS-specific. As someone who does not run Linux, using containers is a pain for me, so I have to run a Linux VM to run the container, whereas with source artifacts I could build for my OS. So containers have taken OS-independent languages like Python, Ruby, and Java and made OS-dependent artifacts. Personally, I think our lives would be much, much simpler if we standardized on a package manager like Nix rather than on a binary artifact such as a container.

                                                                                        2. 2

                                                                                          I’m seeing technology like the Erlang runtime and wondering if maybe people should just write on less bloated platforms.

                                                                                          For the sake of clarity, do you think Erlang is bloated, or an example of a non-bloated platform?

                                                                                      2. 3

                                                                                        I’ve come around to the idea of containers, now that Kubernetes exists.

                                                                                        The idea is that you ought to be able to submit a program, bounded with memory and disk and bandwidth expectations, and have that program encapsulated in such a way that it is trivial to operate (and ideally has no or few state expectations). Then, scaling and operation become super easy.

                                                                                        It’s not for every use case, but it’s a pretty nice sweet spot with acceptable tradeoffs if you have lots of independent services, large scale, generally stateless operation (or have magic, like Google does), many different development teams and/or styles, an unknown scaling requirement on almost every service, enough professionalism to write and keep API contracts, and an ops team that really doesn’t want to deal with your nonsense and wants a rollback button (and/or wants you to deploy and undeploy your own code without blowing up the world).
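
                                                                                        As a rough illustration, the "bounded with memory and disk and bandwidth expectations" idea is what the resources stanza of a Kubernetes pod spec expresses (all names and numbers below are invented, and this is a sketch rather than a recommended configuration):

                                                                                        ```yaml
                                                                                        apiVersion: v1
                                                                                        kind: Pod
                                                                                        metadata:
                                                                                          name: example-web          # hypothetical service
                                                                                        spec:
                                                                                          containers:
                                                                                            - name: web
                                                                                              image: example/web:1.4.2   # hypothetical image
                                                                                              resources:
                                                                                                requests:              # what the scheduler reserves
                                                                                                  cpu: "250m"
                                                                                                  memory: "256Mi"
                                                                                                  ephemeral-storage: "1Gi"
                                                                                                limits:                # hard caps enforced at runtime
                                                                                                  cpu: "1"
                                                                                                  memory: "512Mi"
                                                                                              ports:
                                                                                                - containerPort: 8080
                                                                                        ```

                                                                                        The scheduler bin-packs against the requests; the limits are what makes the program "trivial to operate" from the cluster’s point of view.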

                                                                                        The big problem is state; Google’s datacenters are literally magic, so they’ll be fine. Everyone else, currently, that’s a problem.

                                                                                        1. 11

                                                                                          Those all sound like problems that should be solved at the application layer, and perhaps with better support at the OS layer.

                                                                                          My current impression–and please do correct me if I’m wrong!–is that containers are nowadays basically tacit acknowledgements that, because people lack the discipline/skill/tooling to write fairly self-contained blobs, we need to provide an artificial boundary sack to hold all their garbage in to reduce the ops headaches they would otherwise be creating.

                                                                                          It would seem to me that the sustainable, long-term solution would be to have developers that understand how not to pull in a clowncar of dependencies (cough node ruby cough) and how to use the tools that their systems provide them out of the box for managing quotas/networking/filesystems and so forth. Of course, that’s expensive, and there isn’t money to be made by just telling devs to RTFM.

                                                                                          1. 14

                                                                                            better support at the OS layer

                                                                                            That’s what containers are.

                                                                                            because people lack the discipline/skill/tooling to write fairly self-contained blobs

                                                                                            How do I ensure they are actually correctly self-contained? “Be more disciplined” sounds suspiciously like “don’t write bugs”. Containers are to multi-tenant deployment as immutable data structures are to multi-threaded programming.

                                                                                            It would seem to me that the sustainable, long-term solution would be to have developers that understand how not to pull in a clowncar of dependencies (cough node ruby cough)

                                                                                            This is irrelevant. Both node and ruby support full namespace isolation of dependencies, without containers. I could install every version of every rubygem on my laptop and my only problem would be lack of disk space.

                                                                                            how to use the tools that their systems provide them out of the box for managing quotas/networking/filesystems

                                                                                            The advantage of overlayfs was already discussed in Julia Evans' post. As for quotas, containers aren’t really about quotas as much as homogeneous deployment tooling and safe multi-tenancy. Quotas are a stability afterthought, so much of an afterthought that they aren’t even that well implemented in public container projects.

                                                                                            1. 1

                                                                                              As someone who does not run Linux, the modern container movement is a regression for me. With source artifacts, I could build packages for my OS (FreeBSD) and start jails to deploy them on for development. With modern containers, I need to run a Linux VM to run an opaque binary artifact just to do development. This makes my environment more complex rather than simpler. Oh well, I’m about a year or two from just being a curmudgeon yelling at people to get off my OS lawn :)

                                                                                              1. 1

                                                                                                With source artifacts, I could build packages for my OS (FreeBSD) and start jails to deploy them on for development.

                                                                                                Jails are another name for containers, if you squint a little.

                                                                                                1. 1

                                                                                                  Yes, they are containers (heck, the first containers). However, my point was specifically about how containers are working out right now, which is as Linux-specific binary artifacts. At best, one has to emulate a Linux system to run a Docker image.

                                                                                                  1. 1

                                                                                                    Docker is horrible and linux-specific. That’s a contingent fact about docker, not a fundamental issue with containerization.

                                                                                                    1. 1

                                                                                                      Yes, I agree. My wording was not clear, it seems. By “modern container movement” I’m referring to Docker and friends.

                                                                                                      1. 1

                                                                                                        I believe we’ll have a standard image format soon enough, but you’ll still need to put in the effort to build containers targeting the BSDs, since they presumably won’t run Linux binaries. Well, I guess FreeBSD has Linux emulation, but I’m not sure what the state of it is. DragonFly and OpenBSD don’t, though I don’t think OpenBSD has jails to begin with.

                                                                                                        Jails also unfortunately lack the granularity of namespaces but I don’t know how huge of a deal that is.

                                                                                              2. 4

                                                                                                Oh let’s do be totally clear: the container ecosystem today is an ad hoc, informally-specified, bug-ridden, slow implementation of half of Erlang and a tenth of Unix. That said, the number of fellow Erlang programmers I’ve ever physically met could fill a car, so pragmatically there’s something to be said for providing the good guarantees and affordances of Erlang in a form that even a node.js programmer can probably not fuck up.

                                                                                                1. 2

                                                                                                  It would seem to me that the sustainable, long-term solution would be to have developers that understand how not to pull in a clowncar of dependencies (cough node ruby cough)

                                                                                                  There are a few problems with this part of your statement:

                                                                                                  1. The cost of switching languages at an established tech company is measured in millions. It would take any company years to retool, retrain or rehire talent, all at the expense of not developing new features to pull something like that off. Also, good luck convincing anyone in management that that’s worthwhile.
                                                                                                  2. Using an abstraction to conveniently package applications for multi-tenancy and to alleviate dependency hell solves actual problems. If you work at a company with thousands of apps, dependency conflicts are (almost - monorepos help solve this) inevitable. If you’re relying on OS packages, multi-tenancy is much harder.
                                                                                                  3. This argument sounds very anti-library. While the dependency sprawl of node is inarguable, going the opposite direction and re-implementing and embedding the same set of functions in every project in order to be self-contained is absurd, and hard to maintain. Again, at the scale of a large company, it makes much more sense to dump common functions into libraries, or make use of the tens of thousands of FOSS libraries out there.

                                                                                                  and how to use the tools that their systems provide them out of the box for managing quotas/networking/filesystems and so forth

                                                                                                  On Linux, containers are more or less an abstraction on top of the “modern” kernel mechanisms (namespaces and cgroups) for enforcing quotas.

                                                                                                  Outside of work, I’m not really a Linux user. However, what I do love about the modern Linux container movement is the idea of producing a self-contained image that’s easy to distribute, run and apply run-time constraints on. I’d love to see this applied to FreeBSD Jails. I’m not aware of any attempt at an implementation of this, other than the apparently abandoned jetpack.

                                                                                                  1. 2

                                                                                                    Sustainable in what terms?

                                                                                                    It’s stunningly expensive and time-consuming (decades!) to train people up to the level where they can build simple, reliable software with few dependencies. The training process produces a lot of complex, unreliable, useful working software. Is it sustainable to not use it?

                                                                                                    1. 6

                                                                                                      I think you’re correct in pointing out “sustainable” as something in need of better specification.

                                                                                                      For me, the current situation is “unsustainable” because:

                                                                                                      • Small programs (from a user perspective) are requiring more and more infrastructure to support, even when their function is fairly simple and limited.
                                                                                                      • Ever-growing infrastructure is much like ever-growing use of abstractions…you can solve any problem with more tooling/abstraction, except for the problem of too much tooling/abstraction.
                                                                                                      • This increased emphasis on “screw it throw it in a container” makes security patching even more of a nightmare than it would be otherwise.
                                                                                                      • This emphasis on multitenancy is great at transferring wealth from startups to service providers while failing to cultivate useful standalone technical expertise. This only works as long as startups have the funding to slosh around on service providers–which is probably not going to go on forever.

                                                                                                      There are absolutely people (in this thread, for example) that use Docker et al for things like multi-tenant builds, which is a use case where it makes sense.

                                                                                                      But the unsustainable thing is when people avoid fixing the shittiness of their tooling (Kafka, ML stuff, database deployments, etc.) by shoving it in a container and pretending they don’t have to learn how it works or how to manage it. And there seems, to me, to be a lot of that in the wind these days.

                                                                                                      1. 2

                                                                                                        This increased emphasis on “screw it throw it in a container” makes security patching even more of a nightmare than it would be otherwise

                                                                                                        I have found it surprising that some of the same people who are so opposed to static compilation (with cries about how you would have to download updates for many programs instead of just openssl), somehow are just fine with containers.

                                                                                                        1. 1

                                                                                                          I’m sure there’s more of it than there used to be - tech has grown, fast.

                                                                                                          I’m somewhat less convinced that the fraction of technology in the category you’ve described has grown.

                                                                                                      2. 1

                                                                                                        I don’t understand how else you’re supposed to allow people to run applications in an isolated environment, like AWS.

                                                                                                        1. 4

                                                                                                          Install a package? AWS is already running a VM.

                                                                                                      3. 3

                                                                                                        “The idea is that you ought to be able to submit a program, bounded with memory and disk and bandwidth expectations, and have that program encapsulated in such a way that it is trivial to operate (and ideally has no or few state expectations). ”

                                                                                                        That’s a great description of processes running on the QNX microkernel circa the 1990s. They also had a desktop on a floppy, plus self-healing to a degree. Something tells me there’s some middle ground between that and how container platforms are doing the same thing with many MB of software. Later on, researching along the lines of separation kernels, TU Dresden’s TUDOS demonstrator let me fire up a new user-mode Linux VM every second on top of the L4 microkernel on crappy hardware, as fast as I could click. And that’s just an academic prototype. Real-world stuff from RTOS vendors lets you run Linux, Ada runtimes, Java subsets, C apps, and so on on the same efficient microkernel, with autogenerated communication stubs for making them work together. Ada was safest, but the C and Java parts can support all the mainstream stuff. Also, vendors like Aonix supported lightweight VMs with AOT compilation into self-contained ROMs you can link in.

                                                                                                        I’m with angersock in at least how the modern stuff looks like giant piles of crap on top of piles of crap compared to even 90s versions of better architectures. Let’s not even talk about security, where the modern stuff’s multi-tenancy security is a joke compared to GEMSOS or KeyKOS from the 1980s, or Karger’s security kernel for the VAX VMM from the 90s. I disagree with angersock in that it can be a wise choice to use one of these piles if you’re working with many other piles in a way that needs isolation and management that can’t easily be done with a better architecture. If you have to.

                                                                                                        1. 3

                                                                                                          I was thinking about this as I rode my bike this morning: how many abstractions or tools in software development push simplicity or understanding in only one direction? For example, containers allow one who has the discipline to build simple systems and deploy them; they also allow a less disciplined engineer to make piles upon piles. The best example of a tool that only moves the needle in one direction, and I could be wrong, is a non-Turing-complete type system. As complex as any type system is, it is always a matter of understanding a static formalism, and the language comes with a tool to test whether changes to the program are still correct under it. Personally, I think Turing completeness is a bad thing and I hope in 100 years we look back at all these Turing-complete languages as a misstep.

                                                                                                          1. 1

                                                                                                            “The best example of a tool that only moves the needle in one direction, and I could be wrong, is a non-Turing-complete type system.”

                                                                                                            It’s one. These are straightforward to apply and understand. John Nagle gave another example when the DAO got hit: decision tables. He pointed out that both humans and computers can understand that formalism while it stays decidable. There are APIs that are also easy to use correctly; the Ethos project investigates those for security. One last example is DSLs. That leads to the next one.
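To make the decision-table point concrete, here's a hypothetical Python sketch (the rule names and conditions are invented for illustration, not Nagle's actual proposal): the whole "program" is enumerable data, so checking it for completeness is mechanical and decidable.

```python
from itertools import product

# A decision table as plain data: every (conditions -> action) row is
# explicitly enumerated, so humans and tools can both read it, and no
# general computation is involved. Conditions/actions are made up.
RULES = {
    # (balance_ok, signature_valid): action
    (True,  True):  "execute",
    (True,  False): "reject: bad signature",
    (False, True):  "reject: insufficient funds",
    (False, False): "reject: bad signature",
}

def decide(balance_ok: bool, signature_valid: bool) -> str:
    return RULES[(balance_ok, signature_valid)]

# Mechanical completeness check: every input combination has a rule.
assert all(row in RULES for row in product((True, False), repeat=2))
print(decide(True, True))  # execute
```

A Turing-complete smart-contract language can't be checked this way in general; a table always can.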

                                                                                                            “Personally, I think Turing completeness is a bad thing and I hope in 100 years we look back at all these Turing-complete languages as a misstep.”

                                                                                                            Maybe. There’s a tension between expressiveness and decidability. The DSLs were able to express the problem concisely in a way that’s easy to work with. They became a nightmare, especially when integrating them, once the problem went beyond the DSL. Turing-complete languages can handle all of it but can be a nightmare anyway. One middle ground I like is powerful languages with DSL support, where you create Turing-incomplete DSLs for each problem whose underlying properties are similar and integrate well. sklogic’s toolkit on GitHub’s combinatorylogic page is an example: he wrote a LISP compiler framework, then DSLs on top of it, including Standard ML. He might code with strong typing and restricted expressiveness unless the problem is too hard to solve that way. Then he can fall back on LISP. If it’s XML or whatever, he’s got a DSL for that too that makes it easy. And so on. We’re seeing something similar occurring with the Haskell DSL crowd, but with theoretically even easier analysis of those DSLs.

                                                                                                            So, Turing completeness may not be a bad thing so much as an overused concept. Perhaps a Turing-complete foundation that we always constrain just enough to express the current problem makes more sense. Heck, that even sounds like the POLA pattern from my field. ;)
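As a toy illustration of "constrain just enough": a Python sketch of a Turing-incomplete arithmetic DSL embedded in a Turing-complete host language. This is an invented example, not taken from sklogic's toolkit.

```python
import ast
import operator

# A tiny Turing-incomplete DSL: arithmetic expressions only. Every
# program is a finite expression tree with no loops, recursion, or
# definitions, so evaluation always terminates - expressiveness is
# deliberately constrained to the problem at hand.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_expr(src: str):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("outside the DSL")
    return walk(ast.parse(src, mode="eval"))

print(eval_expr("2 * (3 + 4)"))  # 14
```

Anything outside the whitelisted grammar (names, calls, loops) is rejected, which is exactly the "fall back to the host language" boundary.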

                                                                                                        2. 1

                                                                                                          Google’s datacenters are literally magic

                                                                                                          FWIW Google’s datacenters all run on containers.

                                                                                                        3. 3

                                                                                                          If containers are a zero-cost abstraction, is it any different from running 50 programs on your machine?

                                                                                                          My understanding is that containers are approaching zero cost, so being able to, say, easily deploy multiple programs with different deployment requirements seems like a major win for ops.

                                                                                                          Everything’s a tradeoff, of course. But getting a consistent environment is still a major difficulty due to how Unices deal with so many things like fonts, localisation, etc (Env variables! conf files! daemon behavior!). Being able to fix everything about a program’s environment is awesome.

                                                                                                          Try using any PDF rendering tool in an undefined environment and getting consistency of output.

                                                                                                          In a world where you rely on software written by other people, being able to isolate those programs and define their dependencies is very helpful! At least in theory.
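A tiny slice of what that determinism looks like even without containers: pinning a child process's environment explicitly instead of inheriting whatever the host happens to have set. A minimal Python sketch, assuming a Unix `date` binary in `/usr/bin` or `/bin`:

```python
import subprocess

# Run a tool under a fixed, minimal environment rather than the
# host's ambient env vars - the same kind of determinism containers
# give you for the whole filesystem, applied to just the environment.
FIXED_ENV = {
    "LANG": "C.UTF-8",
    "LC_ALL": "C.UTF-8",
    "TZ": "UTC",
    "PATH": "/usr/bin:/bin",
}

result = subprocess.run(
    ["date", "-u", "+%Z"],   # print the timezone name, forced to UTC
    env=FIXED_ENV,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

With `LANG`, `LC_ALL`, and `TZ` fixed, the output no longer depends on the host's locale or timezone configuration.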

                                                                                                        1. 1

                                                                                                          This is all interesting, and I’m glad it’s documented because some of the gotchas are worth remembering, but I fear this often gets oversimplified and people run away from using rdtsc even when it would work just fine. Like the possibility that the cycle counter on a newly attached hotplug CPU may not be synced is something that has never once concerned me, but it’s the kind of objection that always seems to turn up.

                                                                                                          1. 1

                                                                                                            On a platform with a gethrtime() or clock_gettime() that doesn’t need a context switch to operate, there’s precious little need to muck around with the TSC directly at all. Despite what the article says, I struggle to imagine an application that needs this operation to be faster than the 15-40ns it takes on a modern system.
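For a rough feel for that cost from userspace, here's a Python sketch that estimates the per-call overhead of the monotonic clock. `time.perf_counter_ns` wraps `clock_gettime(CLOCK_MONOTONIC)` on Linux, reached through the vDSO with no context switch; note Python's call machinery adds its own overhead on top of the raw 15-40ns figure.

```python
import time

def clock_call_cost_ns(iters: int = 1_000_000) -> float:
    """Average cost of one monotonic-clock read, in nanoseconds.

    Includes Python's function-call overhead, so this is an upper
    bound on the raw clock_gettime() cost you'd measure from C.
    """
    start = time.perf_counter_ns()
    for _ in range(iters):
        time.perf_counter_ns()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iters

print(f"~{clock_call_cost_ns():.0f} ns per clock read (from Python)")
```

If even this bound is far below your measurement granularity, there's little reason to reach for `rdtsc` directly.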

                                                                                                            1. 1

                                                                                                              If you’re doing instrumentation, it’ll be around 10-15ns to increment a counter in the most efficient thread-safe ways. If you add 15-40ns because it’s timing something, that’s quite a hit in an inner loop - though you’d need to be running it millions of times per second for it to matter.

                                                                                                          1. 11

                                                                                                            I’ve been playing a good bit of multiplayer Factorio the past week.

                                                                                                            Imagine the full software development cycle, but in the form of a physical factory. There are factory factories and factory constructors as you get to the late game.

                                                                                                            1. 4

                                                                                                              I just launched my first rocket in Factorio a few weeks ago and wow, it felt like a real journey. I had a huge rail system set up to try to keep up with the plastic and sulfur needs of my assembly lines, and had at least two power-outage crises featuring bug swarms halfway through.

                                                                                                            1. 9

                                                                                                              I never use regex unless I can’t figure out a way to do it just with string manipulation functions, and although it isn’t because of stories like this (it’s mostly just because I’m bad at regex), they sure do make me feel vindicated in doing this :)

                                                                                                              1. 5

                                                                                                                “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”

                                                                                                                jwz

                                                                                                                1. 2

                                                                                                                  For Prometheus we’ve got several places where we’re dealing with effectively unstructured data (or at least it has a structure that’s unknown to us) but it’s important that users be able to extract that data. So we fall back to regexes.

                                                                                                                  The regexes themselves tend not to be the problem; lots of people don’t get anchoring, though.
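The anchoring pitfall in a nutshell, as a Python sketch (Prometheus's relabeling fully anchors its regexes for exactly this reason, but the surprise bites anywhere unanchored matching is the default; the label values here are invented):

```python
import re

# Unanchored: search() matches the pattern anywhere in the input, so
# a rule meant to select the "prod" environment also fires on values
# that merely contain it as a substring.
pat = re.compile(r"prod")
print(bool(pat.search("reprod-test")))   # True - almost never intended

# Anchored: require the pattern to cover the whole value.
anchored = re.compile(r"^prod$")
print(bool(anchored.search("reprod-test")))  # False
print(bool(anchored.search("prod")))         # True
```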

                                                                                                                  1. 4

                                                                                                                    If you’re running user-supplied regexes on arbitrary data (using popular “regex” implementations - it’s not an issue with actual regular expressions) you’ll hit this issue sooner or later. Make sure you do the right thing when your users' regexes run in exponential time.
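A sketch of that failure mode in Python, whose stdlib `re` is a backtracking engine (unlike RE2): nested quantifiers plus a forced failed match make the running time grow exponentially with input length.

```python
import re
import time

# Classic pathological pattern for backtracking engines: the nested
# quantifiers give exponentially many ways to partition a run of "a"s,
# and the trailing "!" forces the engine to try them all before failing.
evil = re.compile(r"^(a+)+$")

def time_match(n: int) -> float:
    subject = "a" * n + "!"
    start = time.perf_counter()
    assert evil.match(subject) is None   # always fails to match
    return time.perf_counter() - start

for n in (14, 16, 18, 20):
    print(f"n={n}: {time_match(n):.4f}s")   # roughly 4x per step of 2
```

An RE2-style automaton rejects the same inputs in linear time, which is why engines like RE2 are the safe choice for user-supplied patterns.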

                                                                                                                    If what you want is to allow users to run Turing-complete code on the data (which is what you’re doing, via a rather inefficient encoding, with popular regex implementations), maybe consider embedding a popular/standardized scripting language? That way users at least have the option of using more maintainable approaches (e.g. parser combinators).
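For contrast, a minimal parser-combinator sketch in Python: each parser is a function from text to `(value, rest)` or `None`, and small combinators compose them into bigger parsers. The `host:port` grammar is an invented example.

```python
# Each parser: str -> (value, remaining) on success, None on failure.
def char(pred):
    return lambda t: (t[0], t[1:]) if t and pred(t[0]) else None

def literal(s):
    return lambda t: (s, t[len(s):]) if t.startswith(s) else None

def many1(p):
    """One or more repetitions of p, joined into a string."""
    def run(t):
        out = []
        while (r := p(t)) is not None:
            out.append(r[0])
            t = r[1]
        return ("".join(out), t) if out else None
    return run

def seq(*ps):
    """All of ps in order, collecting their values."""
    def run(t):
        out = []
        for p in ps:
            r = p(t)
            if r is None:
                return None
            out.append(r[0])
            t = r[1]
        return (out, t)
    return run

# Grammar: host ":" port - small named, testable pieces instead of
# one opaque extraction regex.
host_port = seq(many1(char(str.isalnum)), literal(":"), many1(char(str.isdigit)))

(host, _, port), rest = host_port("web1:9090")
print(host, port)  # web1 9090
```

Each piece can be unit-tested on its own, which is the maintainability argument over a single hairy regex.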

                                                                                                                    1. 4

                                                                                                                      We’re using RE2 which doesn’t have this issue.

                                                                                                                      If what you want is to allow users to run Turing-complete code on the data

                                                                                                                        We explicitly don’t want that; we’re only looking for data extraction. If a user’s metrics/services/machine taxonomy is so complex that it needs more than a Type-3 grammar to handle, they have bigger problems, and we’ll point them towards our various plugin interfaces to let them code it up themselves in a language of their choice.