1. 4

    This week will mostly be busy with job interviews, though I’d like to finish the rtpMIDI backend for MIDIMonster, ideally with native support for the AppleMIDI session protocol, though the whole specification is… not that good.

    Other than that, my to-do list contains a whole host of items, including publishing a new article on my homepage that has been in the works for long enough now.

    1. 7

      In preparation for tax season, I’m tweaking my plaintext accounting setup. I recorded all of 2017: every transaction that touched my bank or credit cards is queryable. I’m switching processors for 2018 to a setup that’s already saved me about two hours per month. I’d love to get to a point where I can do make irs1040 and it spits out the content of all of the boxes that my transaction data can reflect.

      I’ve already got make networth and make cashflow working, complete with nice graphs in the terminal courtesy of iTerm2’s imgcat.

      I’m hoping to generalize my setup so that I can release it eventually. I’ve found very few examples of workflows, so I want to contribute mine and make the plaintext accounting ecosystem more approachable.

      1. 2

        Some friends and I collected quite a few interesting ledger-cli graphs (using gnuplot and a pie-charting tool of mine) on GitHub: https://github.com/cbdevnet/ledger-reports

        Though some of them break when used with multiple commodities… Might have to get around to fixing that some time after my exams :)

        1. 2

          This looks really awesome! Any chance you could throw up some example images?

          1. 1

            Good suggestion, thanks! I’ve just opened an issue for that. As I’m not quite comfortable with using my own data for presentation purposes, this will include generating a somewhat-random example ledger file and generating the demo statistics from that :)

        2. 1

          I’d be very interested in seeing your setup when you’re done. The notion of using a CLI program for doing this sort of accounting intrigues me a bit.

          1. 3

            A good start is to watch a recording of one of my talks and those of some others who have produced some content about it: https://www.youtube.com/results?search_query=plaintext+accounting&page=&utm_source=opensearch

            I’m probably still a few months away from having my stuff sufficiently abstracted.

            1. 1

              “The notion of using a CLI program for doing this sort of accounting intrigues me a bit.”

              It was the default on most cash registers of big chains for a long time. The ones I shop at in my area have all just now gotten graphical apps. For some, the backend was mainframe, terminal-style stuff. So, CLI on each side. Here is one for UNIX I found when I looked up that stuff.

              1. 3

                There’s a difference between CLI and TUI. Most cash registers I’ve seen were TUI, IIRC.

                Still, there’s a certain Spartan flavor to TUIs, such that they can be quite efficient for their tasks.

                1. 1

                  Ok, yeah, TUI was more what I was thinking of. The ones I dealt with were usually more efficient than their GUI replacements. Way more reliable, too.

            1. 3

              I actually think that the creation of many “new” languages in an effort to make programming more accessible to the “masses” backfired in a sense:

              Yes, it has become harder to decide “which language to start with”, as everyone stands at the ready to put forth his or her own favorite toy of the moment. The push for languages to be more easily learnable has also introduced a lot of abstraction.

              This can be, on the one hand, a good thing: Things that “feel” easy to do (for example, getting the current exchange rate of Bitcoin), are just an import away. On the other hand, it hides a lot of the complexities that these things rely on, in some cases actively preventing people from understanding what it is these APIs and frameworks actually do.

              In consequence, few of the people using these constructs would be able to implement them themselves, forcing them into a kind of learned helplessness: if there is no module for it, it’s obviously impossible. This finds its culmination in things like the famous left-pad package.

              When people feel at home in an ecosystem, few tend to leave it. This eventually leads to languages and environments that were once meant for learning becoming increasingly full-featured (there’s nothing more permanent than a temporary fix…), leading to things like full-fledged applications in, e.g., Scratch. The lack of formal education in things like software security and software engineering for most of these “new programmers” tends to further increase this territorial behavior.

              This also creates a kind of class system, where programmers using “real” languages look down, for some reason or other, on people staying within these ecosystems. As it becomes less important to know exactly how the computer executes code in order to write it, fewer people care about how to actually work with their systems. This may be where the feeling that “Learning to program is getting harder” comes from. And the fact of the matter is, even though these abstracted environments (e.g. the “cloud”) are important, someone is still going to have to create the tools that get you there: operating systems, browsers, firmware on switches, routers, etc. Not everything can be taught with cloud-based REPLs.

              If someone just wants to learn to program, they shouldn’t have to learn system administration first.

              If someone just wants to learn to program, they shouldn’t have to learn operating system concepts first.

              These are the core points of the article that I disagree with. I think that learning how to express thoughts in code (which is what is being addressed very well by the “new” languages) is only one part of programming. Learning how the system works, how to interact with e.g. the command line in some form or other, how the environment finds its files, and even how file formats work are also important parts that often get left behind as the level of abstraction increases (up to the cloud, where nothing matters anymore and everything is an abstract resource).

              Disclaimer: I may be wrong. These are just my feelings on the matter.

              1. 4

                Currently preparing for my last two university exams and searching for a future job on the side.

                As for software, recently implemented a miniature domain-specific language for our window-manager-manager (https://github.com/fsmi/rpcd), which is now being used to automate our signage display windows. Up next is a custom wireless network infrastructure monitoring application (basically tracking WiFi clients via their access point association, regardless of vendor).

                1. 3

                  Do you have any idea how it compares (performance-wise) to USBIP (http://usbip.sourceforge.net/ also in the Linux Kernel)?

                  1. 3

                    No, we actually did no formal performance benchmarks, but we use it quite heavily for multiplayer emulation games, and so far everyone was very satisfied with latency, responsiveness and general controller-feel. Very interesting point though, I’d like to do that some time!

                    One feature that I imagine is hard to do with USBIP is filtering the input - since we don’t want people using their keyboard/mouse on the gaming rig, we filter for the gamepad axes and buttons, which works a treat!

                    In addition, we can use non-USB devices on the client (e.g. PS/2) and even non-devices (the repo contains the slightly out-of-date osc-xlater, which translates a set of OSC messages sent by e.g. a smartphone into a gamepad emulation). I also believe our approach requires fewer privileges in comparison to USBIP, as we can run with user privileges for both components if configured correctly.

                  1. 3

                    Perhaps the “human-friendly” /proc/$PID/status might in fact be the more robustly-parseable option – it escapes linefeeds (the relevant separator) in argv[0] (and includes ppid, for what it’s worth).
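
                    A minimal Python sketch of reading that file, assuming its "Key:<tab>value" line layout. The sample text and the helper name parse_status are made up for illustration; on a real system the text would come from /proc/<pid>/status:

                    ```python
                    def parse_status(text):
                        # Hypothetical helper: turn "Key:<tab>value" lines into a dict.
                        fields = {}
                        for line in text.split("\n"):
                            # Split on the first colon only, since the value
                            # (e.g. the process name) may contain colons too.
                            key, sep, value = line.partition(":")
                            if sep:
                                fields[key] = value.strip()
                        return fields

                    # Made-up sample: a linefeed in the name shows up as a literal
                    # backslash followed by "n", so it cannot fake a new line.
                    sample = "Name:\tevil\\nls\nState:\tS (sleeping)\nPPid:\t2964\n"
                    status = parse_status(sample)
                    print(status["Name"], status["PPid"])
                    ```

                    Because the separator is a linefeed and linefeeds in the name are escaped, a name cannot inject an extra "PPid:" line of its own.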

                    1. 1

                      True, but it also incurs a parsing overhead from having to match strings and read a lot more data. I didn’t test this, but I could see the same problem coming up with executable names with newlines in them.

                      1. 4

                        I could see the same problem coming up with executable names with newlines in them.

                        Except that, as I said, newlines (a.k.a. linefeeds) are escaped: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/array.c?id=c643401218be0f4ab3522e0c0a63016596d6e9ca#n98

                        1. 1

                          You’re right, thanks. Parsing status would indeed sidestep that problem.

                    1. -1

                      [Title] /proc/<pid>/stat is broken

                      This sounds serious! Is the content of the pseudo-file associating incorrect PIDs or parent PIDs to processes?

                      Let’s continue…

                      Documentation (as in, man proc) tells us to parse this file using the scanf family, even providing the proper escape codes - which are subtly wrong.

                      So it’s a documentation issue…

                      When including a space character in the executable name, the %s escape will not read all of the executable name, breaking all subsequent reads

                      I have literally never encountered an executable with a space in the name, although it’s perfectly legal from a file name perspective. (I’ve been a Linux user since 1998).

                      The only reasonable way to do this with the current layout of the stats file would be to read all of the file and scan it from the end […]

                      So… let’s do this instead?

                      The proper fix (aside from introducing the above function) however should probably be to either sanitize the executable name before exposing it to /proc/<pid>/stat […]

                      Sounds reasonable to me.

                      […], or move it to be the last parameter in the file.

                      Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?

                      This problem could potentially be used to feed process-controlled data to all tools relying on reading /proc/<pid>/stat

                      I can’t really parse this. Do you mean “affect” instead of “used”?

                      In conclusion: I can’t see any evidence of the functionality of this proc pseudo-file being “broken”. You have encountered an edge case (an executable name with a whitespace character in it). You’ve even suggested a workaround (scan from the end). If you had formulated this post as “here’s a workaround for this edge case” I believe you would have made a stronger case.

                      1. 5

                        I have literally never encountered an executable with a space in the name

                        Well, tmux does this, for example. But my primary concern is not “has it ever happened to me?” but “if it happens, what will my code do?”. As this is a silent failure (as in, the recommended method fails in a non-obvious way without indicating failure), most implementations take no action to guard against it. That, in my mind, counts as broken, and the least one can do is fix the documentation. Or expose single parameters in files instead of a huge conglomeration with parsing issues. Or… see above.

                        So… let’s do this instead?

                        I do, but only after I got sceptical while reading the documentation, ran some tests and had my hunch confirmed. Then I checked to see others making that very mistake.

                        Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?

                        No, I don’t think so - except for introducing single-value files (and leaving /proc/<pid>/stat as it is).

                        This problem could potentially be used to feed process-controlled data to all tools relying on reading /proc/<pid>/stat

                        I can’t really parse this. Do you mean “affect” instead of “used”?

                        Admittedly, English is not my first language; I do, however, think that sentence parses just fine. The discussed problem (which is present in several implementations based on the documentation), can potentially be used to inject data (controlled by the process, instead of the kernel) into third-party software.

                        In conclusion: I can’t see any evidence of the functionality of this proc pseudo-file being “broken”.

                        That depends on your view of broken - if erroneous documentation affecting close to all software relying on it with a silent failure does not sound broken to you, I guess it is not.

                        You have encountered an edge case (an executable name with a whitespace character in it).

                        I actually did not encounter it per se, I just noticed the possibility for it. But it is an undocumented edge case.

                        You’ve even suggested a workaround (scan from the end).

                        I believe that is good form.

                        If you had formulated this post as “here’s a workaround for this edge case” I believe you would have made a stronger case.

                        Maybe, but as we can see by the examples of recent vulnerabilities, you’ll need a catchy name and a logo to really get attention, so in my book I’m OK.

                        1. 1

                          Thanks for taking the time to answer the questions I have raised.

                          The discussed problem (which is present in several implementations based on the documentation), can potentially be used to inject data (controlled by the process, instead of the kernel) into third-party software.

                          Much clearer, thanks.

                          On the use of “broken”

                          I’m maybe extra sensitive to this as I work in supporting a commercial software application. For both legal and SLA[1] reasons, we require our customers to be precise in their communication about the issues they face.

                          [1] Service level agreement

                          1. 1

                            Followup: can you give a specific example of how tmux does this? I checked the running instances of that application on my machine and only found the single word tmux in the output of stat files of the PIDs returned by pgrep.

                            1. 2

                              On my Debian 9 machine, when starting a tmux host session, the corresponding /proc/<pid>/stat file contains:

                              2972 (tmux: client) S 2964 2972 2964 […]
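
                              To illustrate the silent failure with that line, here is a rough Python sketch in which splitting on whitespace stands in for scanf's %s (the trailing fields are cut off, as above):

                              ```python
                              line = "2972 (tmux: client) S 2964 2972 2964"

                              # Naive approach: every whitespace-separated token is one field.
                              fields = line.split()
                              comm, state, ppid = fields[1], fields[2], fields[3]

                              # No error is raised, but every field after the name is off by one:
                              # the "state" is really the tail of the name, the "ppid" is the state.
                              print(comm, state, ppid)
                              ```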

                        2. 3

                          “Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?”

                          I will never get the 100ms it took to read this sentence back….

                          1. 1

                            I dunno, maybe just duplicate the information at the end of the current format, in the author’s preferred format, and delimited by some character not otherwise part of the spec.

                            It’s not trivial, though.

                            That was my point.

                          2. 1

                            this was clearly overlooked when the api was designed, nobody is parsing that file from the end and nobody is supposed to

                            1. -1

                              What was overlooked? That executables can have whitespace in their names?

                              I can agree that this section of the manpage can be wrong (http://man7.org/linux/man-pages/man5/proc.5.html, search for stat):

                              (2) comm  %s
                                  The filename of the executable, in parentheses.
                                  This is visible whether or not the executable is
                                  swapped out.
                              

                              From the manpage of scanf:

                              s: Matches a sequence of non-white-space characters; the next
                                  pointer must be a pointer to the initial element of a
                                  character array that is long enough to hold the input sequence
                                  and the terminating null byte ('\0'), which is added
                                  automatically.  The input string stops at white space or at
                                  the maximum field width, whichever occurs first.
                              

                              So it’s clear no provision was made for executables having whitespace in them.

                              This issue can be simply avoided by not allowing whitespace in executable names, and by reporting such occurrences as a bug.

                              1. 8

                                This issue can be simply avoided by not allowing whitespace in executable names, and by reporting such occurrences as a bug

                                Ahhh, the Systemd approach to input validation!

                                Seriously, if the system allows running executables with whitespace in their names, and your program is meant to work with such a system, then it needs to work with executables with whitespace in their names.

                                I agree somewhat with the OP - the interface is badly thought out. But it’s a general problem: trying to pass structured data between kernel and userspace in plain-text format is, IMO, a bad idea. (I’d rather a binary format. You have the length of the string encoded in 4 bytes, then the string itself. Simple, easy to deal with. No weird corner cases).
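
                                A sketch of that length-prefixed idea in Python, assuming a little-endian unsigned 32-bit length. The helper names are hypothetical and this is not any existing kernel interface:

                                ```python
                                import struct

                                def encode_name(name: bytes) -> bytes:
                                    # 4-byte little-endian length, followed by the raw bytes.
                                    return struct.pack("<I", len(name)) + name

                                def decode_name(buf: bytes, offset: int = 0):
                                    # Returns the name and the offset of the next record.
                                    (length,) = struct.unpack_from("<I", buf, offset)
                                    start = offset + 4
                                    return buf[start:start + length], start + length

                                # Spaces, parentheses and newlines need no escaping at all.
                                buf = encode_name(b"hello) world") + encode_name(b"evil\nls")
                                first, pos = decode_name(buf)
                                second, _ = decode_name(buf, pos)
                                ```

                                The reader never has to search for a delimiter, so nothing inside the name can be mistaken for one.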

                                1. 1

                                  I agree it’s a bug.

                                  However, there’s a strong convention that executables do not have whitespace in them, at least in Linux/Unix.[1]

                                  If you don’t adhere to this convention, and you stumble across a consequence to this, does this mean that a format that’s been around as long as the proc system is literally broken? That’s where I reacted.

                                  As far as I know, nothing crashes when you start an executable with whitespace in it. The proc filesystem isn’t corrupted.

                                  One part of it is slightly harder to parse using C.

                                  That’s my take, I’m happy to be enlightened further.

                                  I also agree that exposing these kinds of structures as plain text is arguably … optimistic, and prone to edge cases. (By the way, isn’t one of the criticisms of systemd that it has an internal binary format?).

                                  [1] note I’m just going from personal observation here, it’s possible there’s a subset of Linux applications that are perfectly fine with whitespace in the executable name.

                                  1. 3

                                    I agree with most of what you just said, but I myself didn’t take “broken” to mean anything beyond “has a problem due to lack of forethought”. Maybe I’m just getting used to people exaggerating complaints (heck I’m surely guilty of it myself from time to time).

                                    It’s true that we basically never see executables with a space (or various other characters) in their names, but it can be pretty frustrating when tools stop working or don’t work properly when something slightly unusual happens. I could easily see a new-to-linux person creating just such an executable because they “didn’t know better” and suffering as a result because other programs on their system don’t correctly handle it. In the worst case, this sort of problem (though not necessarily this exact problem) can lead to security issues.

                                    Yes, it’s possible to correctly handle /proc/xxx/stat in the presence of executables with spaces in the name, but it’s almost certain that some programs are going to come into existence which don’t do so correctly. The format actually lends itself to this mistake - and that’s what’s “broken” about it. That’s my take, anyway.

                                    1. 2

                                      Thanks for this thoughtful response. I believe you and I are in agreement.

                                      Looking at this from a slightly more usual perspective, how does the Linux system handle executables with (non-whitespace) Unicode characters?

                                      1. 3

                                        Well, I’m no expert on unicode, but I believe for the most part Linux (the kernel) treats filenames as strings of bytes, not strings of characters. The difference is subtle - unless you happen to be writing text in a language that uses characters not found in the ASCII range. However, UTF-8 encoding will (I think) never cause any bytes in the ASCII range (0-127) to appear as part of a multi-byte encoded character, so you can’t get spurious spaces or newlines or other control characters even if you treat UTF-8 encoded text as ASCII. For that reason, it poses less of a problem for things like /proc/xxx/stat and the like.

                                        Of course filenames being byte sequences comes with its own set of problems, including that it’s hard to know which encoding should be used to display filenames (I believe many command line tools use the locale’s default encoding, and that’s nearly always UTF-8 these days) and that a filename potentially contains an invalid encoding. Then of course there’s the fact that unicode has multiple ways of encoding the exact same text and so in theory you could get two “identical” filenames in one directory (different byte sequences, same character sequence, or at least same visible representation). Unicode seems like a big mess to me, but I guess the problem it’s trying to solve is not an easy one.

                                        (minor edit: UTF-8 doesn’t allow 0-127 as part of a multi-byte encoded character. Of course they can appear as regular characters, equivalent to the ASCII).
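
                                        Both points can be checked with a few lines of Python (standard library only; the sample strings are arbitrary picks):

                                        ```python
                                        import unicodedata

                                        # Every byte of a multi-byte UTF-8 sequence has the high bit
                                        # set, so none can be mistaken for an ASCII space or newline.
                                        encoded = "ą".encode("utf-8")
                                        high_bit_only = all(b >= 0x80 for b in encoded)

                                        # The same visible text as two different byte sequences:
                                        composed = "\u00e9"     # é as a single code point
                                        decomposed = "e\u0301"  # e plus a combining acute accent
                                        different_bytes = (
                                            composed.encode("utf-8") != decomposed.encode("utf-8")
                                        )
                                        same_after_nfc = (
                                            unicodedata.normalize("NFC", decomposed) == composed
                                        )
                                        ```

                                        A filesystem that compares raw bytes would treat the two spellings of é as distinct names.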

                                        1. 1
                                          ~ ❯ cd .local/bin
                                          ~/.l/bin ❯ cat > ą << EOF
                                          > #/usr/bin/env sh
                                          > echo ą
                                          > EOF
                                          ~/.l/bin ❯ chmod +x ą 
                                          ~/.l/bin ❯ ./ą
                                          ą
                                          
                                      2. 2

                                        If you don’t adhere to this convention, and you stumble across a consequence to this, does this mean that a format that’s been around as long as the proc system is literally broken?

                                        Yes; the proc system’s format has been broken (well, misleadingly-documented) the whole time.

                                        As you note, using pure text to represent this is a problem. I don’t recommend an internal, poorly-documented binary format either: canonical S-expressions have a textual representation but can still contain binary data:

                                        (this is a canonical s-expression)
                                        (so "is this")
                                        (and so |aXMgdGhpcw==|)
                                        

                                        An example stat might be:

                                        (stat
                                          (pid 123456)
                                          (command "evil\nls")
                                          (state running)
                                          (ppid 123455)
                                          (pgrp 6)
                                          (session 1)
                                          (tty 2 3)
                                          (flags 4567)
                                          (min-fault 16)
                                          …)
                                        

                                        Or, if you really cared about concision:

                                        (12345 "evil\nls" R 123455 6 1 16361 4567 16 …)
                                        
                                    2. 3

                                      nobody is parsing that file from the end

                                      As an example the Python Prometheus client library uses this file, and allows for this.

                                1. 4

                                  On my PC at least, the process name has parens around it, so you can parse it with the Lua pattern %b(). Fishing out the parent PID would be like %S+ %b() %S+ (%S+).

                                  I don’t think the problem here is that /proc/pid/stat is broken, rather the string parsing tools in libc are not good enough and shouldn’t be used.

                                  The OpenBSD guys already ripped the patterns code out of Lua for their web server (patterns.c/.h), so it’s simple to add it to your own code base.

                                  1. 9

                                    Assuming the rules for process names are the same as the rules for file names, a process name can contain unbalanced parens. What if a process is named for example hello) world? The string in /proc/<pid>/stat would be (hello) world), and %b() would just find (hello). This doesn’t seem like something which could be fixed by smarter parsing, other than by reading from the end.

                                    1. 2

                                      Oh good point, %((.*)%) works, assuming they don’t add parens to any other fields.

                                    2. 2

                                      As noted in the original post, even using the parentheses does not guard against this. The only thing I can think of to safely use the current format is scanning the whole file into a buffer and searching for the last closing parenthesis, then taking everything from the first opening parenthesis to the last as executable name.

                                      This is also not specific to C or libc; this format is hard to parse correctly (and the mistake easy to make) in any language.
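
                                      A minimal Python sketch of the search-for-the-last-parenthesis approach, assuming the whole file has been read into one string and that no field after the name contains a closing parenthesis. parse_stat is a hypothetical helper:

                                      ```python
                                      def parse_stat(text):
                                          # The name runs from the first "(" to the LAST ")".
                                          open_idx = text.index("(")
                                          close_idx = text.rindex(")")
                                          pid = int(text[:open_idx])
                                          comm = text[open_idx + 1:close_idx]
                                          # Everything after the name is plain
                                          # whitespace-separated fields.
                                          rest = text[close_idx + 1:].split()
                                          return pid, comm, rest

                                      # Made-up line for a process named "hello) world".
                                      pid, comm, rest = parse_stat(
                                          "2972 (hello) world) S 2964 2972 2964"
                                      )
                                      state, ppid = rest[0], int(rest[1])
                                      ```

                                      Searching from the end is what keeps a ")" or a space inside the name from shifting the later fields.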