Threads for philpennock

  1. 2

    That simple script won’t reliably exit if preconditions fail, because the exit 2 is conditional upon the echo to stderr succeeding. So running with 2>&- to close stderr makes the echo fail, and the fail path then never reaches exit 2.

    1. 1

      Indeed. One of several simplifications (like excluding the variables) made to reduce line count. I wondered who’d call it out :)

    1. 2

      Do not embed your business logic in the database.

      1. 19

        Only a Sith deals in absolutes. (:

        What about tools like PostgREST that allow you to embed all the business logic in the database?

        1. 9

          Why not?

          1. 8

            There are conflicting opinions on this.

            Some people think they should push everything into the DB and the UI basically equates to pretty eye candy and user experience (UX).

            Some people prefer a middle ground, and some people think all the business logic should live outside the DB.

            I personally don’t think there is any one right answer. I think it depends on where the API boundary is, which mostly depends on the application.

            If you have a need/desire to give end users DB access, then you almost certainly want a lot (if not all) of the business logic in the DB.

            If you treat the DB as nothing more than a convenient place to store some data, then putting business logic there is stupid.

            Most databases can support any/all of these options.

            1. 5

              zie’s answer is good. Another perspective: single responsibility for microservices.

              You don’t reimplement logic in multiple apps writing to the same store. So either you have a microservice dedicated to providing the API needed while remaining the single-source-of-truth, or you have business logic in the RDBMS so that the RDBMS effectively embeds that microservice.

              And then it’s a question of available skillsets of your staff, and how many people are good at debugging stored procedures vs issuing distributed traces, etc.

              It’s all trade-offs.

              1. 5

                so that the RDBMS effectively embeds that microservice

                That’s an awesome way to describe the approach that I haven’t heard before. It sounds like it could also be used to make some db engineers twitch when I refer to their stored procedures repo as a microservice. Great! :-)

                1. 4

                  With a custom wire RPC format no less.

                2. 1

                  I agree with this as a valid perspective.

              2. 4

                The example given wasn’t really business logic, right? Just an abstraction over normalization?

                1. 3

                  It’s not business logic, it’s pretty much like a database view, which certainly belongs in a database. I think abstractions for data belong close to the data. That also means that you can change or replace the thing that interacts with the data, the actual business logic.

                  One can easily be cut by this when designing database schemas too close to a framework, ORM, etc., only for things to change, leaving a horrible migration path and potentially having to replicate very unidiomatic behavior.

                  So I’d argue that data and data representation don’t belong in your business logic.

                  Or at least I’d argue for keeping logic and data separate, which you don’t do if you essentially build parts of the schema or views in your business logic.

                  1. 2

                    …unless you can put all the logic in there. …or, you need to do selects on that data. …or, other business units have views into your schema …or, you have compliance requirements to do so

                    etc

                    1. 7

                      The author is confused by the distinction between the SASL framework for authentication and one of the authentication mechanisms available in SASL. They are describing a salted mechanism (referring to the SASL SCRAM RFC so probably SCRAM, but I haven’t double-checked in detail) and presenting it as being SASL.

                      1. 1

                        Actually, I wrote up what I learned from reading the SASL code in the Rust postgres driver: https://github.com/sfackler/rust-postgres/blob/master/postgres-protocol/src/authentication/sasl.rs

                        Probably, I’m wrong. Let me verify and correct it.

                        1. 5

                          Yep, you’re right.

                          Love this community. I’ll correct it.

                        1. 3

                          I make k a function which inspects the args and splices in $KUBE_CONTEXT and $KUBE_NAMESPACE in appropriate places, so that I can use .envrc and direnv(1) to work in the appropriate kubernetes context and namespace for a given app.

                          1. 2

                            God yes. When people see I have that they chuckle, and then after maybe 10 minutes of working in k8s with them, they immediately make the alias too.

                          1. 13

                            The dangerous part here is that it exposes a “what you don’t know will hurt you” issue: a completely different function probably needs to be called to even detect the error.

                            puts() is highly likely to succeed, because it writes to an internal buffer and doesn’t flush. So fflush() is needed to get the failure, so there’s a whole extra conceptual layer which newcomers have to learn, to do with caching, to even know how to check for a failure.
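
A minimal sketch of the check described above, assuming a hosted C environment with stdio. The helper name put_line_checked is hypothetical and does not come from the article under discussion; it only illustrates that the error tends to surface at fflush() rather than at the buffered write.

```c
#include <stdio.h>

/* Write a line to the stream and report whether it really left the
 * stdio buffer. Returns 0 on success, -1 if the buffered write or the
 * flush failed. */
int put_line_checked(const char *line, FILE *stream)
{
    /* These calls usually succeed: they only copy into the buffer. */
    if (fputs(line, stream) == EOF || fputc('\n', stream) == EOF)
        return -1;

    /* Flushing hands the bytes to the OS, which is where errors such as
     * a full disk or a closed descriptor become visible. */
    if (fflush(stream) == EOF)
        return -1;

    return 0;
}
```

On Linux, pointing the stream at /dev/full is one way to see this: the buffered write is accepted and the -1 comes from the flush.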

                            1. 4

                              I guess it is a good example, then. C is deceptively simple but full of subtle bugs.

                              1. 3

                                Indeed, I was disappointed that this issue wasn’t even mentioned in the article.

                                For completeness’s sake, an alternative to fflush() is to explicitly call fclose() and check for errors there. However, it’s also possible to just use write() directly, instead of going through buffered I/O:

                                #include <stdlib.h>
                                #include <unistd.h>
                                
                                int main(void)
                                {
                                    return write(1, "hello, world\n", 13) > 0 ? 0 : EXIT_FAILURE;
                                }
                                

                                (However, this is POSIX-defined rather than pure ANSI C. And, of course, a purist would argue that this program should also check for a short positive return value and restart the write() in case it was interrupted.)

                                1. 4

                                  That’s buggy: write() returns the number of bytes written. If you write 5 bytes before failing, then you will exit 0. This is usually seen when stdout is a pipe or socket and incomplete writes are not retried.
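
A sketch of the retry loop this objection asks for, assuming POSIX write(). The name write_all is hypothetical; the loop keeps going after short writes and EINTR, and reports failure for any other error.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying after short writes and EINTR.
 * Returns 0 once everything has been written, -1 on any other error
 * (errno is left as set by write()). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        buf += n;               /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

With this helper, the earlier one-liner would return `write_all(1, "hello, world\n", 13) == 0 ? 0 : EXIT_FAILURE`, so a partial write followed by an error no longer exits 0.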

                                  1. 4

                                    Yes, you are the purist that I explicitly acknowledged in the original post.

                                    1. 2

                                      Somehow I missed the last paragraph. Sorry.

                              1. 3

                                I do this so often that I have a hotkey that pastes the following header:

                                #!/usr/bin/env zsh
                                set -euo pipefail
                                HERE="$(dirname "$(realpath -s $0)")"
                                cd "$HERE"
                                
                                1. 4

                                  FWIW

                                  1. When running bin/oil, set -euo pipefail is on by default. And when running bin/osh, you can also do shopt -s oil:basic which includes that plus some more error handling.
                                  2. It also has $_this_dir because I have the HERE thing in almost every shell script as well.
                                  • So you can do $_this_dir/mytool or cd $_this_dir.
                                  • The variables prefixed with _ are “special” globals that are automatically set by the interpreter

                                  Testing/feedback is appreciated! https://www.oilshell.org/

                                  1. 1

                                    I think you should quote $0, since the path may contain a space. I’d be surprised if shellcheck didn’t complain about this, although I haven’t tested.

                                    1. 1

                                      No, as it’s zsh, so unless the ~/.zshrc file sets the SH_WORD_SPLIT option, it won’t split on whitespace.

                                      (If only deploying to systems with a modern env, then #!/usr/bin/env -S zsh -f would avoid sourcing files and risks of custom options affecting this)

                                  1. 2

                                    I recently converted a repo where “make” was more task runner than dependency tracker to using task, https://taskfile.dev/.

                                    It’s yet another YAML system, but is clearly scoped to avoid doing programming in YAML: it’s structured and feels more like a Circle or GHActions flow. Except unlike GHActions, it’s shell syntax on every platform because of the use of https://github.com/mvdan/sh to parse the instructions and run them. Having recently had to debug and get GHActions working across a build matrix of platforms, including Windows, I really appreciate this design choice.

                                    1. 2

                                      Bear in mind you don’t need to keep track in memory of all the device:inode pairs for every file seen, unless you have an unusual situation: you just need to look at the link-count of regular files and keep track for those with a link-count greater than 1. So it’s “usually fine”, unless you have people hard-linking entire trees.

                                      In an era with bind/loopback mounts, that’s much less likely to be encountered.

                                      Honestly, the bigger issue will be if you’re trying to rsync across a unified presentation view of the filesystem, where the same underlying FS is bind/loopback-mounted into multiple locations: you’ll no longer have the same information available to detect that this has happened, and the device numbers will be different so existing deduplication will fail. It might mean needing rsync to be aware of OS-specific bind/loopback mechanisms and how they present, and then both unifying/normalizing device numbers and ignoring the link-count and instead tracking every inode encountered.

                                      1. 1

                                        Maybe bind mounts should increment the reported hardlink count of all inodes in both themselves and the filesystem they’re duplicating. (Of course I realise the futility of “maybe everyone should change everything everywhere because of this tiny detail that comes up rarely” :) )

                                        1. 1

                                          If you stat each directory, then you can tell when you’re crossing a mount point (the device changes between the parent and subdirectory). That gives you a place to look for aliased filesystems. With FreeBSD nullfs mounts, the inode numbers are unchanged, but the device node is different for each mount; I believe the same is true for bind mounts on Linux. This means that you need to do something OS-specific to parse the mount table and understand the remapping.
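
A small sketch of the device comparison, assuming POSIX stat(). The function name crosses_device is hypothetical. Whether a particular kind of aliased mount shows up as a device change is OS-specific, so this only flags candidates for the mount-table lookup mentioned above.

```c
#include <sys/stat.h>

/* Returns 1 if child is on a different device than parent (a mount
 * point was crossed), 0 if both are on the same device, and -1 if
 * either stat() call fails. */
int crosses_device(const char *parent, const char *child)
{
    struct stat p, c;

    if (stat(parent, &p) != 0 || stat(child, &c) != 0)
        return -1;

    return p.st_dev != c.st_dev ? 1 : 0;
}
```

A tree walker would call this for each subdirectory it descends into and treat a return value of 1 as the place to consult the mount table.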

                                          It would be nice if stat could be extended to provide an ‘underlying device’ device node field, so that you could differentiate between the device providing the mapping and the raw filesystem. That would be simpler than incrementing the link count because only nullfs and similar things would need to be modified to provide anything other than the default value for this field.

                                      1. 4

                                        Might want to look at net/textproto in the Go stdlib.

                                        1. 13

                                          While I haven’t actually turned off ipv4, occasionally I’m checking which things I connect to are not v6 compatible. Turns out that from my daily set, there are two most common groups: GitHub, and sites on providers which support v6 but haven’t configured the entries.

                                          I really wish that big providers like CloudFlare and AWS flipped their steps in manuals at some point. Instead of “here’s an example how to configure your service (with v4 of course), oh btw we do support v6” make it “here’s an example how to configure your service with v6 and v4 addresses” (in that order).

                                          1. 2

                                            I turned up a new service on CloudFlare earlier this month; IPv6 was enabled by default. (New account too, which might affect it?)

                                            1. 1

                                              Yeah, for the ipv6 defaults at least CF is good. And I just looked through their docs - they seem to have both v4 and v6 listed everywhere now, so that’s cool. I’ll stop using them as an example :-)

                                            2. 1

                                              I disagree that v6 should come before v4, since plenty of ISPs (including mine) usually block v6 traffic.

                                              Maybe once a month I’ll notice that I can ping a v6 address; a few hours later, it’s a no-go.

                                              Pressure first needs to go on ISPs to support v6 traffic. Only then can we consider making v6 the “default”.

                                            1. 2

                                              I get 502 bad gateway on mobile and I thought it was satire. And somehow it is ironic, I guess.

                                              Like a 502, on a bad gateway…

                                              1. 6

                                                Remembering to pass that flag each time you run git blame is a bit too much, but you can set a global ignoreRevsFile in your config. By naming the file .git-blame-ignore-revs consistently, it will be used to skip any useless commits in a repo.

                                                git config --global blame.ignoreRevsFile .git-blame-ignore-revs

                                                Seems like it might be better to not pass --global, and just configure the ignoreRevsFile on a per-repository basis.

                                                1. 4

                                                  The benefit of global is that it automatically works for all repos following the convention. Git doesn’t raise any warning if the file doesn’t exist.

                                                  1. 1
                                                    % git --version
                                                    git version 2.35.1
                                                    % git blame mime.types
                                                    fatal: could not open object name list: .git-blame-ignore-revs
                                                    

                                                    I just had to revert setting it globally because, at least in the current git release, it fatals out instead of performing the blame operation.

                                                    1. 1

                                                      I can swear I tested this before, but it appears that it is in fact fatal. I guess I don’t blame from the CLI often.

                                                      The thread I found discussing making it optional appears not to have gone anywhere: https://public-inbox.org/git/xmqq5ywehb69.fsf@gitster.g/T/

                                                1. 8

                                                  To me, this looks like a great example of the cascading complexity that results from a mismatch between limits imposed by an API (in this case a protocol) and actual hardware limits. If TCP had a 4 byte or 8 byte port range, this would presumably not be an issue. There would be no need to share ports.

                                                  How can we design protocols and APIs that can adapt to increasing hardware capabilities? Perhaps we should always use variable-length integers? There presumably will always be some absurdly large limit that we can safely impose, e.g. the number of atoms in the observable universe (roughly 2^265, or 256 bits for a nice round number), so we don’t necessarily need to allow any arbitrary integer.
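
For illustration, one common variable-length integer scheme is a base-128 varint (7 payload bits per byte, with the high bit meaning "more bytes follow"). The comment above does not name a specific encoding, so this choice and the function names are assumptions of the sketch, which is capped at 64-bit values.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode value as a base-128 varint, least significant group first.
 * out must have room for 10 bytes. Returns the number of bytes used. */
size_t varint_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80);  /* low 7 bits + continue flag */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;               /* final byte, flag clear */
    return n;
}

/* Decode a varint from in[0..len). Returns the number of bytes consumed,
 * or 0 if the input is truncated or longer than 10 bytes. Excess bits in
 * a tenth byte are not rejected in this sketch. */
size_t varint_decode(const uint8_t *in, size_t len, uint64_t *value)
{
    uint64_t result = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        result |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

Under this scheme a port such as 80 still costs one byte, 65535 costs three, and larger values grow as needed. The price is that field offsets are no longer fixed, which is the parsing cost raised in the reply below.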

                                                  1. 5

                                                    We had one protocol that allowed more flexibility and it didn’t end well - ipv6. All the “flexible” bits are pretty much deprecated because it’s impossible to implement a fast router when you don’t know where the fields (like tcp ports) are.

                                                    I wouldn’t draw the conclusion that 16 bits for a port is too limited.

                                                    IMO the conclusion is that the BSD sockets API kind of requires the 2-tuple to be locked, while the user doesn’t want that. For a connected socket we expect the 4-tuple to be locked. It would be nice to have some API that can express that. For TCP we have IP_BIND_ADDRESS_NO_PORT; for UDP we don’t have anything, plus there is the overshadowing issue.

                                                    1. 3

                                                      With IPv6 you don’t even need ports, you could just use the last N bytes of the address as the port :D

                                                      More practically it might actually make sense to “expand” the ephemeral port range into the address, i.e. just use lots of different source addresses under the subnet your machine has.

                                                      1. 3

                                                        This is one of the reasons why it’s good to have a /64. One of the privacy options for IPv6 recommends that you keep a small number of stable IPv6 addresses for incoming connections and periodically cycle the one that you use for outbound connections. With a /64 and SLAAC you can do this more or less by picking a new 64-bit random number. In the most extreme configuration, you pick a new IPv6 address for every outbound connection. This means that a server shouldn’t be able to distinguish between two connections from different machines in the subnet and two from the same machine. This doesn’t help much for home networks (where everyone on the /64 is likely to be in the same family, at least), but is good for networks with more users.

                                                        I believe this is very rarely done in practice because the higher-level protocols (for example, HTTP) provide tracking information (cookies, browser fingerprinting, and so on) that’s so much more useful than IP address that tracking based on IP is of fairly negligible value.

                                                        1. 2

                                                          This is done for DNS resolvers already; it got popular after the security issues around lack of space for entropy in the ID field. You route a /64 to the resolver and tell it to prefer IPv6 and to use that whole /64 as the outgoing source.

                                                          See, eg, Unbound’s outgoing-interface setting.

                                                      1. 2

                                                        One additional pet bug-bear not covered: PQ crypto is currently computationally expensive, and exposing that to the Internet as the first layer is a DoS attack issue if you’re doing client authentication. We don’t deploy 16384-bit RSA keys or the like, but from what I’ve seen a lot of the PQ seems to be equivalently heavy.

                                                        So for link security, using classic crypto as the first layer protects you against a CPU exhaustion attack because anyone who can cause you to “waste” time on the PQ crypto must already have a CRQC to break your classic crypto.

                                                        When running a web-server, there’s a huge difference between “three intelligence agencies and two corporations can break the classic crypto and we need PQ crypto as a second layer, and they can impose costs on us” and “every script kiddie on the planet can just use a botnet to trivially DoS the server by just starting handshakes”.

                                                        None of this applies to server-only authentication, where you’re having to do the session work for every visitor anyway. This is entirely about “the side which verifies” limiting how much work they can be forced to waste.

                                                        1. 3

                                                          Note that ANSI C didn’t require that argc be greater than zero; all the rules about argv interpretation are predicated upon argc > 0. It’s Unix (POSIX) which adds requirements that argc be greater than zero.

                                                          IIRC, on the Amiga if argc was non-zero you were in a CLI and if argc was 0 then you were launched from Intuition, the GUI, and should use a library call to get the relevant data needed. That was “a neat hack”.
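
A defensive sketch for code that wants a program name without assuming argc > 0. The helper name and the "(unknown)" fallback are hypothetical choices for this illustration.

```c
#include <stddef.h>

/* Return a printable program name without assuming argv[0] exists.
 * Falls back to a fixed placeholder when argc is 0 or argv[0] is
 * missing or empty. */
const char *program_name(int argc, char *const argv[])
{
    if (argc > 0 && argv != NULL && argv[0] != NULL && argv[0][0] != '\0')
        return argv[0];
    return "(unknown)";
}
```

A main() that prints usage messages would call program_name(argc, argv) instead of dereferencing argv[0] directly.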

                                                          1. 17

                                                            (I think this is on-topic because it’s using Unix, it’s problem-solving with computers, and it’s statistical analysis and cryptanalysis, so I’m not avoiding Wordle on Lobste.rs).

                                                            Note that the letter distributions for Wordle are quite different from those I see in /usr/share/dict/words; in particular S drops in frequency enough to skew the results significantly. It looks as though Wordle drops all plurals which end S, such that of the 36 remaining ????S words, 21 end SS and 12 end US.

                                                            curl -Lo wordle_words.crlf https://github.com/AllValley/WordleDictionary/raw/main/wordle_solutions_alphabetized.txt
                                                            tr -d $'\r' < wordle_words.crlf > wordle_words
                                                            rm wordle_words.crlf
                                                            LC_COLLATE=C grep -E '^[a-z]{5}$' /usr/share/dict/words > latin_words
                                                            grep -E '^[[:lower:]]{5}$' /usr/share/dict/words | recode -f utf8..flat > normalized_words
                                                            LC_COLLATE=C grep -E '^[a-z]+$' /usr/share/dict/words > latin_all
                                                            

                                                            Then, using zsh syntax (I can’t be bothered to rewrite the array expansions to be bash-compatible):

                                                            WL=(latin_all normalized_words latin_words wordle_words)
                                                            
                                                            # Letter occurrence counts:
                                                            for F in $WL; do
                                                              { echo "= $F"; for L in {a..z}; do echo "$(grep -c $L $F) $L"; done | sort -nr } >t.$F
                                                            done; paste t.$WL | column -t; rm t.$WL
                                                            
                                                            # Total counts of letters, including repeated letters:
                                                            for F in $WL; do
                                                              { echo "= $F"; for L in {a..z}; do echo "$(fold -bw 1 < $F | grep -c $L) $L"; done | sort -nr } >t.$F
                                                            done; paste t.$WL | column -t; rm t.$WL
                                                            
                                                            # Letters repeated the most:
                                                            for F in $WL; do
                                                              { echo "= $F"; for L in {a..z}; do present=$(grep -c $L $F); total=$(fold -bw 1 < $F | grep -c $L);
                                                                echo "$((total - present)) $L" ; done | sort -nr } > t.$F
                                                            done; paste t.$WL | column -t ; rm t.$WL
                                                            

                                                            That should get you three tables of the letter distributions, across four dictionaries each. The first is “counts of words containing that letter”. The second is “counts of that letter”. The third is “difference between the two prior counts”.

                                                            1. 2

                                                              I posted this in another thread that was deleted for some reason, but it’s sad to see that websites like this (and the mentioned mail-tester.com) present SPF as a requirement, rather than as a deprecated standard that is superseded by DKIM.

                                                              SPF has as its core assumption that mail originating from a domain is always delivered by a mail server designated by that domain. In that world, mailing lists and forwards don’t exist. This was noted even when it was introduced, and DKIM doesn’t share its failings. Because mail server practices seem to be dominated by cargo culting, the recommendation to use SPF remains, long after the reasonable timeframe during which server operators could’ve switched to DKIM.

                                                              And yes, there are certain use cases for SPF, but they are limited to scenarios where delivery is less important than spam prevention.

                                                              1. 3

                                                                I don’t know about SPF as a whole being deprecated, but I do know that the specific SPF DNS RR has been deprecated in favor of serving up SPF via the TXT RR.

                                                                1. 2

                                                                  SPF is still widespread and covers a different scenario. SPF does not inhibit mailing-lists. I don’t like SPF and the externalization of cost it set a precedent for, in a pattern followed by DMARC.

                                                                  Mailing-list managers rewrite the SMTP Envelope Sender to point to an address which will feed back to the MLM, for bounce processing. SPF is enforced on the SMTP Envelope Sender. This is why SPF breaks .forward files which don’t use SRS to rewrite the sender to chain back through the forwarder. It’s unfortunate, but a pragmatic reality today that if you’re doing forwarding, then (a) you probably regret it; (b) you should use SRS.

                                                                  Now, DMARC enforcement breaks mailing-lists and leads to privacy violations of the list’s subscriber base. It required MLMs to take some kind of action, because receivers verifying DKIM signatures and then enforcing the DMARC policy would reject mail relayed through the mailing-list and bounce it, so a single sender from a p=reject domain could cause a lot of other mailing-list subscribers to get disabled.

                                                                  In weighing whether or not linking to one of my old blog posts here is self-pimping, I decided that since I wrote some things above which someone is sure to dispute and claim is FUD, I’d better link to a starting-point for understanding; note that the two earlier posts referenced cover the privacy violations. https://bridge.grumpy-troll.org/2014/04/dmarc-stance/

                                                                  1. 1

                                                                    That blog is interesting, because I realize now I’ve seen the behavior you mentioned - by a message sender using SPF. As far as I can tell, it’s almost impossible to do Return-path rewriting on a message originating from an SPF domain, without also doing From rewriting. DKIM, at least, leaves the Return-path alone, and typically only imposes restrictions on the Subject field, which I find more acceptable.

                                                                    It’s interesting to me that you single out DKIM moreso than SPF in this instance.

                                                                    1. 1

                                                                      DKIM != DMARC. DKIM is fine, it’s the policy decisions for DMARC around forcing only using From: as the verifier. The big webmail providers didn’t want to change their UI to present List headers or Sender or anything, so they forced the rest of the world to overload From instead and change the semantics of authorship.

                                                                      I think DKIM is fine, I think DMARC is Very Flawed But Sometimes Necessary (if you disable the privacy violations).

                                                                  2. 1

                                                                    I’m hosting my business email with Runbox (self hosting my private email) and got an email from them saying this (among other things):

                                                                    We’ve recently become aware that Google via its Gmail service has started filtering messages from domains that do not have a SPF (Sender Policy Framework) record to the spam folder of their users. We’ve had a steady stream of reports about this so we are confident this is a new policy they have in place. This will also affect people using their own domain with Google’s email service and not just people with @gmail.com addresses.

                                                                    This would make not using SPF less of an option. Just adding this here in case someone stumbles upon this thread later on.

                                                                    1. 1

                                                                      Is this an official announcement by Google? Does it apply to people with no SPF but with DKIM, etc.? I’m filing this under the “email shamanism” that self-hosting administrators do to appease the mysterious gods.

                                                                      For what it’s worth, I’ve been exchanging emails with people on Gmail today, from my personal, SPF-less email domain, without issue.

                                                                      1. 1

                                                                        It’s not an official announcement from Google, but rather observed behavior as seen by a lot of support tickets sent to Runbox.

                                                                  1. 1

                                                                    zsh: ${(Q)${(z)cmd}}

                                                                    Whether this is good or not is left to the reader’s discretion.

                                                                    1. 2

                                                                      Regarding unsigned multiplication, there’s a false statement about Go:

                                                                      // But it also even has an explicit n * m form as well.
                                                                      make([]Object, n, m);
                                                                      

                                                                      That’s not a calloc-style n * m, that’s a length/capacity split. make([]Object, n, m) allocates memory to hold m items and sets the initial length to n.

                                                                      Really, make([]Object, n) is already n*m because it uses the size of Object and then n is a straight object count always.

                                                                      Also, Go is not based on LLVM, so limitations of LLVM do not apply; so the language claims are not proven factually incorrect.