Threads for pilif

  1.  

    Most ISPs default to a dynamic IPv6 prefix, just like with IPv4 before (in my case, all it took was an email to ask for a static prefix).

    Your infrastructure needs to be prepared for this, and while I personally prefer a static prefix, a dynamic one offers some (very slight) security and privacy benefits and allows the ISP to keep less state.

    1. 5

      FYI, this has to be applied after every system update.
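
      As I understand the linked post, the change in question is a single line added to /etc/pam.d/sudo; macOS rewrites that file during OS updates, so the line has to be re-added afterwards. A rough excerpt (not copied from the post):

      ```
      # /etc/pam.d/sudo (macOS) -- Touch ID for sudo; re-add after each OS update
      auth       sufficient     pam_tid.so
      ```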

      1. 1

        This is one of the biggest hurdles I have with macOS. I understand the security concerns and system integrity protections, but messing up my SSH config files is really nasty, and now this.

        1. 1

          I believe Apple treats much of the system as immutable, even if you can technically modify it.

          1. 3

            It’s more of a configuration file issue. The problem is the same one many Linux distros have, since they naturally have to ship some default configuration with their packages. If those files are changed by the user and then the package is updated, what happens? Do you wipe the user config? Do you leave the user-modified file alone (potentially breaking the updated package completely if the update has non-backwards-compatible changes)?

            Debian asks the user, but that’s something a mainstream OS like macOS probably can’t afford to do (no user would understand the options and/or their consequences)

            Sometimes the packages ship with a configuration that allows including secondary files (for example /etc/sudoers, which includes arbitrary files from /etc/sudoers.d on both Debian and macOS). That solves the problem for package updates where the new package is compatible with the old config files; of course it still doesn’t help when the new package isn’t compatible with the old files, but that’s rarely the case.

            I’m not aware that PAM has a means of including a directory full of files, so I don’t see how this compromise could work for PAM. Apple would be stuck with either never touching the files after the initial install (possibly breaking their intended configuration after an update) or always resetting them on install (bringing the system back to a clean state).

            I totally understand why they do the latter.

            1. 2

              Yeah, I get it. To be clear, I’m not that upset about it, but I wanted to give people a heads up since the post doesn’t mention it. While this sudo functionality is great in theory, my system ends up not having Touch ID for sudo more often than it does.

              The real solution is just for Apple to add this to the default pam config, but seemingly they have reasons not to.

              1. 1

                RPM usually keeps both files and you can review changes with rpmconf.

                Sometimes, the packages ship with a configuration that allows to include secondary file.

                I’ve been off the Apple ecosystem for years, but I clearly remember being very annoyed when minor updates kept deleting exactly these files.

        1. 4

          Another thing to consider is that a cryptographic hash can reliably give you 128 bits that will uniquely and deterministically identify any piece of data. Very handy if you need to make UUIDs for something that doesn’t have them.
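
          A minimal sketch of that idea in Python, truncating a cryptographic digest to 128 bits and wrapping it in the stdlib UUID type (illustrative only; this is not a registered UUID version):

          ```python
          import hashlib
          import uuid

          def content_uuid(data: bytes) -> uuid.UUID:
              # Truncate a SHA-256 digest to 128 bits and wrap it in a UUID.
              # Deterministic: the same input always yields the same id.
              return uuid.UUID(bytes=hashlib.sha256(data).digest()[:16])

          print(content_uuid(b"hello world"))
          ```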

          1. 1

            But even a cryptographic hash has collisions, however unlikely. So there is always a chance that two distinct pieces of data will end up with the same id. But probably the same can happen, for example, with random UUID (except here you rely on the quality of your randomness source rather than the quality of your hash function and the shape of your data). Somehow using a crypto hash as a long-term id always feels iffy to me.

            1. 3

              Given the size of hashes, using the hash of the content as its id is totally safe. The hashing algorithms are designed such that collisions are so unlikely that they simply never happen.

              If this wasn’t the case, all systems based on content addressing would be in serious trouble: systems like git or IPFS.
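
              git’s content addressing is a concrete example of this: an object id is just a hash over a small header plus the content, so identical content always maps to the same id. A sketch of the blob case in Python:

              ```python
              import hashlib

              def git_blob_id(content: bytes) -> str:
                  # git's object id for a blob: SHA-1 over "blob <size>\0" + content.
                  header = f"blob {len(content)}\0".encode()
                  return hashlib.sha1(header + content).hexdigest()

              # Should match `git hash-object` for the same content:
              print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
              ```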

              1. 1

                If this wasn’t the case, all systems based on content addressing would be in serious trouble: systems like git or IPFS

                Any system that assumes a hash of some content uniquely identifies that content is in fact in serious trouble! They work most of the time, but IPFS is absolutely unsound in this regard. So is ~every blockchain.

                1. 2

                  I’ve seen several cryptographic systems rely on exactly this fact for their security. So while it’s probabilistic, you’re relying on the Birthday Paradox to ensure it’s highly unlikely.

                  From that table, for a 128-bit hash function, for a 0.1% chance of a collision, you’d need to hash 8.3e17 items. In practical terms, a machine that can hash 1,000,000 items per second would need to run for just over 26 millennia to have 0.1% chance of a collision.

                  For systems that use 256-bit digests (like IPFS), it would take many orders of magnitude longer.
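
                  For anyone who wants to reproduce those figures, the usual birthday-bound approximation n ≈ sqrt(2d · ln(1/(1−p))) gives both the 128-bit and 256-bit numbers; a quick sketch:

                  ```python
                  import math

                  def items_for_collision_probability(bits: int, p: float) -> float:
                      # Birthday-bound approximation: how many random values you can draw
                      # from a space of size 2**bits before the collision probability hits p.
                      return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))

                  n128 = items_for_collision_probability(128, 0.001)
                  print(f"{n128:.1e}")                                              # ~8.3e17
                  print(f"{items_for_collision_probability(256, 0.001):.1e}")       # ~1.5e37
                  print(f"{n128 / 1e6 / (3600 * 24 * 365.25):,.0f} years at 1M hashes/s")  # ~26,000
                  ```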

                  1. 1

                    I’ve seen several cryptographic systems rely on exactly this fact for their security. So while it’s probabilistic, you’re relying on the Birthday Paradox to ensure it’s highly unlikely.

                    If collisions are an acceptable risk as long as their frequency is low enough, then sure, no problem! Engineering is all about tradeoffs like this one. You just can’t assert that something which is very improbable is impossible.

                    1. 1

                      I can still pretend and almost certainly get away with it. If the chances of getting bitten by this pretense is ten orders of magnitude lower than the chance of a cosmic ray hitting the right transistor in a server’s RAM and cause something really bad to happen, then for all practical purposes, I’m safe to live in blissful ignorance. And what a bliss it is; Assuming a SHA-256 hash uniquely identifies a given string can immensely simplify your system architecture.

                    2. 1

                      You’ve misread the table. 1.5e37 is for 256 bit hashes. For 128 bits it’s 8.3e17, which is obviously a lot smaller.

                      For context with IPFS, the Google search index is estimated to contain between 10 and 50 billion pages.

                      1. 1

                        You’ve misread the table

                        Thanks. Although that’s what happens when you start with a 256-bit example, then remember everyone’s talking about UUIDs, and hastily re-calculate everything. :/

                2. 3

                  No, this is absolutely not a consideration. Your CPU and RAM have lower reliability than a 128-bit cryptographic hash. If you ever find a collision by chance, it’s more likely to be a false positive due to hardware failure (we’re talking a 100-year timespan at a constant rate of billions of UUIDs per second).

                  And before you mention potential cryptographic weaknesses, consider that useful attacks need a preimage attack, and the “collision” attacks known currently are useless for making such uuids collide.

                  1. 2

                    Whenever you map from a set of cardinality N (content) to a set of cardinality less than N (hashes of that content) by definition you will have collisions. A hash of something is just definitionally not equivalent to that thing, and doesn’t uniquely identify it.

                    As an example, if I operate a multi-tenant hosting service, and a customer uploads a file F, I simply can’t assert that any hash of the content of F can be used as a globally unique reference to F. Another customer can upload a different file F’ which hashes identically.

                    “Highly unlikely” isn’t equivalent to “impossible”.

                    1. 4

                      It’s absolutely impossible for all practical purposes. It’s useless pedantry to consider otherwise.

                      Remember we’re talking about v4 UUIDs here, which already assume a “risk” of collisions. Cryptographic hashes are indistinguishable from random data, and are probably more robust than your PRNG.

                      The risk of an accidental collision is so small, you can question whether there’s enough energy available to our entire civilisation to compute enough data to ever collide in the 128-bit space in its lifetime.

                      1. 1

                        It’s absolutely impossible for all practical purposes. It’s a useless pedantry to consider otherwise.

                        I mean it literally isn’t, right? “Absolutely impossible” is just factually not equivalent to “highly improbable” — or am I in the wrong timeline again? 😉 Going back to the hosting example, if you want to use UUIDs to identify customer documents that’s fine, but you can’t depend on the low risk of collisions to establish tenant isolation, you still have to namespace them.

                        1. 1

                          By your definition of impossible, there is literally no possible solution, since it would require infinite memory. At that point you should question why you’re using a computer at all.

                          The fact is that people don’t quite understand UUIDs and use them everywhere in meme fashion. In most of the UUID-as-database-key usages I’ve seen, the value is even stored as a string, which creates much more serious problems than those under discussion here.

                          1. 1

                            You don’t need infinite memory to uniquely identify documents among customers. You just need a coördination point to assign namespaces.

                            I agree that UUID usage in the wild is… wild. I don’t think most developers even really understand that the dashed-hex form of a UUID is actually just one of many possible encodings of what is ultimately just 16 bytes in memory.
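
                            A sketch of that namespacing idea using UUIDv5 (itself just a truncated, namespaced hash); the service namespace and tenant names here are made up:

                            ```python
                            import uuid

                            SERVICE_NS = uuid.UUID("12345678-1234-5678-1234-567812345678")  # made-up namespace

                            def document_id(tenant: str, content: bytes) -> uuid.UUID:
                                # Mix the tenant into the hash, so identical content uploaded
                                # by two tenants yields two distinct ids.
                                return uuid.uuid5(uuid.uuid5(SERVICE_NS, tenant), content.hex())

                            a = document_id("tenant-a", b"same bytes")
                            b = document_id("tenant-b", b"same bytes")
                            print(a, b, a != b)                  # different ids for the same content
                            print(a.bytes.hex(), len(a.bytes))   # the dashed-hex form is just 16 bytes
                            ```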

                      2. 3

                        The only system behaviour that the universe guarantees is that all systems will eventually decay.

                        For any other behaviour you have to accept some probability that it won’t happen (hardware failure, bugs, operator error, attacks, business failure, death, and, yes, hash collisions).

                        Hash collisions with a good algorithm will often be a risk much lower than other factors that you can’t control. When they are, what sense does it make to worry about them?

                        1. 1

                          There is a categorical difference between hash collisions and the kind of unavoidable risk you’re describing, e.g. solar flares flipping a bit in my main memory.

                      3. 1

                        After doing some more thinking/reading, I would agree with you for something like SHA-256. But using a 128-bit hash still seems like a bad idea. I found this paragraph from a reply on HN summarizes it quite well:

                        In cases where you can demonstrate that you only care about preimage resistance and not collision resistance, then a 128-bit hash would be sufficient. However, collision attacks often crop up in unexpected places, or when your protocol is used in ways you didn’t design for. Better to just double the hash size and not worry about it.

                        I think Git’s messy migration from SHA-1 is a good cautionary tale. Or do you believe it’s completely unnecessary?

                        1. 1

                          Git’s migration is due to a known weakness in SHA-1, not due to the hash size being too small. I believe git would be perfectly fine if it used a different 128-bit cryptographic hash.

                          The first sentence you’ve quoted is important. There are uses of git where this SHA-1 weakness could matter. For UUID generation it’s harder to imagine scenarios where it could be relevant. But remember you don’t need to use SHA-1 — you can use 128 bits of any not-yet-broken cryptographic hash algorithm, and you can even pepper it if you’re super paranoid about that algorithm getting cracked too.

                          1. 1

                            The first sentence you’ve quoted is important. There are uses of git where this SHA-1 weakness could matter.

                            Yes, but isn’t git’s situation exactly what the rest of that paragraph warns about? SHA-1 is susceptible to a collision attack, not a preimage attack, and now everyone is trying to figure out whether this could be exploited in some way, even though on the surface git is just a simple content-addressable system where collision attacks shouldn’t matter. And as far as I can tell there is still no consensus either way.

                            And as the rest of that reply explains, 128-bit is not enough to guarantee collision resistance.

                            1. 1

                              If the 128-bit space is not enough for you, then it means you can’t use UUID v4 at all.

                              The whole principle of these UUIDs is based on the fact that random collisions in the 128-bit space are so massively improbable that they can be safely assumed to never ever happen. I need to reiterate that outputs of a not-broken cryptographic hash are entirely indistinguishable from random.

                              Resistance of a hash algorithm to cryptographic (analytic) attacks is only slightly related to the hash size. There are other much more important factors like the number of rounds that the hash uses, and that factor is independent of the output size, so it’s inaccurate to say that 128-bit hashes are inherently weaker than hashes with a larger output.

                              Please note that you don’t need to use SHA-1. SHA-1’s weakness is unique to its specific algorithm, not to 128-bit hashes in general. You can pick any other algorithm. You can use SHA-2, SHA-3, bcrypt/scrypt, or whatever else you like, maybe even a XOR of all of them together.
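
                              The “pepper” idea can be as simple as keying the hash; a sketch (the pepper value is obviously a placeholder):

                              ```python
                              import hashlib
                              import hmac
                              import uuid

                              PEPPER = b"site-wide secret"  # placeholder; keep it out of the data store

                              def peppered_id(data: bytes) -> uuid.UUID:
                                  # 128 bits of HMAC-SHA-256: still deterministic for a given pepper,
                                  # but nobody without the pepper can precompute collisions, even if
                                  # the underlying hash is later weakened.
                                  return uuid.UUID(bytes=hmac.new(PEPPER, data, hashlib.sha256).digest()[:16])
                              ```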

                  1. 1

                    The source appears to be linked from the website: https://github.com/withfig/autocomplete

                    1. 4

                      This is only the source code for autocomplete plugins. The app itself is closed source.

                      Took me a while to find that out too, and I believe it’s somewhat intentionally obscured.

                      1. 1

                        This is the source for the server side. It’s like LSP, where there is a Node service that takes in a command line and spits out structured data about a completion. There’s a client, currently in Rust for macOS only, that calls it and interprets the output.

                        The example shows running a new completion.

                    1. 11

                      These are all worthy and probably excellent enhancements - but they don’t address my core beef with PHP: The APIs.

                      The weird mix of pre OO big_long_namespaced_function_names() with some OO libraries make my skin crawl :)

                      (I’m looking at you, get_file_contents() :)

                      1. 12

                        It’s file_get_contents() btw 😊

                        1. 17

                          Details are important. Thanks for the correction.

                          On the up-side, this means I have succeeded in my goal of wiping PHP from my brain :)

                        2. 4

                          I think it’s probably hopeless to fix that stuff because there’s so much legacy code that depends on it.

                          My core beef is the existence of the PHP configuration file, but same story there.

                          1. 3

                            Yeah. I worked for a PHP shop a few jobs back, and one of the web devs responding to my carping about it said “Yeah but you see, the thing is, PHP isn’t for YOU or even US. It’s for EVERYONE, even non programmers”.

                            At the time I thought he was just defending it because I was tipping his sacred cow, but I do think there’s some truth in there.

                          2. 6

                            These are all worthy and probably excellent enhancements - but they don’t address my core beef with PHP: The APIs.

                            There’s two ways to address this, as far as I’m aware.

                            1. Use Composer to install an open source library that has the API you want. (e.g. https://github.com/thephpleague/flysystem)
                            2. Demand the PHP core team break backwards compatibility with the software that powers 80% of the Internet to satisfy your aesthetic opinions about APIs.
                            1. 4

                              So just to be clear I realize I am absolutely being the Princess and the Pea with my API issues. PHP is clearly a tool that makes many MANY people around the world productive, and Wordpress and Drupal and their ilk power huge swaths of the internet.

                              That said, I don’t have to particularly enjoy working with the language even if I do appreciate its many and varied strengths :)

                          1. 4

                            I’ve seen similar articles to this, but what I haven’t seen is a compelling list of reasons why you’d want to do this. Yeah the world is tight on public IPv4 addresses, but NAT is a thing and it doesn’t seem as dire as everyone said it was ~25 years ago.

                            1. 15

                                NAT means extra call latency. It means paying for extra IPs if you want separate data and management endpoints. It means getting rate limited and captcha’d because someone using the same ISP as you was misbehaving.

                              1. 8

                                As @viraptor said, NAT is bad for latency due to circuitous routing. It also makes direct connections on the net really difficult which makes it hard for any P2P protocol to take root. The limited IPv4 range also makes it really hard to send email or do anything else where IP reputation matters since there’s a high likelihood that a bad actor had an IP at some given point in time.

                                1. 2

                                    I agree on the latency, but you can’t expect the return of P2P connectivity thanks to IPv6, because everybody will still be running a stateful firewall that drops all unsolicited incoming packets.

                                    There are some UPnP-like mechanisms for IPv6 to punch holes through firewalls, but they are much less common than their IPv4 counterparts, and even if they weren’t, you’d at most get equally good connectivity, hardly better.

                                  1. 2

                                    The limited IPv4 range also makes it really hard to send email or do anything else where IP reputation matters since there’s a high likelihood that a bad actor had an IP at some given point in time.

                                    On the flip side, won’t this make it very difficult to block bad actors?

                                    1. 10

                                        Relying on IP reputation has always been a terrible way to do security. There are much better ways to do security.

                                      1. 2

                                        Moving away from IP-based reputation seems like a decent way to get back to a world where running your own mail server is possible again.

                                        1. 1

                                            Or it could have the opposite effect, because Google et al. decide to only allow a whitelisted group of IPs from “good” mail providers.

                                      2. 5

                                        I’d rather have a hard time blocking bad actors than accidentally block good ones

                                    2. 5

                                      Ironically, private IPv4 ranges and NAT make it much easier to actually have a home network where all gear has its own fixed address and you can connect to it.

                                      Most providers that bother to provide IPv6 on consumer connections at all use DHCP-PD in the worst possible way—the prefix they give you actually changes from time to time. That way you never know what exact address a device will get, and need a service discovery mechanism.

                                      With NAT, even if the ISP gives me a different WAN IPv4 address every time, that doesn’t affect any hosts inside the network.

                                      1. 7

                                        The big thing in IPv6 is “multiple addresses all the things”. Yeah, the public address for your device will change a lot, both due to prefix changes and due to privacy extensions. If you want a stable local address at home, don’t use the public one, use a ULA prefix.

                                        1. 2

                                          Giving things names is a lot nicer to work with than remembering IP addresses, though. mDNS+DNS-SD is good tech.

                                          1. 3

                                            mDNS is problematic for security because, well, there isn’t any. Any device on your network can claim any name. No one issues TLS certificates in the .local TLD that mDNS uses and so you also can’t rely on TLS for identity unless you’re willing to run a private CA for your network (and manage deploying the trusted root cert to every client device that might want to connect, which will probably trigger any malware detection things you have installed because installing a new trusted root is a massive security hole).

                                            1. 1

                                                It’s only for the local network, and I trust my network. It gets trickier if you don’t, of course.

                                              1. 2

                                                  It’s not about trusting your network, it’s about trusting every single entity on the network. Any phone that someone brings to your house and connects to the WiFi can trivially claim any mDNS name and replace the device that you think is there. This is mostly fine for things like SSH, where key pinning gives you an extra layer of checks, but it isn’t for most protocols.

                                                1. 2

                                                  I meant to write that I trust the devices on my network, but to access my wifi they’ll need my password - which they don’t get if I don’t trust them 🤷‍♂️

                                                    Given the convenience and the lack of reasonable things to fear, it’s a net win for me, at least.

                                                  1. 2

                                                      Do you ever hand out the password to people that visit your house? Do you allow any IoT devices that don’t get security updates anymore? Do you run any commodity operating systems that might be susceptible to malware? If you answer ‘yes’ to any of these, then mDNS provides a trivial mechanism for an attacker who compromises any of these devices to impersonate other devices on your network.

                                                    1. 2

                                                      Don’t use the same network for all those? :)

                                                      I have a separate subnet (with no outbound internet access other than to an NTP server) for “Internet LAN of Things” devices, another one for guest Wi-Fi, and another one for my personal devices that I can actually trust.

                                                      1. 1

                                                        I use a separate vlan for guests. Problem solved.

                                          2. 4

                                              Fundamentally, the number of devices behind NAT is limited by the number of open sockets the firewall can maintain for TCP connections made by internal clients… the shortage is still relevant, but we have kicked the can down the proverbial road. I’d wager another 10 years before enough connected devices seriously clog the available IPv4 space.

                                            1. 6

                                              Centralisation has also played a big part. 15 years ago, we expected to have a load of Internet connected devices in houses that you’d want to be able to reach from any location in the world. We now have that but they all make a single outbound connection to a cloud IoT service and that’s the thing that you connect to from anywhere in the world. You need a single routable IPv4 address for the entire service, not one per lightbulb. That might not be great for reliability (it introduces a single point of failure) but it seems to be the direction that has succeeded in the market.

                                              1. 7

                                                I think technology (lack of IPv4 addresses) and business needs (having your customers create an account and letting you see how they use your products is incredibly valuable) have converged to the “cloud service” model.

                                                  Although, even if every lightbulb in your home has its own IPv6 address, services to help manage them would spring up quite quickly, and the natural way to solve the problem would be a semi-centralized service gathering them all under one “account”.

                                          1. 37

                                            I saw this yesterday and I strongly suspect that this is a result of designing the drive firmware for the filesystem. APFS is a copy-on-write filesystem. This means that, as long as you commit the writes in the correct order, the filesystem remains in a consistent state. As long as the drive has sufficient internal battery power to commit all pending writes within a reorder window to persistent storage, then you don’t lose data integrity. Most Linux filesystems do block overwrites as part of normal operation and so this is not the case and you need journals along the side, along with explicit commit points, to guarantee integrity.

                                            1. 16

                                              As per this tweet, using F_BARRIERFSYNC to get writes in the correct order on macOS is also very slow, so this would very much be a problem for APFS too.

                                              1. 4

                                                F_BARRIERFSYNC

                                                Interesting! Does any other operating system have that?

                                                  To implement a transaction, you need a write barrier. Nothing more; nothing less. Be it a CoW filesystem doing its thing, the OS atomically renaming a file, or an application doing the read-modify-update dance when you save, you need a way to ensure that the new data is written before you point to it. That is the fundamental problem with transactions, no matter which layer implements it. Only that way can you guarantee that you either get the old data or the new data – the definition of a transaction.

                                                Waiting for the first part of the transaction to finish before you even submit the final write that completes it is such a poor man’s write barrier that you would think this would have been solved a long time ago. It obviously kills performance, but it does nothing for “durability” either: Compared to communicating a write barrier all the way down, the added latency just reduces the chances of saving your data before a power loss. If you care about hastily saving data, you can of course also fsync the final write, but that is a separate discussion: You could do that anyway; it’s irrelevant to the transaction. I think a write barrier could and should replace regular fsync in 999‰ of cases, on every OS.
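
                                                  For what it’s worth, the full-flush behaviour is reachable from userspace via fcntl; a hedged sketch (Python exposes F_FULLFSYNC on macOS, while F_BARRIERFSYNC would, as far as I know, need the raw constant):

                                                  ```python
                                                  import fcntl
                                                  import os

                                                  def durable_write(path: str, data: bytes) -> None:
                                                      # On macOS, plain fsync() may only push data to the drive's
                                                      # cache; F_FULLFSYNC asks the drive itself to flush as well.
                                                      fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
                                                      try:
                                                          os.write(fd, data)
                                                          if hasattr(fcntl, "F_FULLFSYNC"):   # macOS only
                                                              fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
                                                          else:
                                                              os.fsync(fd)
                                                      finally:
                                                          os.close(fd)
                                                  ```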

                                              2. 6

                                                As long as the drive has sufficient internal battery power to commit all pending writes within a reorder window

                                                which is tricky for desktops running this type of hardware because they do not have a battery.

                                                Then again, given the success of the M1 desktops and the lack of people complaining about file corruption, I have a feeling that, at least under macOS, this is a theoretical issue. At which point: why not be quick, if there’s no practical downside?

                                                1. 19

                                                  Usually the battery is internal to the drive. It typically needs to be able to supply power to the drive for under a second, so can be very small.

                                                  1. 20

                                                    In the replies it is shown that there seems to be no last-ditch attempt to commit the pending writes back to flash: pulling the power from a Mac Mini results in the loss of the last few seconds of written data. So it appears the drive does not have sufficient internal battery power, which leaves you with a system that doesn’t have data integrity by default.

                                                    1. 8

                                                      Consumer-grade SSDs generally don’t have anything for doing last-ditch flushes; I have definitely seen uncommitted ZFS transactions on my WD NVMe drives.

                                                      The only integrity problem here is that macOS does fake fsync by default, requiring a “FULLSYNC” flag for actual fsync.

                                                      1. 14

                                                        A thing I learned from the thread is that Linux is actually the outlier in having standard fsync be fullsync. FreeBSD I guess has the same problem as macOS as it is permitted by POSIX.

                                                        What marcan42 was most shocked by was the disk performance. He was only able to get 42 IOPS with FULLSYNC, whereas Linux usually does FULLSYNC all the time and gets much better performance.

                                                        Either (a) all drives except for Apple’s lie about what happens in FULLSYNC and don’t do enough, or (b) something is wrong with Apple’s drives.

                                                        1. 14

                                                          FreeBSD did fsync correctly long before Linux fixed theirs; so did ZFS on all platforms. Correctly meaning not just waiting for the flush but also checking that the flush itself actually succeeded (other OSes not checking is what took Postgres devs by surprise there).

                                                          or (b) something is wrong with Apple’s drives

                                                          Yeah, marcan’s suspicion is that they didn’t optimize the controller firmware for that case because macOS cheats on fsync so it never came up.

                                                    2. 1

                                                      If that’s true and a battery/large cap is on the board/controller/ssd, then the initial complaint is a bit overblown and full honest-to-god fsync really isn’t necessary?

                                                    3. 5

                                                      lack of people complaining about file corruption

                                                      APFS is CoW so of course those complaints would be very unexpected. What would be expected are complaints about the most recent FS transactions not getting committed due to macOS doing fake fsync by default (requiring a “full sync” flag to really sync). But either nobody runs high-durability production databases on their Macbooks (:D) or all those databases use the flag.

                                                      1. 21

                                                        They do. SQLite uses F_FULLSYNC and has done so since it was first incorporated into macOS in 2004. LMDB uses it. CouchDB does. I know Couchbase does because I put the call in myself. I would imagine any other database engine with Mac support does so too.

                                                        1. 1

                                                          Hmm. Then I’d imagine they’re seeing performance dips on M1 as well, right? I wonder how they’re dealing with that—treating it as a bug or just an unavoidable regression.

                                                          1. 4

                                                            I work with SQLite a lot in my day job and haven’t noticed a regression on my M1 MBP (quite the opposite, really.)

                                                            It’s always been important when working with SQLite to batch multiple writes into transactions. By default every statement that changes the db is wrapped in its own transaction, but you don’t want that because it’s terrible for performance, on any filesystem. So for example my code handles streaming updates by batching them in memory briefly until there are enough to commit at once.
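
                                                          A minimal sketch of that batching pattern with Python’s sqlite3 (table name and batch size are arbitrary):

                                                          ```python
                                                          import sqlite3

                                                          conn = sqlite3.connect("events.db")
                                                          conn.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, payload TEXT)")
                                                          buffer = []

                                                          def handle_update(ts: float, payload: str) -> None:
                                                              # Buffer streaming updates in memory and commit them in one
                                                              # transaction, instead of paying one implicit transaction
                                                              # (and fsync) per INSERT.
                                                              buffer.append((ts, payload))
                                                              if len(buffer) >= 500:              # arbitrary batch size
                                                                  with conn:                      # one BEGIN ... COMMIT
                                                                      conn.executemany("INSERT INTO events VALUES (?, ?)", buffer)
                                                                  buffer.clear()
                                                          ```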

                                                            1. 4

                                                              It’s always been important when working with SQLite to batch multiple writes into transactions.

                                                              Yes. Transaction speed is a very limited resource on many storage devices. The SQLite FAQ says it well: If you only get 60 transactions per second, then yes, your harddisk is indeed spinning at 7200rpm.

                                                              my code handles streaming updates by batching them in memory briefly until there are enough to commit at once.

                                                              Nice! I wish all developers were that responsible. I fondly remember having to clone a codebase to /dev/shm to be able to run its SQLite tests in reasonable time before I had an SSD. When you have a mechanical harddisk, it becomes loudly evident when somebody is abusing transactions. That was also before SQLite got its new WAL (write-ahead log) transaction mode that can supposedly append commits to a special *.wal journal file before merging it in with a normal transaction. Have you tried it? It sounds like it would do much of the same as you do in terms of fsync load.

                                                              1. 4

                                                                WAL is a big improvement in many ways — besides being faster, it also doesn’t block readers during a transaction, which really improves concurrency. But I think it still does an fsync on every commit, since even an append-only log file can be corrupted without it (on some filesystems.)
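
                                                              For reference, turning WAL on is a one-liner, and the synchronous pragma is where the per-commit fsync trade-off lives (a sketch, not a durability recommendation):

                                                              ```python
                                                              import sqlite3

                                                              conn = sqlite3.connect("example.db")
                                                              print(conn.execute("PRAGMA journal_mode=WAL").fetchone())  # ('wal',)
                                                              # With synchronous=NORMAL, SQLite only syncs the WAL at checkpoints
                                                              # rather than on every commit: a power loss can drop the latest
                                                              # commits, but the database itself stays consistent.
                                                              conn.execute("PRAGMA synchronous=NORMAL")
                                                              ```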

                                                    4. 3

                                                Is APFS actually guaranteed CoW-only? Other CoW filesystems do optimisations where some updates will write in place where it’s deemed safe. Only the log-structured filesystems guarantee no in-place updates, if I remember correctly.

                                                      1. 7

                                                        apparently no:

                                                        [APFS lead dev] made it clear that APFS does not employ the ZFS mechanism of copying all metadata above changed user data which allows for a single, atomic update of the file system structure

                                                        APFS checksums its own metadata, but not user data […] The APFS engineers I talked to cited strong ECC protection within Apple storage devices

                                                        Since they do not have to ensure that the whole tree from the root down to every byte of data is valid, they have probably done precisely that kind of “optimization”. Welp.

                                                        1. 3

                                                          Oh… that makes me sad.

                                                          But thank you for the link, this is a quote I love for a few reasons:

                                                          For comparison, I don’t believe there’s been an instance where fsck for ZFS would have found a problem that the file system itself didn’t already know how to detect. But Giampaolo was just as confused about why ZFS would forego fsck, so perhaps it’s just a matter of opinion.

                                                      2. 3

                                                        Do you mean people ought to use a copy-on-write filesystem on Linux and all performance gains (and integrity risks) are gone? Or is it not that simple?

                                                        1. 3

                                                          You can read up on nilfs v2 or f2fs and look at some benchmarks to get an idea of where things stand. They’re both CoW, but don’t bring in the whole device management subsystem of btrfs or ZFS.

                                                        2. 2

                                                          marcan managed to trip APFS data loss and data corruption :-/

                                                          1. 5

                                                            Data loss but not data corruption. The GarageBand error came from inconsistent state due to data loss. I didn’t see anything indicating any committed file data was corrupted, but navigating Twitter is a nightmare and I may have missed some tweets.

                                                            1. 3

                                                              If GarageBand saves enough data to presumably partially overwrite a file and leave it in an inconsistent state, is that not corruption?

                                                              That said, it seems more of an APFS problem, as they should know that the fullsync call needs to be made. It’s not ok to skip it just because the hardware is apparently absurdly slow :-/

                                                              1. 3

                                                                That wasn’t my interpretation. It sounded like there were multiple files involved, and one of them was missing, making the overall saved state inconsistent but not necessarily any file corrupt.

                                                                The tweet:

                                                                So I guess the unsaved project file got (partially?) deleted, but not the state that tells it to reopen the currently open file on startup.

                                                                So probably the project file wasn’t synced to disk but some CoreData state was. I’m not saying APFS cannot corrupt files, it probably can, but I don’t see any strong evidence that it does in these tests. This sounds like write loss to me.

                                                        1. 9

                                                          To get this wrong does take some effort, and in well-crafted code should never happen

                                                          And then proceeds to show an example where the difference between well-crafted and non-well-crafted code is a single character suffix to a string literal.

                                                          But that’s just par for the C/C++ course. Not a day passes where some “well-crafted” code turns out not to be, and at the same time no day passes without a C programmer telling us that their code is fine and the mistakes were made by others.

                                                          1. 1

                                                            I don’t think that’s entirely fair. The example is particularly contrived to show the difference (and is something that static analysis will pick up, even if it did pass code review). In practice, std::string_view is used almost entirely for non-captured function arguments. In this usage, it is entirely safe (the caller must own the underlying character storage and if you want to pass a string literal, you’ll just pass a string literal directly and not explicitly construct the string view).

                                                          1. 3

                                                            There is also what appears to be a sound card, along with a single 36-pin connector I don’t recognize.

                                                            Probably a connector for a CD-ROM drive. Before there were IDE CD-ROM drives, there were a few proprietary interfaces, often offered by sound cards.

                                                            1. 1

                                                              Maybe, but the pre-IDE CD connectors I know of are either 40-pin (Mitsumi, Panasonic) or 34-pin (Sony), not 36. And the WaveBlaster header is 26 pins.

                                                              1. 1

                                                                It’s really hard to say without seeing pictures. 36-pin makes me want to say Centronics connector, which were used to connect to various peripherals, including CD-ROM players, but also a lot of other things. But Compaq also had a weird proprietary 36-pin ISA slot way back, that may simply be an external connection cable (although… you know, it wouldn’t make that much sense for it to go on the sound card).

                                                              1. 7

                                                                I understand that this is the last event in a long series, but when he refused in 2020 to stop supporting big corporations, why didn’t he change the license to a more restrictive one? To my knowledge, many big companies are allergic to the GPL family of licenses, which would have led to the desired outcome.

                                                                1. 4

                                                                  If he had already accepted outside contributions, it may have been difficult to get existing contributors to accept a license change.

                                                                  1. 1

                                                                    Ah good point.

                                                                  2. 3

                                                                  He didn’t even need to switch to a different license; he could have just modified the license file to explicitly exclude companies by name, by market cap, by country, by any arbitrary condition he wanted. It would no longer have been considered an open source license by any definition of the term, but it would have been usable by everyone he wanted to use it.

                                                                    Step two though, would have been license violation detection, which is a whole different issue, especially once the JavaScript gets minified. And we haven’t even gotten to enforcement.

                                                                  This stuff is tricky and I believe in doing the right thing. But it is very easy for any company to do the wrong thing, intentionally or not. If you really, really care about companies not using your code, you might want to just consider not open sourcing it.

                                                                    1. 2

                                                                      Or use a copyleft license like the GPL, which will likely induce the companies you don’t want to use your software to voluntarily avoid using it.

                                                                      1. 2

                                                                        I have a feeling this isn’t as much about companies not wanting to use your software as it is about getting paid for your work.

                                                                      Which of course is tricky given the trivial nature of many of these packages: yes, it’s very convenient to use a pre-existing package for emitting colors on the console, but it’s also much easier to either not emit colors, or to quickly hack up something that emits the required ANSI color bytes, rather than going through all of the red tape to get the author of that package paid.

                                                                        And even if you do: The functionality offered by such a relatively simple package can be implemented by an engineer in less than a day, so as a company I’m wondering how much I’m actually willing to pay for this.

                                                                        It gets worse though: While the packages in question were certainly downloaded by some big companies CI systems, it’s not even clear whether this was a direct dependency of their software. This might have very well been a package used by a package used by a package of some (much more high-value) dependency of theirs which they actually might be sponsoring in one way or another.

                                                                      So is it now the expectation that open source projects which accept sponsorships then further pay off all their dependencies? And their dependencies’ dependencies? What percentage of your income should you forward downstream?

                                                                        Or in fewer words: This is a complicated mess

                                                                        1. 1

                                                                        Not to forget: who decides which deps should be added to a library? Is it acceptable to add colored text output to a build tool? How necessary is it? Did anyone ask for it, or was it added so the coloring lib could demand its share?

                                                                          1. 1

                                                                            Is it acceptable to add colored text output to a build tool

                                                                            that’s up to the build tool author to decide, but I would argue that nowadays it’s not just acceptable but required to be competitive with other build tools. It sucks, but that’s the world we live in. Bling sells.

                                                                            But it should not matter for the Fortune 500 company C using (and sponsoring) build tool B whether that tool also happens to have library L as its dependency and whether that dependency is compensated for their effort.

                                                                          You can’t expect company C to go through the whole dependency graph of B and add micro-sponsorings to each of those deps, even more so as I would assume the author of library L wouldn’t be happy with whatever percentage of the sponsoring pie they would get from company C, given that colored output on the command line is only a tiny subset of their usage of B and possibly even disabled on the CI (but still downloaded).

                                                                            So the author would still be disgruntled and this would still happen.

                                                                  1. 2

                                                                    If it’s stupid, but it works … it’s not stupid.

                                                                    Well, maybe sometimes it’s a little bit dumb. I’ll take dumb but solves a problem over not solving the problem though.

                                                                    1. 1

                                                                      It does expose your TOTP code to the network.

                                                                      1. 1

                                                                        It is a fun hack, nothing anyone should use.

                                                                        1. 2

                                                                          i feel like maybe we should discuss that…

                                                                          is it exposing your TOTP code to the network? isn’t the whole point of TOTP that anyone knowing a code would not learn the underlying secret?

                                                                          is it even possible to guess a TOTP given knowledge of n previous TOTPs? i do know it’s fairly easy to brute force a TOTP when there is no rate limiting in place, and i think this would definitely be one of those cases

                                                                          1. 2

                                                                            Since it’s time-based, and nothing that I see (from my quick skim) is keeping track of which codes have been used, a network observer who sees what IP addresses you’re talking to should be able to bypass your TOTP protection as long as they connect to the same IP address within that 30 second window or whatever.

                                                                            1. 2

                                                                              I checked a few TOTP implementations out there and not all of them invalidate codes after use. Github for example happily accepts the same code multiple times within the same time period.

                                                                              I agree that blacklisting codes after use is good practice, but it’s just one more safety measure. Only checking the TOTP without blacklisting is not the same as not checking a TOTP
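
                                                                              For illustration, a generic RFC 6238-style check with a used-code blacklist looks roughly like this (the shared secret is a placeholder, and this is not the firewall hack from the article):

                                                                              ```python
                                                                              import base64
                                                                              import hashlib
                                                                              import hmac
                                                                              import struct
                                                                              import time

                                                                              SECRET = base64.b32decode("JBSWY3DPEHPK3PXP")  # placeholder shared secret
                                                                              _used = set()                                   # (counter, code) pairs already accepted

                                                                              def totp(secret: bytes, counter: int, digits: int = 6) -> str:
                                                                                  # RFC 4226/6238: HMAC-SHA1 over the big-endian counter, dynamic truncation.
                                                                                  mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
                                                                                  offset = mac[-1] & 0x0F
                                                                                  code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
                                                                                  return str(code).zfill(digits)

                                                                              def verify(code: str, window: int = 1, step: int = 30) -> bool:
                                                                                  # Accept the code for the current step +/- window, but only once:
                                                                                  # replaying a sniffed code within the same period is rejected.
                                                                                  now = int(time.time()) // step
                                                                                  for counter in range(now - window, now + window + 1):
                                                                                      if hmac.compare_digest(totp(SECRET, counter), code) and (counter, code) not in _used:
                                                                                          _used.add((counter, code))
                                                                                          return True
                                                                                  return False
                                                                              ```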

                                                                              1. 2

                                                                                Github for example happily accepts the same code multiple times within the same time period.

                                                                                That’s against the specs and a pretty serious bug. It’s called “one time” for a reason.

                                                                              2. 1

                                                                                If they can guess the IP then they have already broken your TOTP anyway…

                                                                                1. 4

                                                                                  Somebody who can watch your IP traffic (watch, not decrypt!) does not need to guess the IP.

                                                                                  1. 3

                                                                                    sure, but they still would need the SSH key to access the machine.

                                                                                    1. 1

                                                                                      TOTP is supposed to be the second factor that protects you when someone has stolen your first factor. If your security is only as good as the first factor, then you don’t have 2FA.

                                                                                    2. 2

                                                                                      Oh, sure, so they have a handful of seconds to try cracking your password before it rotates.

                                                                                      1. 1

                                                                                        Absolutely; that’s why a solution like fail2ban is probably the better idea and more comfortable to use.

                                                                                        1. 1

                                                                                          Yes, so at least it would provide that much protection – reducing the window of exposure.

                                                                              3. 1

                                                                              How? All the IP addresses exist; it just changes the firewall rules. You would have to brute-force the code in time to find it, no?

                                                                                1. 2

                                                                                there’s no TLS protecting the TOTP “code”; it’s visible in plaintext as the connection’s destination IP

                                                                            1. 4

                                                                            Automounters have existed since the ’80s, but sure, let’s just reinvent the wheel… hang on a minute, it’s not round any more - it’s square!

                                                                              1. 10

                                                                                yawn systemd bashing, how boring, and how expected

                                                                                anyway, this is a typical strawman response.

                                                                              automounting is not a novel concept… but no one claimed it is. the post shows how systemd nicely integrates this concept with other systemd concepts so that, for example, it becomes easy to start services that depend on such a network mount, ensuring the right ordering during boot, etc.

                                                                                1. 3

                                                                                  yawn systemd bashing, how boring, and how expected

                                                                                  My comment wasn’t systemd-specific - it was aimed at any solution in need of a problem - my point was that automounters have existed for well over three decades and they do well what they’re good at.

                                                                                Also, unless I’m missing something, the article describes automating an fstab(5) mount - it isn’t a real automount as in mount on request/access and unmount when not in use, not to mention other features such as the various substitutions (key, wildcard, variable, etc.).

                                                                                2. 4

                                                                                  None of the auto mounters I know of integrated with system service ordering and dependencies very well. Also systemd already is the place where fstab handling happens, so it effectively needs to be an (auto)mounter anyway. I don’t think reinventing the wheel criticism really applies here.

                                                                                  1. 1

                                                                                    I’m not the biggest fan of systemd… But I do think your comment isn’t very constructive nor a good criticism of this feature of systemd.

                                                                                    1. 1

                                                                                    I have moved from autofs to the systemd built-in feature in the cases where we are automounting. The advantages are that systemd is already installed, so no additional package is needed; it uses the same syntax for defining mounts that we’re already used to for services; and service startup order can be nicely integrated with automount availability.

                                                                                      This as done away with quite a few sleep hacks in old style scripts and so far worked perfectly and did not necessitate new hacks.

                                                                                      Do I need my init system to have built-in automount support? No. But it’s very convenient and very robust, so I’m more than happy to use it.

                                                                                      If you are concerned by bloat or unhappy with the functionality provided by systemd, feel free to continue using autofs or anything else for your mounts.

                                                                                      The one thing I wish was different is if I could have the mount and automount unit in a single file. systemd is very boilerplaty that way and while I see that in some cases the split makes sense, in my simple use cases, it’s just baggage
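
                                                                                      For reference, this is roughly what such a pair of units looks like; the server, share and mount point below are made up for illustration:

                                                                                        # media-data.mount (the file name must match the mount point, /media/data)
                                                                                        [Unit]
                                                                                        Description=Example NFS mount (sketch)

                                                                                        [Mount]
                                                                                        What=fileserver.example.com:/export/data
                                                                                        Where=/media/data
                                                                                        Type=nfs

                                                                                        # media-data.automount
                                                                                        [Unit]
                                                                                        Description=Automount for /media/data (sketch)

                                                                                        [Automount]
                                                                                        Where=/media/data
                                                                                        TimeoutIdleSec=600

                                                                                        [Install]
                                                                                        WantedBy=multi-user.target

                                                                                      A service that needs the share can then declare RequiresMountsFor=/media/data in its [Unit] section to get the boot ordering mentioned above.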

                                                                                    1. 4

                                                                                      systemd compares the system time to a builtin epoch, usually the release or build date of systemd. If it finds the system time is before this epoch, it resets the clock to the epoch

                                                                                      I think this is a great example of why systemd ends up with so many features. You could totally let this be done by a different service, but then you’d have to re-introduce this kind of trigger into systemd, in a way that makes it run before all the things that need DNS. Otherwise your init system plays “crash everything” on bootup, which is totally worthless.

                                                                                      1. 2

                                                                                        This can easily run as your own unit though. There is no need for systemd to do it specifically. You can create a job like that and make sure it runs before network.target.
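
                                                                                        A rough sketch of what such a job could look like; the script path, unit name and placeholder epoch are invented, and it assumes GNU date:

                                                                                          #!/bin/sh
                                                                                          # /usr/local/sbin/clamp-clock.sh (hypothetical; assumes GNU date -s)
                                                                                          BUILD_EPOCH=1718000000   # stand-in for the image's build date
                                                                                          if [ "$(date +%s)" -lt "$BUILD_EPOCH" ]; then
                                                                                              date -s "@$BUILD_EPOCH"
                                                                                          fi

                                                                                          # clamp-clock.service (hypothetical)
                                                                                          [Unit]
                                                                                          Description=Reset the clock to a build epoch if it is implausibly old (sketch)
                                                                                          DefaultDependencies=no
                                                                                          Before=time-sync.target network.target

                                                                                          [Service]
                                                                                          Type=oneshot
                                                                                          ExecStart=/usr/local/sbin/clamp-clock.sh

                                                                                          [Install]
                                                                                          WantedBy=sysinit.target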

                                                                                        1. 5

                                                                                          While true, you cannot run a unit as early in the boot process as this systemd method, which happens in systemd’s main() before any targets and units are started. In most cases this doesn’t matter, but it’s nice not to have all my system logs starting 41 years ago.

                                                                                          1. 3

                                                                                            I see their point; however, I prefer “obviously wrong time” timestamps to “maybe from the last boot or maybe the bootup was actually hanging for minutes” timestamps.

                                                                                            1. 1

                                                                                              as you see in the article, that comes with some drawbacks and might require manual intervention in cases where a somewhat-wrong timestamp would have been good enough for everything to come back up, at which point NTP would correct the clock.

                                                                                              I guess it’s something where you have to decide for each machine whether auditability of timestamps or coming back up without manual intervention is more important.

                                                                                              1. 1

                                                                                                It seems to me that what you really need, then, is a log message saying “time changed from x to y”? Does systemd’s implementation include such a message?

                                                                                                1. 1

                                                                                                  AFAICT, journald does not, but the NTP client I use logs when it adjusts the time at startup, which is good enough.

                                                                                                  As for “maybe from the last boot” I usually invoke journalctl --boot=0 unless I’m specifically looking for logs from previous boots.

                                                                                                  1. 1

                                                                                                    Ah, by “systemd’s implementation” I meant systemd-timesyncd. Though the journal noticing this would probably work too, now that I think about it.

                                                                                        1. 8

                                                                                          For ranges, have a look at the dedicated range types rather than adding two columns. This settles the question of whether the bounds are inclusive or not, provides many helper functions and operators, and also lets you add an exclusion constraint to ensure ranges do not overlap.

                                                                                          Of course the information provided in the article is perfectly valid, but given that we are talking about lesser-known Postgres features, I thought I’d list one that’s related.

                                                                                          In general, I would recommend always looking at native data types if they are available, because they ensure correct data at insertion time and because you get to use all of the provided functions and operators to manipulate and query them.
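
                                                                                          A minimal sketch of the scheduling case from the article (table and column names are made up):

                                                                                            -- btree_gist is needed to combine = and && in one exclusion constraint
                                                                                            CREATE EXTENSION IF NOT EXISTS btree_gist;

                                                                                            CREATE TABLE booking (
                                                                                                room_id int NOT NULL,
                                                                                                during  tstzrange NOT NULL,
                                                                                                -- no two bookings for the same room may overlap
                                                                                                EXCLUDE USING gist (room_id WITH =, during WITH &&)
                                                                                            );

                                                                                            -- '[)' means the start is inclusive and the end exclusive
                                                                                            INSERT INTO booking
                                                                                            VALUES (1, tstzrange('2021-10-16 09:00+00', '2021-10-16 10:00+00', '[)'));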

                                                                                          1. 3

                                                                                            Range types can be useful for the type of use case presented in the article (scheduling). PostgreSQL 14 even went further and added multirange types, which can potentially be used to represent an entire schedule.
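
                                                                                            For example, something along these lines (PostgreSQL 14+, assuming a hypothetical booking table with a tstzrange column) could collapse all bookings into one schedule value per room:

                                                                                              -- aggregate each room's ranges into a single multirange
                                                                                              SELECT room_id, range_agg(during) AS schedule
                                                                                              FROM booking
                                                                                              GROUP BY room_id;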

                                                                                            However, there are a few downsides to using these less common types. Two that I often consider:

                                                                                            • ORM support
                                                                                            • How easy it is for non-developers to interact with these types (from reporting tools etc.)
                                                                                          1. 3

                                                                                            Ubuntu 21.10 brings the all-new PHP 8 and GCC 11 including full support for static analysis

                                                                                            Why is PHP of all things suddenly the headliner?

                                                                                            1. 6

                                                                                              PHP 8 is much faster. That’s pretty good for something that’s basically old and boring tech nowadays.

                                                                                              1. 3

                                                                                                Going purely off memory here, but doesn’t Wikipedia run on Ubuntu and use PHP?

                                                                                                1. 2

                                                                                                  PHP is still pretty massive.

                                                                                                  1. 1

                                                                                                    brings the all-new PHP 8

                                                                                                    also, the next major release, 8.1, is due in about a month. I don’t think “all new” is a valid qualifier any more.

                                                                                                  1. 1

                                                                                                    I didn’t see how this new setup, as good as it is (and it does appear to be a good setup), solves the initially stated issue of revoking employees’ OpenVPN certificates when they leave. The same would still have to happen here, just with a friendlier web interface. Does Tailscale support integration with things like YubiKeys (or other similar devices)?

                                                                                                    1. 5

                                                                                                      There is no “revoking” certs. We just deprovision their GSuite account and we’re done.

                                                                                                      1. 3

                                                                                                        The great thing about Tailscale is that they do all authentication via external services like Google or even GitHub.

                                                                                                        While you can easily forget to revoke a client cert (plus: certificate revocation is still tricky), you probably won’t forget to revoke GitHub org access.

                                                                                                      1. 2

                                                                                                        I’m currently evaluating the possibility of using Tailscale for a similar setup, but I’m concerned about the always-online nature of Tailscale as a means of accessing the prod network, even with ACLs in place.

                                                                                                        Having production machines constantly reachable invites mistakes by authorized users and all-too-easy access from compromised client machines (which are probably the most likely attack vector anyway).

                                                                                                        I was considering using the Tailscale API to dynamically set ACLs for limited time periods, but at that point, why not just use a session-based VPN? Especially considering that Tailscale API keys are only valid for a limited time and can only be renewed manually.

                                                                                                        What are everybody’s opinions about using an always-on VPN for prod network access?

                                                                                                        1. 8

                                                                                                          If you have serious data that you need to keep safe, you need to use applications and access mechanisms that are always authenticated anyway. Just because you’re on the VPN it doesn’t mean you should be able to drop tables or whatever – it’s just another perimeter layer. You should still be using SSH with hardware tokens, some kind of 2FA, etc. Your application components (internal services, databases, etc) should ideally use mutual TLS authentication to communicate. At that point, VPN access isn’t magically equivalent to superuser access, it’s just a convenient way to not have to give everything a public IP.

                                                                                                          1. 1

                                                                                                            Yeah, I share similar concerns, and it seems like some kind of ephemeral access would be a big boon here, à la Teleport. If Tailscale could bake something like that (short or time-based access durations) into their system, it’d be pretty cool.

                                                                                                          1. 2

                                                                                                            Neat! Any reason not to put it in the App Store so you don’t have to worry about updates etc.?

                                                                                                            1. 2

                                                                                                              Great question! I haven’t even started looking into the feasibility of that yet. The app does some pretty weird things in order to support installable plugins, and I’m not even sure where to start on checking if they are App Store compatible or not.

                                                                                                              1. 2

                                                                                                                if it can (and does) run sandboxed and does not call any private APIs (maybe electron does internally? I don’t know), it’ll be fine.

                                                                                                                1. 2

                                                                                                                  I need to fully understand what “runs sandboxed” means, I think. I have it signed and notarized, but do I also need to opt into the “com.apple.security.app-sandbox” entitlement? I haven’t done that yet.

                                                                                                                  I have a few research notes on the sandbox here: https://github.com/simonw/datasette-app/issues/31

                                                                                                                  1. 3

                                                                                                                      Yes. You have to opt into the sandbox. If it still runs fine (Activity Monitor has a column that indicates whether a process is sandboxed or not) and if Electron doesn’t use any private APIs, you’re good.
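
                                                                                                                      A minimal sketch of what the entitlements plist could contain; com.apple.security.app-sandbox is the key you mentioned, while the network-client key is only a guess at what an app that talks to a local server might additionally need:

                                                                                                                        <?xml version="1.0" encoding="UTF-8"?>
                                                                                                                        <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
                                                                                                                        <plist version="1.0">
                                                                                                                        <dict>
                                                                                                                            <key>com.apple.security.app-sandbox</key>
                                                                                                                            <true/>
                                                                                                                            <!-- guess: probably needed for an app that connects to its local server; verify for your case -->
                                                                                                                            <key>com.apple.security.network.client</key>
                                                                                                                            <true/>
                                                                                                                        </dict>
                                                                                                                        </plist>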

                                                                                                            1. 10

                                                                                                              Can we not wait for the official announcement on Debian’s news page?

                                                                                                              1. 3

                                                                                                                  I think it is fun to follow along with the release process.

                                                                                                                1. 4

                                                                                                                  At the risk of being the fun police, following along with processes isn’t an activity well suited to a link aggregator where a post like this one is going to hang around on the front page for a couple of days. It would be kinder to everyone to save them a pointless click and wait a moment for the actual release announcements.

                                                                                                                  1. 6

                                                                                                                      The Debian team is actively promoting the release process on Twitter. The title of my submission also reflects that. If you are not interested, then ignore it. Geez, do people really have to complain about everything these days?

                                                                                                                    1. 3

                                                                                                                        I would grant you your point, but the actual release post was already downvoted once with “already posted”, even though that would be the better post for a discussion of the release than this one.