1. 38
  1. 9

    I don’t know if the lessons at the end are meaningful. A power surge that can fry hardware doesn’t care about your software setup or partitions. Electrons go where they want to go. Read-only partitions or not, I expect things survived only by accident.

    1. 5

      The EFI partition and its used area may have been small and/or right at the (inner or outer) edge of the disk.

      Physical layout could have a hand in the odds.

    2. 3

      Worth reading just for the last line.

      1. 3

        My experience is exactly the opposite. We all know that the plural of anecdote isn’t data, but still. I’ve had EXT3/4 partitions suffer all kinds of power loss and hardware failures, and they were always recoverable.

        But the one time I decided to go with XFS for the root partition, a power-loss event killed it instantly. The data partition, which was EXT3, just needed a routine fsck. I haven’t used XFS since: once bitten, twice shy, you know.

        I still haven’t tried BTRFS, so I can’t say anything on that subject yet.

        1. 2

          Out of curiosity, how long ago was this? I know XFS used to have pretty significant reliability issues, but I’ve been using it for quite a while now without problems.

          1. 1

            That XFS incident was in 2008 or so, a very long time ago. None of my friends use XFS, so I had no way of knowing whether it had improved, and after that event I didn’t feel like trying it again without solid proof that it had. :)

            1. 1

              Ah, interesting. I know that by around 2012 a lot of major improvements had either very recently been, or were soon to be, pushed into XFS (https://xfs.org/images/d/d1/Xfs-scalability-lca2012.pdf), including the addition of checksums on metadata. I also know that it had a strong tendency to lose data on power loss in the past, but as some very anecdotal evidence to the contrary: I’ve been using it for a few years now on my personal system, and it’s endured at least several dozen forced shutdowns without data loss.

          2. 2

            Indeed – this post and the ensuing discussion reinforce my belief in my Grand Unifying Theory of Filesystems.

            1. 1

              Paraphrase:

              For all filesystems there exists a user that says “$fs ate my data”

          3. 1

            mdadm is not discussed as part of the recovery, which is a serious oversight. And it’s not enterprise colocation if it doesn’t have a proper electrical system.
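
            For what it’s worth, a first-pass check in that kind of recovery is whether the md arrays came back degraded; here’s a minimal sketch of that check, just parsing the kernel’s /proc/mdstat (array names are whatever the host happens to have):

            ```python
            # Minimal sketch: list md arrays that look degraded after an unclean shutdown.
            # Assumes a Linux host with mdadm-managed arrays; reads /proc/mdstat only.
            import re

            def degraded_arrays(path="/proc/mdstat"):
                degraded, current = [], None
                with open(path) as f:
                    for line in f:
                        m = re.match(r"^(md\d+)\s*:", line)
                        if m:
                            current = m.group(1)
                            continue
                        # Status lines end like "... [2/2] [UU]"; an underscore marks a missing member.
                        status = re.search(r"\[([U_]+)\]\s*$", line)
                        if current and status and "_" in status.group(1):
                            degraded.append(current)
                return degraded

            if __name__ == "__main__":
                bad = degraded_arrays()
                if bad:
                    print("degraded arrays:", ", ".join(bad))
                else:
                    print("all md arrays clean")
            ```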

            1. 5

              Sometimes proper electrical systems fuck up too, speaking from experience. :-( Though you’re right, I’d have expected any large modern building to be able to take a lightning strike.

              Also I would expect the EFI partition to be mounted read-only, or at least not actually have anything writing to it basically ever, so it’s not super surprising it survived.
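
              For what it’s worth, that’s easy to sanity-check on a running Linux box. A minimal sketch, assuming the ESP is mounted at /boot/efi (the path varies by distro):

              ```python
              # Minimal sketch: report whether the EFI system partition is mounted read-only.
              # The /boot/efi mount point is an assumption; adjust for your distro (/efi, /boot, ...).
              def esp_mount_options(mountpoint="/boot/efi", mounts="/proc/mounts"):
                  with open(mounts) as f:
                      for line in f:
                          _device, mnt, _fstype, options, *_ = line.split()
                          if mnt == mountpoint:
                              return options.split(",")
                  return None  # not mounted at all, so nothing is writing to it anyway

              if __name__ == "__main__":
                  opts = esp_mount_options()
                  if opts is None:
                      print("ESP is not mounted")
                  elif "ro" in opts:
                      print("ESP mounted read-only:", ",".join(opts))
                  else:
                      print("ESP mounted read-write:", ",".join(opts))
              ```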

              1. 7

                Having lived in Africa in The Bad Old Days and in NZ… I’ll tell you there is a vast degree of geographical variation in what constitutes a “lightning strike”.

                Something that will survive for years in NZ will be fried within days in an African highveld summer.

                Yes, a bad strike can turn a UPS into an expensive fuse and you consider yourself lucky if that’s all that blows.

                In the end we just learnt to shut everything down and unplug during the bad ones.

                1. 3

                  Yeah… “the colo power will never fail”, “the ‘cloud’ service will never be unavailable”, “failover always works”, and other lies we told ourselves before we gained some experience.

                  1. 2

                    > proper electrical systems fuck up too

                    For any sort of campus PBX cabling, I learned to install lightning arresters at both ends, which fuse each telephone line to ground. One time, lightning struck near a ped-mount gate lock/intercom and blew up the KSU interlock. Who’d’ve thunk?